DOI: 10.1145/264107.264215

DataScalar architectures

Published: 01 May 1997

Abstract

DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, each of which is tightly coupled with an associated memory. The program data set (and/or text) is distributed across these memories. In this execution model, each processor broadcasts operands it loads from its local memory to all other units. In this paper, we describe the benefits, costs, and problems associated with the DataScalar model. We also present simulation results of one possible implementation of a DataScalar system. In our simulated implementation, six unmodified SPEC95 binaries ran from 7% slower to 50% faster on two nodes, and from 9% to 100% faster on four nodes, than on a system with a comparable, more traditional memory system. Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail. We conclude with a discussion of how DataScalar systems may accommodate traditional parallel processing, thus improving performance over a much wider range of applications than is currently possible with either model.
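The execution model described in the abstract can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the paper's simulated implementation: the `Node` class, the `owner_of` round-robin interleaving policy, and the summation workload are all hypothetical choices made for illustration. Every node runs the same computation; each load is served from the owning node's local memory and delivered to all other nodes as a broadcast.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One processor plus the slice of memory it owns (hypothetical model)."""
    node_id: int
    local_memory: dict            # address -> value, only for owned addresses
    local_reads: int = 0          # loads satisfied from this node's own memory
    broadcasts_received: int = 0  # operands received from other nodes


def owner_of(address: int, num_nodes: int) -> int:
    # Assumed distribution policy: interleave addresses round-robin.
    return address % num_nodes


def run_redundantly(data: list, num_nodes: int) -> tuple:
    """Every node sums the whole data set; each load is served by its owner."""
    nodes = [
        Node(i, {a: v for a, v in enumerate(data) if owner_of(a, num_nodes) == i})
        for i in range(num_nodes)
    ]
    totals = [0] * num_nodes
    for address in range(len(data)):
        owner = nodes[owner_of(address, num_nodes)]
        value = owner.local_memory[address]  # local load at the owning node
        owner.local_reads += 1
        for node in nodes:
            if node is not owner:
                node.broadcasts_received += 1  # operand arrives by broadcast
            totals[node.node_id] += value      # all nodes compute redundantly
    return nodes, totals


if __name__ == "__main__":
    nodes, totals = run_redundantly([1, 2, 3, 4, 5, 6], num_nodes=2)
    for node in nodes:
        print(node.node_id, node.local_reads, node.broadcasts_received)
    print(totals)
```

In this toy model no node ever sends a request for remote data: an operand is either in local memory or arrives by broadcast from its owner. Timing, caches, and the broadcast interconnect are deliberately left out.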

Published In

ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture
June 1997
350 pages
ISBN: 0897919017
DOI: 10.1145/264107

ACM SIGARCH Computer Architecture News, Volume 25, Issue 2
Special Issue: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
May 1997
349 pages
ISSN: 0163-5964
DOI: 10.1145/384286

Publisher

Association for Computing Machinery

New York, NY, United States
