DataScalar architectures

Authors:

Stefanos Kaxiras,

James R. Goodman

ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture

Pages 338 - 349

https://doi.org/10.1145/264107.264215

Published: 01 May 1997

Abstract

DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, each of which is tightly coupled with an associated memory. The program data set (and/or text) is distributed across these memories. In this execution model, each processor broadcasts the operands it loads from its local memory to all other units. In this paper, we describe the benefits, costs, and problems associated with the DataScalar model. We also present simulation results of one possible implementation of a DataScalar system. In our simulated implementation, six unmodified SPEC95 binaries ran from 7% slower to 50% faster on two nodes, and from 9% to 100% faster on four nodes, than on a system with a comparable, more traditional memory system. Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail. We conclude with a discussion of how DataScalar systems may accommodate traditional parallel processing, thus improving performance over a much wider range of applications than is currently possible with either model.
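The execution model described in the abstract can be illustrated with a toy sketch. It is not the simulated implementation evaluated in the paper; the names `Node`, `owner_of`, and `run_datascalar_sum`, the modulo interleaving of addresses across memories, and the choice of a simple summation program are all assumptions made for illustration. Every node runs the same program; the node that owns an address loads it locally and broadcasts the operand, and the other nodes consume the broadcast value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One processor tightly coupled with its own memory (hypothetical model)."""
    node_id: int
    local_memory: dict  # address -> value, only for addresses this node owns
    received: dict = field(default_factory=dict)  # operands broadcast by other nodes
    local_loads: int = 0
    remote_operands: int = 0


def owner_of(address, num_nodes):
    # Assumption: data is interleaved across memories by address modulo node count.
    return address % num_nodes


def run_datascalar_sum(data, num_nodes):
    """Redundantly sum `data` on every node; return (per-node totals, broadcast count)."""
    nodes = [
        Node(i, {a: v for a, v in enumerate(data) if owner_of(a, num_nodes) == i})
        for i in range(num_nodes)
    ]
    totals = [0] * num_nodes
    broadcasts = 0

    for address in range(len(data)):
        owner = nodes[owner_of(address, num_nodes)]

        # The owner loads the operand from its local memory and broadcasts it.
        value = owner.local_memory[address]
        owner.local_loads += 1
        broadcasts += 1
        for node in nodes:
            if node is not owner:
                node.received[address] = value
                node.remote_operands += 1

        # Every node executes the same add, using the local or broadcast operand.
        for node in nodes:
            if node is owner:
                operand = node.local_memory[address]
            else:
                operand = node.received.pop(address)
            totals[node.node_id] += operand

    return totals, broadcasts


if __name__ == "__main__":
    totals, broadcasts = run_datascalar_sum([1, 2, 3, 4, 5], num_nodes=2)
    print(totals, broadcasts)
```

In this sketch each operand crosses the interconnect once, as a broadcast from its owner, and no node ever sends a request; all nodes finish with the same result because they execute the same instruction stream.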


Published In

ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture

June 1997

350 pages

ISBN:0897919017

DOI:10.1145/264107

Chairmen: Andrew R. Pleszkun (Univ. of Colorado-Boulder, CO), Trevor Mudge (Univ. of Michigan)

ACM SIGARCH Computer Architecture News Volume 25, Issue 2
Special Issue: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
May 1997
349 pages
ISSN:0163-5964
DOI:10.1145/384286
Editors: Andrew R. Pleszkun (Univ. of Colorado-Boulder, CO), Trevor Mudge (Univ. of Michigan)

Copyright © 1997 Authors.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

ISCA97

Sponsor:

SIGARCH

ISCA97: International Conference on Computer Architecture

June 1 - 4, 1997

Denver, Colorado, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
