DOI: 10.1145/339647.339650
Article

A scalable approach to thread-level speculation

Published: 01 May 2000

Abstract

While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how to easily create parallel software that effectively exploits all of this raw performance potential. One promising technique for overcoming this problem is Thread-Level Speculation (TLS), which enables the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent. In this paper, we propose and evaluate a design for supporting TLS that seamlessly scales to any machine size because it is a straightforward extension of write-back, invalidation-based cache coherence (which itself scales both up and down). Our experimental results demonstrate that our scheme performs well on both single-chip multiprocessors and on larger-scale machines where communication latencies are twenty times larger.
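To make the abstract's central idea concrete, here is a toy Python model of how an invalidation-based coherence protocol can double as a detector of broken data dependences between speculative threads. It is a minimal sketch under our own assumptions, not the protocol evaluated in the paper: each speculative thread runs an epoch whose number gives its position in the original sequential order, each cached line carries hypothetical `spec_loaded`/`spec_modified` flags, and every store sends invalidations tagged with the writer's epoch number. The names `SpecLine`, `SpecCache`, and `receive_invalidation` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class SpecLine:
    # Hypothetical per-line speculative state (an assumption of this sketch).
    spec_loaded: bool = False    # read by this epoch while speculative
    spec_modified: bool = False  # written by this epoch while speculative


@dataclass
class SpecCache:
    epoch: int  # position in sequential order; smaller means logically earlier
    lines: dict = field(default_factory=dict)  # address -> SpecLine
    violated: bool = False

    def load(self, addr):
        self.lines.setdefault(addr, SpecLine()).spec_loaded = True

    def store(self, addr, caches):
        self.lines.setdefault(addr, SpecLine()).spec_modified = True
        # Invalidation-based coherence: the write invalidates other copies,
        # and in this sketch the invalidation carries the writer's epoch number.
        for other in caches:
            if other is not self:
                other.receive_invalidation(addr, self.epoch)

    def receive_invalidation(self, addr, writer_epoch):
        line = self.lines.get(addr)
        if line is None or writer_epoch > self.epoch:
            # No copy here, or the writer is logically later: in this
            # simplified model that write does not affect this epoch.
            return
        if line.spec_loaded:
            # A logically earlier epoch wrote data this epoch already read:
            # a read-after-write dependence was violated, so it must re-execute.
            self.violated = True


# Usage: three epochs, one per processor, in sequential order 0, 1, 2.
caches = [SpecCache(epoch=i) for i in range(3)]
caches[2].load(0x40)           # epoch 2 speculatively reads 0x40 too early
caches[1].store(0x40, caches)  # logically earlier epoch 1 then writes 0x40
caches[0].load(0x80)
caches[2].store(0x80, caches)  # a later epoch's write leaves epoch 0 untouched
print([c.violated for c in caches])  # [False, False, True]
```

Because detection in this model rides on invalidations the coherence protocol already sends, nothing in it depends on whether the caches share a chip or sit across a network, which mirrors the scaling argument above. The sketch deliberately omits committing, squashing, and buffering speculative versions.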



Published In

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture
June 2000
327 pages
ISBN:1581132328
DOI:10.1145/339647
  • ACM SIGARCH Computer Architecture News, Volume 28, Issue 2
    Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
    May 2000
    325 pages
    ISSN:0163-5964
    DOI:10.1145/342001
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ISCA '00

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
