DOI: 10.1145/279358.279386
Using prediction to accelerate coherence protocols

Published: 16 April 1998

Abstract

Most large shared-memory multiprocessors use directory protocols to keep per-processor caches coherent. Some memory references in such systems, however, suffer long latencies for misses to remotely-cached blocks. To ameliorate this latency, researchers have augmented standard coherence protocols with optimizations for specific sharing patterns, such as read-modify-write, producer-consumer, and migratory sharing. This paper seeks to replace these directed solutions with general prediction logic that monitors coherence activity and triggers appropriate coherence actions.

This paper takes the first step toward using general prediction to accelerate coherence protocols by developing and evaluating the Cosmos coherence message predictor. Cosmos predicts the source and type of the next coherence message for a cache block using logic that is an extension of Yeh and Patt's two-level PAp branch predictor. For five scientific applications running on 16 processors, Cosmos has prediction accuracies of 62% to 93%. Cosmos' high prediction accuracy is a result of predictable coherence message signatures that arise from stable sharing patterns of cache blocks.
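To make the two-level idea concrete, the following is a minimal Python sketch of a per-block message predictor in the spirit of a PAp-style scheme. It is an illustration under stated assumptions, not the paper's actual design: the class name `TwoLevelMessagePredictor`, the function `accuracy`, the history depth of 2, the "predict whatever followed this history last time" update rule, and the message type names are all hypothetical. The first level keeps a short history of recent (sender, type) tuples for each cache block; the second level maps each observed history to the tuple that followed it.

```python
from collections import defaultdict, deque


class TwoLevelMessagePredictor:
    """Illustrative two-level predictor for coherence messages (hypothetical)."""

    def __init__(self, depth=2):
        # Assumed history depth; the abstract does not specify one.
        self.depth = depth
        # First level: per-block history of the last `depth` (sender, type) tuples.
        self.history = defaultdict(lambda: deque(maxlen=depth))
        # Second level: per-block table mapping a history pattern to the
        # (sender, type) tuple that most recently followed that pattern.
        self.patterns = defaultdict(dict)

    def predict(self, block):
        """Return the predicted next (sender, type) for `block`, or None."""
        hist = self.history[block]
        if len(hist) < self.depth:
            return None  # not enough history yet
        return self.patterns[block].get(tuple(hist))

    def update(self, block, message):
        """Record the message that actually arrived for `block`."""
        hist = self.history[block]
        if len(hist) == self.depth:
            self.patterns[block][tuple(hist)] = message
        hist.append(message)  # the oldest entry drops out automatically


def accuracy(predictor, trace):
    """Fraction of messages predicted correctly; no prediction counts as a miss."""
    correct = 0
    for block, message in trace:
        if predictor.predict(block) == message:
            correct += 1
        predictor.update(block, message)
    return correct / len(trace) if trace else 0.0


if __name__ == "__main__":
    # Hypothetical stable sharing pattern on one block: P1 requests write
    # access, then P2 requests read access, repeated ten times.
    write_req = ("P1", "get_rw_request")
    read_req = ("P2", "get_ro_request")
    trace = [(0x40, m) for m in [write_req, read_req] * 10]
    print(accuracy(TwoLevelMessagePredictor(depth=2), trace))
```

On this synthetic trace the first four messages are misses while the tables warm up, and every later message is predicted correctly. That mirrors the abstract's point that stable sharing patterns produce predictable message signatures, although the numbers here say nothing about the accuracies reported for the real applications.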

References

[1]
Hazim Abdel-Shafi, Jonathan Hall, Sarita V. Adve, and Vikram S. Adve. An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors. In Proceedings of the Third IEEE Symposium on High-Performance Computer Architecture, pages 204-215, 1997.
[2]
Sarita V. Adve and Kourosh Gharachorloo. Shared Memory Consistency Models: A Tutorial. IEEE Computer, 29(12):66-76, December 1996.
[3]
Anant Agarwal, Richard Simoni, Mark Horowitz, and John Hennessy. An Evaluation of Directory Schemes for Cache Coherence. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 280-289, 1988.
[4]
Cristiana Amza, Alan L. Cox, Sandhya Dwarkadas, and Willy Zwaenepoel. Software DSM Protocols that Adapt between Single Writer and Multiple Writer. In Proceedings of the Third IEEE Symposium on High-Performance Computer Architecture, pages 261-271, 1997.
[5]
Jean-Loup Baer and Tien-Fu Chen. An Effective Preloading Scheme to Reduce Data Access Penalty. In Proceedings of Supercomputing '91, pages 176-186, 1991.
[6]
David Bailey, John Barton, Thomas Lasinski, and Horst Simon. The NAS Parallel Benchmarks. Technical Report RNR-91-002 Revision 2, Ames Research Center, August 1991.
[7]
John K. Bennett, John B. Carter, and Willy Zwaenepoel. Adaptive Software Cache Management for Distributed Shared Memory. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 125-134, June 1990.
[8]
B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, and M. Karplus. Charmm: A program for macromolecular energy, minimization, and dynamics calculation. Journal of Computational Chemistry, 4(187), 1983.
[9]
Doug Burger and Sanjay Mehta. Parallelizing Appbt for a Shared-Memory Multiprocessor. Technical Report 1286, Computer Sciences Department, University of Wisconsin-Madison, September 1995.
[10]
David Chaiken, John Kubiatowicz, and Anant Agarwal. LimitLESS Directories: A Scalable Cache Coherence Scheme. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV), pages 224-234, April 1991.
[11]
Satish Chandra, Brad Richards, and James R. Larus. Teapot: Language Support for Writing Memory Coherence Protocols. In Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation (PLDI), May 1996.
[12]
Alan L. Cox and Robert J. Fowler. Adaptive Cache Coherency for Detecting Migratory Shared Data. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 98-108, May 1993.
[13]
Anoop Gupta and Wolf-Dietrich Weber. Cache Invalidation Patterns in Shared-Memory Multiprocessors. IEEE Transactions on Computers, 41(7):794-810, July 1992.
[14]
Mark D. Hill, James R. Larus, Steven K. Reinhardt, and David A. Wood. Cooperative Shared Memory: Software and Hardware for Scalable Multiprocessors. ACM Transactions on Computer Systems, 11(4):300-318, November 1993. Earlier version appeared in the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V).
[15]
Tor E. Jeremiassen and Susan J. Eggers. Reducing False Sharing on Shared Memory Multiprocessors. In Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), pages 179-188, 1995.
[16]
Teresa L. Johnson and Wen-mei Hwu. Run-time Adaptive Cache Hierarchy Management via Reference Analysis. In Proceedings of the 24th Annual International Symposium on Computer Architecture, pages 315-326, 1997.
[17]
Anna R. Karlin, Mark S. Manasse, Larry Rudolph, and Daniel D. Sleator. Competitive Snoopy Caching. Algorithmica, 3:79-119, 1988.
[18]
David Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th Annual International Symposium on Computer Architecture, pages 81-87, May 1981.
[19]
James Laudon and Daniel Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In Proceedings of the 24th Annual International Symposium on Computer Architecture, pages 241-251, 1997.
[20]
Alvin R. Lebeck and David A. Wood. Dynamic Self-Invalidation: Reducing Coherence Overhead in Shared-Memory Multiprocessors. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 48-59, June 1995.
[21]
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica Lam. Design of the Stanford DASH Multiprocessor. Technical Report CSL-TR-89-403, Computer System Laboratory, Stanford University, December 1989.
[22]
Tom Lovett and Russell Clapp. STING: A CC-NUMA Computer System for the Commercial Marketplace. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 308-317, 1996.
[23]
Todd C. Mowry. Tolerating Latency Through Software-Controlled Data Prefetching. PhD thesis, Stanford University, March 1994.
[24]
Shubhendu S. Mukherjee, Babak Falsafi, Mark D. Hill, and David A. Wood. Coherent Network Interfaces for Fine-Grain Communication. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 247-258, May 1996.
[25]
Shubhendu S. Mukherjee and Mark D. Hill. An Evaluation of Directory Protocols for Medium-Scale Shared-Memory Multiprocessors. In Proceedings of the 1994 International Conference on Supercomputing, pages 64-74, Manchester, England, July 1994.
[26]
Shubhendu S. Mukherjee, Steven K. Reinhardt, Babak Falsafi, Mike Litzkow, Steve Huss-Lederman, Mark D. Hill, James R. Larus, and David A. Wood. Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator. In Workshop on Performance Analysis and Its Impact on Design (PAID), June 1997.
[27]
Shubhendu S. Mukherjee, Shamik D. Sharma, Mark D. Hill, James R. Larus, Anne Rogers, and Joel Saltz. Efficient Support for Irregular Applications on Distributed-Memory Machines. In Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), pages 68-79, July 1995.
[28]
Hakan Grahn and Per Stenstrom. Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Detection. Journal of Parallel and Distributed Computing, 39(2):158-180, December 1996.
[29]
Alain Raynaud, Zheng Zhang, and Josep Torrellas. Distance-Adaptive Update Protocols for Scalable Shared-Memory Multiprocessors. In Proceedings of the Second IEEE Symposium on High-Performance Computer Architecture, 1996.
[30]
Steven K. Reinhardt, James R. Larus, and David A. Wood. Tempest and Typhoon: User-Level Shared Memory. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 325-337, April 1994.
[31]
Jonas Skeppstedt and Per Stenstrom. Simple Compiler Algorithms to Reduce Ownership Overhead in Cache Coherence Protocols. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 286-296, 1994.
[32]
Jonas Skeppstedt and Per Stenstrom. A Compiler Algorithm that Reduces Read Latency in Ownership-Based Cache Coherence Protocols. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 69-78, 1995.
[33]
James E. Smith. A Study of Branch Prediction Strategies. In Proceedings of the 8th Annual International Symposium on Computer Architecture, pages 135-148, 1981.
[34]
IEEE Computer Society. IEEE Standard for Scalable Coherent Interface (SCI), 1992.
[35]
Per Stenstrom, Mats Brorsson, and Lars Sandberg. Adaptive Cache Coherence Protocol Optimized for Migratory Sharing. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 109-118, May 1993.
[36]
Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. In Proceedings of the 24th Annual International Symposium on Computer Architecture, pages 191-202, 1997.
[37]
Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, and Anoop Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 24-36, July 1995.
[38]
David A. Wood, Satish Chandra, Babak Falsafi, Mark D. Hill, James R. Larus, Alvin R. Lebeck, James C. Lewis, Shubhendu S. Mukherjee, Subbarao Palacharla, and Steven K. Reinhardt. Mechanisms for Cooperative Shared Memory. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 156-168, May 1993. Also appeared in CMG Transactions, Spring 1994.
[39]
T-Y Yeh and Yale Patt. Alternative Implementations of Two-Level Adaptive Branch Prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 124-134, 1992.

Published In

ISCA '98: Proceedings of the 25th annual international symposium on Computer architecture
April 1998
402 pages
ISBN:0818684917

ACM SIGARCH Computer Architecture News, Volume 26, Issue 3
Special Issue: Proceedings of the 25th annual international symposium on Computer architecture (ISCA '98)
June 1998
379 pages
ISSN:0163-5964
DOI:10.1145/279361

Publisher

IEEE Computer Society

United States

Conference

ISCA98: International Symposium on Computer Architecture
June 27 - July 2, 1998
Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
