DOI: 10.1145/264107.264158
The interaction of software prefetching with ILP processors in shared-memory systems

Published: 01 May 1997

Abstract

Current microprocessors aggressively exploit instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. Recent work has shown that memory latency remains a significant performance bottleneck for shared-memory multiprocessor systems built of such processors.

This paper provides the first study of the effectiveness of software-controlled non-binding prefetching in shared-memory multiprocessors built of state-of-the-art ILP-based processors. We find that software prefetching results in significant reductions in execution time (12% to 31%) for three out of five applications on an ILP system. However, compared to a previous-generation system, software prefetching is significantly less effective in reducing the memory stall component of execution time on an ILP system. Consequently, even after adding software prefetching, memory stall time accounts for over 30% of the total execution time in four out of five applications on our ILP system.

This paper also investigates the interaction of software prefetching with memory consistency models on ILP-based multiprocessors. In particular, we seek to determine whether software prefetching can equalize the performance of sequential consistency (SC) and release consistency (RC). We find that even with software prefetching, for three out of five applications, RC provides a significant reduction in execution time (15% to 40%) compared to SC.
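
For readers unfamiliar with the term, the following is a minimal sketch of what a software-controlled non-binding prefetch can look like in C. It is not taken from the paper. It assumes a GCC- or Clang-compatible compiler that provides the `__builtin_prefetch` builtin, and the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical names chosen for illustration. The prefetch is non-binding in the sense that it only hints that a cache line should be brought closer to the processor; the value is still read by an ordinary load later, so the data remains subject to cache coherence in the meantime.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements. A real value would be
   tuned to the memory latency and the amount of work per loop iteration. */
#define PREFETCH_DISTANCE 16

/* Sums an array while issuing a non-binding prefetch for a later element.
   The prefetch is only a hint; the actual value is read by the ordinary
   load a[i] when the loop reaches that element. */
long sum_with_prefetch(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, rw = 0 (prefetch for read),
               locality = 3 (high temporal locality). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

The result is the same with or without the prefetch call; only the timing of cache misses may change. How much that helps depends on the processor and the memory system, which is the question the paper studies.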

Published In

ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture
June 1997
350 pages
ISBN: 0897919017
DOI: 10.1145/264107
  • ACM SIGARCH Computer Architecture News, Volume 25, Issue 2
    Special Issue: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
    May 1997
    349 pages
    ISSN: 0163-5964
    DOI: 10.1145/384286

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
