
Memory-side prefetching for linked data structures for processor-in-memory systems

Authors:

Christopher J. Hughes,

Sarita V. Adve

Journal of Parallel and Distributed Computing, Volume 65, Issue 4

Pages 448 - 463

https://doi.org/10.1016/j.jpdc.2004.11.004

Published: 01 April 2005

Abstract

This paper studies a memory-side prefetching technique to hide the latency incurred by inherently serial accesses to linked data structures (LDS). A programmable engine sits close to memory and traverses LDS independently of the processor. The engine can run ahead of the processor because of its low-latency path to memory, allowing it to initiate data transfers earlier than the processor and to pipeline multiple transfers over the network. We evaluate the proposed memory-side prefetching scheme for the Olden benchmarks on a processor-in-memory system. For the six benchmarks where LDS memory stall time is significant, the memory-side scheme reduces execution time by an average of 27% compared to a system without any prefetching. Compared to a state-of-the-art processor-side software prefetching scheme, the memory-side scheme reduces execution time by 20-50% for three of the six applications, performs about the same for two applications, and is worse by 18% for one application. We conclude that our memory-side scheme is effective, but that a combination of the processor- and memory-side prefetching schemes is best, and we provide a qualitative framework to determine when either scheme should be used.
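As a rough illustration of why an engine near memory can run ahead, the following Python sketch compares two toy timing models for walking a linked list of `n` nodes. It is not the paper's simulator or methodology; the function names, the parameters (`round_trip`, `mem_latency`, `one_way`, `work`), and the numbers used below are all assumptions chosen only to make the arithmetic easy to follow.

```python
def serial_time(n, round_trip, work):
    """Toy model: the processor chases pointers itself.

    Each node's address is known only after the previous node arrives,
    so every node pays a full processor-to-memory round trip plus the
    processor's per-node work. Nothing overlaps.
    """
    return n * (round_trip + work)


def memory_side_time(n, mem_latency, one_way, work):
    """Toy model: an engine at memory walks the list and pushes nodes.

    The engine reaches node i after (i + 1) local memory accesses and
    ships it to the processor, which takes one one-way network trip.
    Transfers for different nodes overlap (are pipelined) on the network.
    """
    finish = 0
    for i in range(n):
        arrival = (i + 1) * mem_latency + one_way  # node i reaches the processor
        finish = max(finish, arrival) + work       # wait for data, then process it
    return finish


if __name__ == "__main__":
    # Hypothetical cycle counts, not measurements from the paper.
    print(serial_time(4, round_trip=100, work=10))
    print(memory_side_time(4, mem_latency=20, one_way=50, work=10))
```

In this model the serial traversal pays the full round trip once per node, whereas the memory-side traversal pays the network latency roughly once and is afterwards limited by whichever is slower: the engine's local memory access or the processor's per-node work.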

Published In

Journal of Parallel and Distributed Computing Volume 65, Issue 4

April 2005

157 pages

ISSN:0743-7315

Copyright © 2004 Elsevier Inc.

Publisher

Academic Press, Inc.

United States
