DOI: 10.1145/3445814.3446752

A hierarchical neural model of data prefetching

Published: 17 April 2021

Abstract

This paper presents Voyager, a novel neural network for data prefetching. Unlike previous neural models for prefetching, which are limited to learning delta correlations, our model can also learn address correlations, which are important for prefetching irregular sequences of memory accesses. The key to our solution is its hierarchical structure, which separates addresses into pages and offsets and introduces a mechanism for learning important relations among pages and offsets. Voyager provides significant prediction benefits over current data prefetchers. For a set of irregular programs from the SPEC 2006 and GAP benchmark suites, Voyager sees an average IPC improvement of 41.6% over a system with no prefetcher, compared with 21.7% and 28.2%, respectively, for idealized Domino and ISB prefetchers. We also find that for two commercial workloads for which current data prefetchers see very little benefit, Voyager dramatically improves both accuracy and coverage. At present, slow training and prediction preclude neural models from being practically used in hardware, but Voyager's overheads are significantly lower, in every dimension, than those of previous neural models. For example, computation cost is reduced by 15-20×, and storage overhead is reduced by 110-200×. Thus, Voyager represents a significant step towards a practical neural prefetcher.
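
The page/offset separation described in the abstract can be illustrated with a minimal sketch. It assumes 4 KiB pages and 64-byte cache lines, so that each page holds 64 line offsets; these sizes, the example addresses, and the names `split_address`, `join_address`, and `deltas` are illustrative assumptions, not details taken from the paper's implementation. The second half contrasts a strided stream, which has a single repeating delta, with a hypothetical irregular stream that has no single stride even though the same addresses recur in the same order.

```python
PAGE_BITS = 12  # assumption: 4 KiB pages
LINE_BITS = 6   # assumption: 64-byte cache lines
OFFSETS_PER_PAGE = 1 << (PAGE_BITS - LINE_BITS)  # 64 line offsets per page


def split_address(addr):
    """Split a byte address into (page number, cache-line offset within the page)."""
    page = addr >> PAGE_BITS
    offset = (addr >> LINE_BITS) & (OFFSETS_PER_PAGE - 1)
    return page, offset


def join_address(page, offset):
    """Rebuild the cache-line-aligned byte address from a (page, offset) pair."""
    return (page << PAGE_BITS) | (offset << LINE_BITS)


def deltas(addrs):
    """Differences between consecutive addresses in an access stream."""
    return [b - a for a, b in zip(addrs, addrs[1:])]


# A strided stream: every consecutive pair differs by the same delta.
strided = [0x1000 + 64 * i for i in range(4)]

# A hypothetical pointer-chasing stream: the same three addresses recur in the
# same order, but there is no single stride, so predicting the next access
# means remembering which address tends to follow which.
irregular = [0x7F3A1040, 0x12345640, 0x5501C080] * 2

print(deltas(strided))
print(len(set(deltas(irregular))))
print([split_address(a) for a in irregular[:3]])
```

Splitting the address this way turns one prediction over a very large address space into two predictions over much smaller vocabularies (pages and offsets), which is the general motivation for a hierarchical representation.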


    Published In

    ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
    April 2021
    1090 pages
    ISBN: 9781450383172
    DOI: 10.1145/3445814
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Attention Mechanism
    2. Neural Networks
    3. Prefetching

    Qualifiers

    • Research-article

    Conference

    ASPLOS '21

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 808
    • Downloads (last 6 weeks): 139
    Reflects downloads up to 07 Nov 2024
