DOI: 10.1145/3472456.3473510
research-article
Open access

Matryoshka: A Coalesced Delta Sequence Prefetcher

Published: 05 October 2021

Abstract

To learn complex memory access patterns effectively, many spatial data prefetchers characterize these patterns as fixed-length delta sequences. However, because complex patterns vary across workloads, it is difficult for fixed-length delta sequences to recognize them with both high coverage and high accuracy. Longer delta sequences improve accuracy but are less likely to find a match, while shorter delta sequences improve coverage but are more likely to produce false predictions. A classical remedy is a multiple matching mechanism over variable-length delta sequences, but it requires the same sequences to be stored redundantly in multiple tables.
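To make the trade-off concrete, here is a minimal Python sketch of the classical multiple-matching baseline described above: deltas are computed between block offsets inside one page, one table is kept per sequence length, and prediction tries the longest matching sequence first. This is an illustrative assumption, not code from the paper; the block and page sizes, the lengths 1 to 4, and the names `page_deltas` and `MultiTablePredictor` are hypothetical, and real hardware tables would be bounded and set-associative.

```python
from collections import defaultdict

BLOCK_SIZE = 64   # assumed cache-block size in bytes
PAGE_BLOCKS = 64  # assumed 4 KiB page, i.e. 64 blocks per spatial region


def page_deltas(addresses):
    """Turn byte addresses inside one page into a sequence of block-offset deltas."""
    offsets = [(addr // BLOCK_SIZE) % PAGE_BLOCKS for addr in addresses]
    return [cur - prev for prev, cur in zip(offsets, offsets[1:])]


class MultiTablePredictor:
    """Hypothetical baseline: one pattern table per delta-sequence length."""

    def __init__(self, lengths=(1, 2, 3, 4)):
        # tables[n] maps an n-delta sequence to {next_delta: count}
        self.tables = {n: defaultdict(lambda: defaultdict(int)) for n in lengths}

    def train(self, deltas):
        # The same delta stream is recorded once per length: this is the
        # redundant storage that multiple matching incurs.
        for n, table in self.tables.items():
            for i in range(len(deltas) - n):
                table[tuple(deltas[i:i + n])][deltas[i + n]] += 1

    def predict(self, history):
        # Try the longest sequence first (more accurate, matches less often),
        # then fall back to shorter ones (matches more often, less accurate).
        for n in sorted(self.tables, reverse=True):
            if len(history) < n:
                continue
            nexts = self.tables[n].get(tuple(history[-n:]))
            if nexts:
                return max(nexts, key=nexts.get), n
        return None, 0

    def entries(self):
        """Total number of stored sequences across all tables."""
        return sum(len(table) for table in self.tables.values())
```

For example, training on the stream `[1, 2, 3, 1, 2, 3, 1, 2, 3]` stores 12 entries across the four tables, even though every entry describes the same repeating pattern; a two-delta history `[2, 3]` can only be served by the length-2 table.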
We observe that shorter delta sequences can coalesce into longer, fixed-length sequences during learning, and that the shorter ones can be extracted from the longer ones during matching. By leveraging this property, we can improve the storage efficiency of the multiple matching mechanism. In this paper, we propose Matryoshka, a novel low-overhead prefetcher that supports the multiple matching mechanism efficiently. Instead of maintaining variable-length delta sequences in multiple tables, Matryoshka coalesces them into fixed-length delta sequences that can be maintained in a single pattern table. In a simulated single-core system running memory-intensive SPEC CPU 2017 workloads, Matryoshka outperforms the state-of-the-art prefetcher IPCP by 6.5% and surpasses SPP+PPF by 2.9% with 26× lower storage overhead.
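The coalescing idea can be illustrated with a minimal Python sketch, assuming that each stored entry is a fixed-length delta sequence (here of length 4) with next-delta counts, and that a shorter sequence is "extracted" at match time by comparing only the last k deltas of each stored entry, longest k first. This is not Matryoshka's actual table organization, indexing, or confidence scheme, which the abstract does not specify; `CoalescedDeltaTable` and `SEQ_LEN` are hypothetical names, and the linear scan stands in for a hardware lookup.

```python
from collections import defaultdict

SEQ_LEN = 4  # hypothetical fixed length of every stored delta sequence


class CoalescedDeltaTable:
    """Hypothetical single pattern table holding only fixed-length sequences."""

    def __init__(self, seq_len=SEQ_LEN):
        self.seq_len = seq_len
        # key: tuple of seq_len deltas; value: {next_delta: count}
        self.table = defaultdict(lambda: defaultdict(int))

    def train(self, deltas):
        # Shorter sequences are not stored separately: each is implicitly
        # contained as a suffix of some fixed-length window.
        for i in range(len(deltas) - self.seq_len):
            key = tuple(deltas[i:i + self.seq_len])
            self.table[key][deltas[i + self.seq_len]] += 1

    def predict(self, history):
        # Multiple matching from one table: try the longest suffix of the
        # history first, then extract progressively shorter suffixes.
        for k in range(min(self.seq_len, len(history)), 0, -1):
            suffix = tuple(history[-k:])
            votes = defaultdict(int)
            for key, nexts in self.table.items():
                if key[-k:] == suffix:  # shorter sequence extracted from a longer one
                    for delta, count in nexts.items():
                        votes[delta] += count
            if votes:
                return max(votes, key=votes.get), k
        return None, 0
```

Training on `[1, 2, 3, 1, 2, 3, 1, 2, 3]` stores only three length-4 entries, yet a query with just two deltas of history, `[1, 2]`, still predicts `3` by matching the last two deltas of a stored entry.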

Published In

ACM Other conferences
ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN:9781450390682
DOI:10.1145/3472456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 559
  • Downloads (Last 12 months): 127
  • Downloads (Last 6 weeks): 22

Reflects downloads up to 03 Jan 2025
