DOI: 10.5555/3571885.3571992
Research article

ReSemble: reinforced ensemble framework for data prefetching

Published: 18 November 2022

Abstract

Data prefetching hides memory latency by predicting the data a program will need and loading it into the cache beforehand. Most prefetchers in the literature are efficient only for specific memory address patterns, which restricts their utility to specialized applications; they do not perform well on hybrid applications with multifarious access patterns. Therefore, we propose ReSemble: a Reinforcement Learning (RL) based adaptive enSemble framework that enables multiple prefetchers to complement each other on hybrid applications. Our RL-trained ensemble controller takes prefetch suggestions from all prefetchers as input, selects the best suggestion dynamically, and learns online to maximize cumulative reward, which is collected from prefetch hits and misses. Using a simple multilayer perceptron as the controller, our ensemble framework achieves, on average, 85.27% accuracy and 44.22% coverage, leading to a 31.02% IPC improvement. This outperforms state-of-the-art individual prefetchers by 8.35% to 26.11% and SBP, a state-of-the-art (non-RL) ensemble prefetcher, by 5.69%.
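The select-then-learn loop described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes a tabular, epsilon-greedy Q-learning table in place of the multilayer perceptron controller, two toy prefetchers (`next_line` and `stride`), a state made of the suggested address deltas, and an immediate reward of +1 for a prefetch hit and -1 for a miss. All class, function, and parameter names are hypothetical.

```python
import random


class EnsembleController:
    """Epsilon-greedy tabular Q-learning selector over prefetcher suggestions.

    Assumption: a lookup table stands in for the MLP controller.
    """

    def __init__(self, num_prefetchers, epsilon=0.1, alpha=0.2, gamma=0.9, seed=0):
        self.n = num_prefetchers
        self.epsilon = epsilon  # exploration probability
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.rng = random.Random(seed)
        self.q = {}             # state -> list of Q-values, one per prefetcher

    def _values(self, state):
        return self.q.setdefault(state, [0.0] * self.n)

    def select(self, state):
        """Return the index of the prefetcher whose suggestion to issue."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        values = self._values(state)
        return max(range(self.n), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update from the observed hit/miss reward."""
        values = self._values(state)
        target = reward + self.gamma * max(self._values(next_state))
        values[action] += self.alpha * (target - values[action])


def next_line(history):
    """Toy prefetcher: always suggest the next sequential address."""
    return history[-1] + 1


def stride(history):
    """Toy prefetcher: repeat the most recent stride."""
    if len(history) < 2:
        return history[-1] + 1
    return history[-1] + (history[-1] - history[-2])


def run(trace, prefetchers, controller):
    """Replay an address trace and return the fraction of correct selections."""
    history = [trace[0]]

    def observe():
        suggestions = [p(history) for p in prefetchers]
        # State: each suggestion expressed as a delta from the current address.
        return suggestions, tuple(s - history[-1] for s in suggestions)

    suggestions, state = observe()
    hits = issued = 0
    for addr in trace[1:]:
        action = controller.select(state)
        hit = suggestions[action] == addr
        reward = 1.0 if hit else -1.0  # assumed reward scheme
        history.append(addr)
        next_suggestions, next_state = observe()
        controller.update(state, action, reward, next_state)
        hits += hit
        issued += 1
        suggestions, state = next_suggestions, next_state
    return hits / issued


if __name__ == "__main__":
    trace = list(range(0, 800, 4))  # constant stride of 4
    controller = EnsembleController(num_prefetchers=2)
    print(run(trace, [next_line, stride], controller))
```

On this constant-stride trace the controller quickly learns to prefer the `stride` suggestion. In hardware, rewards arrive only after a prefetched line is used or evicted, so the immediate reward used here is a simplification.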

Supplementary Material

MP4 File (SC22_Presentation_Zhang_Pengmiao.mp4)
Presentation at SC '22

Published In

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2022
1277 pages
ISBN:9784665454445

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Author Tags

  1. ensemble
  2. prefetching
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

SC '22

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
