Machine Learning for Fine-Grained Hardware Prefetcher Control

Authors: Laura E. Brown, Zhenlin Wang

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

Article No.: 3, Pages 1 - 9

https://doi.org/10.1145/3337821.3337854

Published: 05 August 2019

Abstract

Modern architectures provide hardware memory prefetching capabilities which can be configured at runtime. While hardware prefetching can provide substantial performance improvements for many programs, prefetching can also increase contention for shared resources such as last-level cache and memory bandwidth. In turn, this contention can degrade performance in multi-core workloads. In this paper, we model fine-grained hardware prefetcher control as a contextual bandit, and propose a framework for learning prefetcher control policies which adjust hardware prefetching usage at runtime according to workload performance behavior. We train our policies on profiling data, wherein hardware memory prefetchers are enabled or disabled randomly at regular intervals over the course of a workload's execution. The learned prefetcher control policies provide up to a 4.3% average performance improvement over a set of memory bandwidth intensive workloads.
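
The abstract does not say which learning algorithm is used, so the following is only a minimal sketch of the general idea: log intervals in which the prefetcher is toggled uniformly at random, then learn a per-context policy offline. It assumes a simple inverse-propensity-scored (IPS) estimator, a single on/off prefetcher, and two hypothetical context labels (`low_bw`, `high_bw`). The names `collect_profile`, `learn_policy`, `toy_context`, and `toy_reward`, along with the reward values, are illustrative and not taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical action encoding: 0 = prefetcher disabled, 1 = prefetcher enabled.
ACTIONS = (0, 1)


def toy_context(t):
    """Assumed workload phases: alternate every 10 intervals."""
    return "high_bw" if (t // 10) % 2 else "low_bw"


def toy_reward(context, action):
    """Assumed performance signal: prefetching helps under low
    bandwidth pressure and hurts under high bandwidth pressure."""
    if action == 0:
        return 1.0
    return 0.5 if context == "high_bw" else 1.5


def collect_profile(n_intervals, reward_fn, context_fn, seed=0):
    """Log (context, action, propensity, reward) tuples, choosing the
    action uniformly at random at every interval."""
    rng = random.Random(seed)
    propensity = 1.0 / len(ACTIONS)
    log = []
    for t in range(n_intervals):
        context = context_fn(t)
        action = rng.choice(ACTIONS)
        log.append((context, action, propensity, reward_fn(context, action)))
    return log


def learn_policy(log):
    """For each context, pick the action with the highest IPS estimate
    of expected reward."""
    totals = defaultdict(float)  # (context, action) -> sum of reward / propensity
    counts = defaultdict(int)    # context -> number of logged intervals
    for context, action, propensity, reward in log:
        totals[(context, action)] += reward / propensity
        counts[context] += 1
    return {
        context: max(ACTIONS, key=lambda a: totals[(context, a)] / n)
        for context, n in counts.items()
    }


if __name__ == "__main__":
    profile = collect_profile(2000, toy_reward, toy_context)
    print(learn_policy(profile))
```

Because the logged actions are uniformly random, every action has a known propensity of 0.5, which is what lets the logged rewards be reweighted into an unbiased estimate for any candidate policy. A real controller would replace the toy context and reward with measured performance counters.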

Index Terms

Machine Learning for Fine-Grained Hardware Prefetcher Control
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
        Sequential decision making
      2. Supervised learning
        Cost-sensitive learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Kernel methods
        Support vector machines

Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

August 2019

1107 pages

ISBN:9781450362955

DOI:10.1145/3337821

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

ICPP 2019: 48th International Conference on Parallel Processing

August 5 - 8, 2019

Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics

Total Citations: 9
Total Downloads: 849
Downloads (Last 12 months): 188
Downloads (Last 6 weeks): 16

Reflects downloads up to 05 Jan 2025

Cited By

Huang L, Yan L, Wu T (2024). Hardware Prefetching Tuning Method Based on Program Phase Behavior. Journal of Circuits, Systems and Computers. DOI: 10.1142/S0218126624501585. Online publication date: 28-Mar-2024.
https://doi.org/10.1142/S0218126624501585
Alcorta E, Madhav M, Afoakwa R, Tetrick S, Yadwadkar N, Gerstlauer A (2024). Characterizing Machine Learning-Based Runtime Prefetcher Selection. IEEE Computer Architecture Letters 23:2 (146-149). DOI: 10.1109/LCA.2024.3404887. Online publication date: Jul-2024.
https://doi.org/10.1109/LCA.2024.3404887
Yang H, Fang J, Su X, Cai Z, Wang Y (2024). RL-CoPref: a reinforcement learning-based coordinated prefetching controller for multiple prefetchers. The Journal of Supercomputing 80:9 (13001-13026). DOI: 10.1007/s11227-024-05938-9. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1007/s11227-024-05938-9
Adiletta M, Fargo F, Diamond M, Adiletta J, Franza O, Steely S (2023). A Reinforcement Learning Approach to Optimize Cache Prefetcher Aggressiveness at Run-Time. 2023 Tenth International Conference on Software Defined Systems (SDS) (95-102). DOI: 10.1109/SDS59856.2023.10329059. Online publication date: 23-Oct-2023.
https://doi.org/10.1109/SDS59856.2023.10329059
Yang H, Fang J (2023). A Fairness-Aware Prefetching Mechanism based on Reinforcement Learning for Multi-Core Systems. 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (639-646). DOI: 10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00092. Online publication date: 17-Dec-2023.
https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00092
Ghosh S, Sahula V, Bhargava L (2023). Reinforcement Learning Based Prefetch-Control Mechanism. 2023 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) (110-114). DOI: 10.1109/APCCAS60141.2023.00035. Online publication date: 19-Nov-2023.
https://doi.org/10.1109/APCCAS60141.2023.00035
Fang J, Xu Y, Kong H, Cai M (2023). A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture. The Journal of Supercomputing 79:10 (10570-10588). DOI: 10.1007/s11227-023-05078-6. Online publication date: 11-Feb-2023.
https://doi.org/10.1007/s11227-023-05078-6
Eris F, Louis M, Eris K, Abellán J, Joshi A (2022). Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy. ACM Transactions on Architecture and Code Optimization 20:1 (1-25). DOI: 10.1145/3570304. Online publication date: 16-Dec-2022.
https://dl.acm.org/doi/10.1145/3570304
TehraniJamsaz A, Popov M, Dutta A, Saillard E, Jannesari A (2022). Learning Intermediate Representations using Graph Neural Networks for NUMA and Prefetchers Optimization. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (1206-1216). DOI: 10.1109/IPDPS53621.2022.00120. Online publication date: May-2022.
https://doi.org/10.1109/IPDPS53621.2022.00120
Sánchez Barrera I, Black-Schaffer D, Casas M, Moretó M, Stupnikova A, Popov M, Ayguadé E, Hwu W, Badia R, Hofstee H (2020). Modeling and optimizing NUMA effects and prefetching with machine learning. Proceedings of the 34th ACM International Conference on Supercomputing (1-13). DOI: 10.1145/3392717.3392765. Online publication date: 29-Jun-2020.
https://dl.acm.org/doi/10.1145/3392717.3392765
