research-article

Public Access

SimNet: Accurate and High-Performance Computer Architecture Simulation using Deep Learning

Authors:

Lingda Li, Santosh Pandey, Thomas Flynn, Hang Liu, Noel Wheeler, and Adolfy Hoisie

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 2

Article No.: 25, Pages 1 - 24

https://doi.org/10.1145/3530891

Published: 06 June 2022

Abstract

While cycle-accurate simulators are essential tools for architecture research, design, and development, their practicality is limited by an extremely long time-to-solution for realistic applications under investigation. This work describes a concerted effort in which machine learning (ML) is used to accelerate microarchitecture simulation. First, an ML-based instruction latency prediction framework that accounts for both static instruction properties and dynamic processor states is constructed. Then, a GPU-accelerated parallel simulator is implemented based on the proposed instruction latency predictor, and its simulation accuracy and throughput are validated and evaluated against a state-of-the-art simulator. By leveraging modern GPUs, the ML-based simulator significantly outperforms traditional CPU-based simulators in simulation throughput.
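The abstract outlines two components: a latency predictor whose inputs combine static instruction properties with dynamic processor state, and a simulator that advances time using those predictions and is parallelized for GPUs. The Python sketch below illustrates that division of labor under clearly stated assumptions; it is not SimNet's model or implementation. `predict_latency` is a hypothetical hand-written stand-in for a trained neural network, and the opcode classes, the latency table, the `MAX_IN_FLIGHT` window, and the split into fetch and execution latencies are illustrative choices. `simulate_partitioned` shows one generic way to obtain independent work, splitting the trace into chunks that each carry a warm-up prefix; the chunks run one after another here, but because they do not depend on each other they are the kind of work a GPU could batch.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical opcode classes and base execution latencies in cycles (assumptions).
BASE_EXEC = {"alu": 1, "load": 4, "store": 1, "branch": 1}
MAX_IN_FLIGHT = 4  # hypothetical size of the in-flight instruction window


@dataclass(frozen=True)
class Instr:
    op_class: str                    # static property: opcode class
    src_regs: Tuple[int, ...] = ()   # static property: registers read
    dst_reg: int = -1                # static property: register written (-1 = none)


def predict_latency(instr: Instr, context: List[Tuple[Instr, int]]) -> Tuple[int, int]:
    """Hypothetical stand-in for a trained ML latency model.

    `context` lists (instruction, cycles until it completes) for every
    instruction still in flight, i.e. a summary of dynamic processor state.
    Returns (fetch_latency, execution_latency) in cycles.
    """
    # A full window delays fetch by one extra cycle.
    fetch_lat = 1 if len(context) < MAX_IN_FLIGHT else 2
    # Operands are ready once every in-flight producer of a source register completes.
    ready = max((rem for other, rem in context
                 if other.dst_reg >= 0 and other.dst_reg in instr.src_regs), default=0)
    # Execution begins once the instruction is fetched and its operands are ready.
    exec_lat = max(ready, fetch_lat) - fetch_lat + BASE_EXEC[instr.op_class]
    return fetch_lat, exec_lat


def simulate(trace: List[Instr], measure_from: int = 0) -> Tuple[int, int]:
    """Sequentially simulate `trace` with the latency predictor.

    Returns (fetch_cycles, drain_cycles): cycles spent fetching
    trace[measure_from:], then cycles from the last fetch until all complete.
    """
    cycle, end, start_cycle = 0, 0, 0
    in_flight: List[Tuple[Instr, int]] = []  # (instruction, completion cycle)
    for i, instr in enumerate(trace):
        if i == measure_from:
            start_cycle = cycle
        in_flight = [(o, d) for o, d in in_flight if d > cycle]  # retire finished ones
        context = [(o, d - cycle) for o, d in in_flight]
        fetch_lat, exec_lat = predict_latency(instr, context)
        cycle += fetch_lat
        done = cycle + exec_lat
        in_flight.append((instr, done))
        end = max(end, done)
    return cycle - start_cycle, end - cycle


def simulate_partitioned(trace: List[Instr], chunk: int, warmup: int) -> int:
    """Estimate total cycles from independent sub-traces.

    Each chunk is prefixed with up to `warmup` preceding instructions to
    rebuild processor state; only the chunk's own fetch cycles are counted,
    plus the drain time of the final chunk.
    """
    total, drain = 0, 0
    for start in range(0, len(trace), chunk):  # chunks are mutually independent
        lo = max(0, start - warmup)
        fetch, drain = simulate(trace[lo:start + chunk], measure_from=start - lo)
        total += fetch
    return total + drain


if __name__ == "__main__":
    demo = [Instr("load", dst_reg=1), Instr("alu", (1,), 2)]
    print(sum(simulate(demo)))  # 6: the ALU op waits for the 4-cycle load
```

When the warm-up covers each chunk's entire prefix, the partitioned total matches the sequential one exactly; shorter warm-ups make each chunk cheaper to simulate but only approximate the processor state at chunk boundaries, so the total may deviate from the sequential result.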

Index Terms

SimNet: Accurate and High-Performance Computer Architecture Simulation using Deep Learning
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
  2. Modeling and simulation
    1. Simulation types and techniques
      1. Discrete-event simulation

Recommendations

SimNet: Accurate and High-Performance Computer Architecture Simulation using Deep Learning
SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

SimNet: Accurate and High-Performance Computer Architecture Simulation using Deep Learning
SIGMETRICS '22

CLBlast: A Tuned OpenCL BLAS Library
IWOCL '18: Proceedings of the International Workshop on OpenCL

This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL routines to accelerate dense linear algebra for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-...

Information

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 2

POMACS

June 2022

499 pages

EISSN: 2476-1249

DOI: 10.1145/3543145

Editors:
Augustin Chaintreau, Columbia University
Leana Golubchik, University of Southern California
Zhi-Li Zhang, University of Minnesota

Copyright © 2022 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022

Published in POMACS Volume 6, Issue 2

Qualifiers

Research-article

Funding Sources

U.S. Department of Energy

Bibliometrics

Article Metrics

Total Citations: 6
Total Downloads: 917
Downloads (last 12 months): 508
Downloads (last 6 weeks): 64

Reflects downloads up to 06 Nov 2024.

Cited By

Pandey S, Yazdanbakhsh A, Liu H. (2024). TAO: Re-Thinking DL-based Microarchitecture Simulation. ACM SIGMETRICS Performance Evaluation Review 52:1 (23-24). Online publication date: 13-Jun-2024. https://dl.acm.org/doi/10.1145/3673660.3655085

Pandey S, Yazdanbakhsh A, Liu H. (2024). TAO: Re-Thinking DL-based Microarchitecture Simulation. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:2 (1-25). Online publication date: 29-May-2024. https://dl.acm.org/doi/10.1145/3656012

Pandey S, Yazdanbakhsh A, Liu H, Garetto M, Marin A, Ciucu F, Fanti G, Righter R. (2024). TAO: Re-Thinking DL-based Microarchitecture Simulation. Abstracts of the 2024 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (23-24). Online publication date: 10-Jun-2024. https://dl.acm.org/doi/10.1145/3652963.3655085

Piankarnka V, Lertbumroongchai K, Piriyasurawong P. (2023). A Digital Painting Learning Model Using Mixed-Reality Technology to Develop Practical Skills in Character Design for Animation. Advances in Human-Computer Interaction 2023. Online publication date: 1-Jan-2023. https://dl.acm.org/doi/10.1155/2023/5230762

Li T, Yang R, Qu G, Shi G, Yu C, Wierman A, Low S. (2022). Robustness and Consistency in Linear Quadratic Control with Untrusted Predictions. ACM SIGMETRICS Performance Evaluation Review 50:1 (107-108). Online publication date: 7-Jul-2022. https://dl.acm.org/doi/10.1145/3547353.3522658

Pandey S, Li L, Flynn T, Hoisie A, Liu H. (2022). Scalable Deep Learning-Based Microarchitecture Simulation on GPUs. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (1-15). Online publication date: Nov-2022. https://doi.org/10.1109/SC41404.2022.00084
