Research article
DOI: 10.1145/3579170.3579258

An Analytical Model of Configurable Systolic Arrays to find the Best-Fitting Accelerator for a given DNN Workload

Published: 13 April 2023

Abstract

Since their breakthrough, the complexity of Deep Neural Networks (DNNs) has been rising steadily, and DNN accelerators are now used in many domains. However, designing and configuring an accelerator that perfectly meets the requirements of a given application is a challenging task. In this paper, we therefore present an approach to support the accelerator design process. With an analytical model of a systolic array, we can estimate performance, energy consumption and area for each design option. These metrics are usually determined by cycle-accurate simulation, which is time-consuming and therefore forces the design space to be restricted heavily. Analytical modelling, in contrast, allows a design to be evaluated quickly using a mathematical abstraction of the accelerator. This works especially well for DNNs, since their dataflow and memory accesses are highly regular. To show the correctness of our model, we perform an exemplary realization with the state-of-the-art systolic array generator Gemmini and compare it against a cycle-accurate simulation and state-of-the-art modelling tools, observing less than 1% deviation. We also conduct a design space exploration to demonstrate the analytical model's ability to support accelerator design. In a case study on ResNet-34, our model and DSE tool reduce the time to find the best-fitting solution by four orders of magnitude compared to a cycle-accurate simulation and by two orders of magnitude compared to state-of-the-art modelling tools.
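To illustrate what such an analytical estimate can look like, the sketch below gives a first-order, roofline-style model of a single GEMM layer mapped onto a rows x cols systolic array. It is a minimal illustration only: the tiling scheme, the pipeline fill/drain term, and the assumption that operands are fetched from DRAM once per tile are generic textbook simplifications, not the model published in the paper.

```python
# Minimal illustrative sketch (not the paper's model): first-order, roofline-style
# estimate for one GEMM layer, C[M,N] = A[M,K] @ B[K,N], mapped onto an
# output-stationary rows x cols systolic array.
import math

def estimate_layer(M, N, K, rows, cols, freq_hz, dram_bw_bytes_s, bytes_per_word=1):
    """Return (compute_cycles, runtime_s, dram_bytes) for one GEMM layer."""
    # Tile the M x N output across the PE grid; each tile reduces over K.
    tiles = math.ceil(M / rows) * math.ceil(N / cols)
    # Per tile: K cycles of streaming plus ~rows + cols - 2 cycles of
    # pipeline fill and drain through the array.
    compute_cycles = tiles * (K + rows + cols - 2)

    # Off-chip traffic, assuming each operand tile is fetched from DRAM once
    # per use and every result is written back once (no inter-tile reuse).
    a_bytes = math.ceil(N / cols) * M * K * bytes_per_word  # A re-read per column tile
    b_bytes = math.ceil(M / rows) * K * N * bytes_per_word  # B re-read per row tile
    c_bytes = M * N * bytes_per_word                        # outputs written once
    dram_bytes = a_bytes + b_bytes + c_bytes

    # Roofline: the layer is limited by compute or by memory bandwidth,
    # whichever takes longer.
    runtime_s = max(compute_cycles / freq_hz, dram_bytes / dram_bw_bytes_s)
    return compute_cycles, runtime_s, dram_bytes

# Example: a 16x16 array at 1 GHz with 8 GB/s of DRAM bandwidth.
print(estimate_layer(M=512, N=512, K=512, rows=16, cols=16,
                     freq_hz=1e9, dram_bw_bytes_s=8e9))
```

Because each candidate configuration is scored with a handful of closed-form expressions like these, thousands of design points can be evaluated in the time a single cycle-accurate simulation would take, which is what makes the design space exploration described above feasible.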




Published In

RAPIDO '23: Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems
January 2023
94 pages
ISBN: 9798400700453
DOI: 10.1145/3579170
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 April 2023


Author Tags

  1. Analytical Modelling
  2. Design Space Exploration
  3. Neural Networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • German Federal Ministry of Education and Research (BMBF)

Conference

DroneSE and RAPIDO 2023

Acceptance Rates

Overall acceptance rate: 14 of 28 submissions (50%)

