Research article
DOI: 10.1145/3579170.3579258

An Analytical Model of Configurable Systolic Arrays to find the Best-Fitting Accelerator for a given DNN Workload

Published: 13 April 2023

Abstract

Since their breakthrough, the complexity of Deep Neural Networks (DNNs) has been rising steadily, and DNN accelerators are now used in many domains. However, designing and configuring an accelerator that perfectly meets the requirements of a given application is a challenging task. In this paper, we therefore present an approach to support the accelerator design process. With an analytical model of a systolic array, we can estimate performance, energy consumption and area for each design option. These metrics are usually determined by cycle-accurate simulation, which is time-consuming and therefore forces the design space to be restricted heavily. Analytical modelling, in contrast, allows a design to be evaluated quickly using a mathematical abstraction of the accelerator. This works especially well for DNNs, since their dataflow and memory accesses are highly regular. To show the correctness of our model, we perform an exemplary realization with the state-of-the-art systolic array generator Gemmini and compare it against a cycle-accurate simulation and state-of-the-art modelling tools, observing less than 1% deviation. We also conduct a design space exploration to demonstrate the analytical model's ability to support accelerator design. In a case study on ResNet-34, our model and DSE tool reduce the time to find the best-fitting solution by four orders of magnitude compared to a cycle-accurate simulation and by two orders of magnitude compared to state-of-the-art modelling tools.
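To illustrate what such an analytical estimate can look like, the sketch below gives a first-order, roofline-style model of a single GEMM layer mapped onto a rows x cols systolic array. It is a minimal illustration only: the tiling scheme, the pipeline fill/drain term, and the assumption that operands are fetched from DRAM once per tile are generic textbook simplifications, not the model published in the paper.

```python
# Minimal illustrative sketch (not the paper's model): first-order, roofline-style
# estimate for one GEMM layer, C[M,N] = A[M,K] @ B[K,N], mapped onto an
# output-stationary rows x cols systolic array.
import math

def estimate_layer(M, N, K, rows, cols, freq_hz, dram_bw_bytes_s, bytes_per_word=1):
    """Return (compute_cycles, runtime_s, dram_bytes) for one GEMM layer."""
    # Tile the M x N output across the PE grid; each tile reduces over K.
    tiles = math.ceil(M / rows) * math.ceil(N / cols)
    # Per tile: K cycles of streaming plus ~rows + cols - 2 cycles of
    # pipeline fill and drain through the array.
    compute_cycles = tiles * (K + rows + cols - 2)

    # Off-chip traffic, assuming each operand tile is fetched from DRAM once
    # per use and every result is written back once (no inter-tile reuse).
    a_bytes = math.ceil(N / cols) * M * K * bytes_per_word  # A re-read per column tile
    b_bytes = math.ceil(M / rows) * K * N * bytes_per_word  # B re-read per row tile
    c_bytes = M * N * bytes_per_word                        # outputs written once
    dram_bytes = a_bytes + b_bytes + c_bytes

    # Roofline: the layer is limited by compute or by memory bandwidth,
    # whichever takes longer.
    runtime_s = max(compute_cycles / freq_hz, dram_bytes / dram_bw_bytes_s)
    return compute_cycles, runtime_s, dram_bytes

# Example: a 16x16 array at 1 GHz with 8 GB/s of DRAM bandwidth.
print(estimate_layer(M=512, N=512, K=512, rows=16, cols=16,
                     freq_hz=1e9, dram_bw_bytes_s=8e9))
```

Because each candidate configuration is scored with a handful of closed-form expressions like these, thousands of design points can be evaluated in the time a single cycle-accurate simulation would take, which is what makes the design space exploration described above feasible.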




Published In

RAPIDO '23: Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems
January 2023
94 pages
ISBN: 9798400700453
DOI: 10.1145/3579170
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 April 2023


Author Tags

  1. Analytical Modelling
  2. Design Space Exploration
  3. Neural Networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • German Federal Ministry of Education and Research (BMBF)

Conference

DroneSE and RAPIDO 2023

Acceptance Rates

Overall acceptance rate: 14 of 28 submissions (50%)

