DOI: 10.1145/3316781.3317813
Research article, DAC Conference Proceedings

E-LSTM: Efficient Inference of Sparse LSTM on Embedded Heterogeneous System

Published: 02 June 2019

Abstract

Various models built on Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art performance in sequential information processing. Previous LSTM-specific architectures provision large on-chip memories for weight storage to alleviate the memory-bound issue and facilitate LSTM inference in cloud computing. In this paper, E-LSTM is proposed for embedded scenarios, with consideration of chip area and limited data-access bandwidth. The heterogeneous hardware in E-LSTM tightly couples an LSTM co-processor with an embedded RISC-V CPU. The eSELL format is developed to represent the sparse weight matrix. With the proposed cell-fusion optimization, which exploits the inherent sparsity in computation, E-LSTM achieves up to 2.2× speedup in processing throughput.
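The abstract does not detail the eSELL layout, but it extends the sliced-ELLPACK (SELL) family of sparse formats, in which rows are grouped into fixed-height slices and each slice is zero-padded only to the width of its own longest row. The sketch below is a minimal, hypothetical illustration of that SELL idea in NumPy (function names `sell_encode`/`sell_matvec` and the slice height are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def sell_encode(W, slice_height=2):
    """Encode a dense matrix into a SELL-style (sliced ELLPACK) layout.

    Rows are grouped into slices of `slice_height`; each slice is padded
    only to the longest row *within that slice*, so the padding overhead
    is far lower than plain ELLPACK when row sparsity is uneven.
    """
    n_rows = W.shape[0]
    slices = []
    for s in range(0, n_rows, slice_height):
        # column indices of the nonzeros in each row of this slice
        rows = [np.nonzero(W[r])[0] for r in range(s, min(s + slice_height, n_rows))]
        width = max((len(c) for c in rows), default=0)
        cols = np.zeros((len(rows), width), dtype=np.int64)
        vals = np.zeros((len(rows), width), dtype=W.dtype)
        for i, c in enumerate(rows):
            cols[i, :len(c)] = c
            vals[i, :len(c)] = W[s + i, c]  # zero padding contributes nothing
        slices.append((cols, vals))
    return slices

def sell_matvec(slices, x):
    """Multiply a SELL-encoded matrix by a dense vector x."""
    out = []
    for cols, vals in slices:
        # gather x at the stored column indices, multiply, reduce per row
        out.append((vals * x[cols]).sum(axis=1))
    return np.concatenate(out)

W = np.array([[0., 2., 0., 1.],
              [3., 0., 0., 0.],
              [0., 0., 4., 0.],
              [5., 0., 6., 7.]])
x = np.array([1., 2., 3., 4.])
y = sell_matvec(sell_encode(W), x)  # matches the dense product W @ x
```

Per-slice padding is what makes this family attractive for pruned LSTM weight matrices, whose nonzero counts vary widely from row to row.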



Published In

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
June 2019
1378 pages
ISBN:9781450367257
DOI:10.1145/3316781

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Research Grants Council of Hong Kong
  • Croucher Foundation
  • Beijing Natural Science Foundation (No. L172004), Municipal Science and Technology Program

Conference

DAC '19

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

