DOI: 10.1145/3316781.3317813
Research article, DAC Conference Proceedings

E-LSTM: Efficient Inference of Sparse LSTM on Embedded Heterogeneous System

Published: 02 June 2019

Abstract

Various models built on Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art performance in sequential information processing. Previous LSTM-specific architectures provision large on-chip memories for weight storage to alleviate the memory-bound issue and facilitate LSTM inference in cloud computing. In this paper, E-LSTM is proposed for embedded scenarios, with consideration of chip area and limited data-access bandwidth. The heterogeneous hardware in E-LSTM tightly couples an LSTM co-processor with an embedded RISC-V CPU. The eSELL format is developed to represent the sparse weight matrix. With the proposed cell-fusion optimization, which exploits the inherent sparsity in computation, E-LSTM achieves up to 2.2× speedup in processing throughput.
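The abstract does not detail the eSELL layout, but it extends the sliced-ELLPACK (SELL) family of sparse formats, in which rows are grouped into fixed-height slices and each slice is zero-padded only to the width of its own longest row. The sketch below is a minimal, hypothetical illustration of that SELL idea in NumPy (function names `sell_encode`/`sell_matvec` and the slice height are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def sell_encode(W, slice_height=2):
    """Encode a dense matrix into a SELL-style (sliced ELLPACK) layout.

    Rows are grouped into slices of `slice_height`; each slice is padded
    only to the longest row *within that slice*, so the padding overhead
    is far lower than plain ELLPACK when row sparsity is uneven.
    """
    n_rows = W.shape[0]
    slices = []
    for s in range(0, n_rows, slice_height):
        # column indices of the nonzeros in each row of this slice
        rows = [np.nonzero(W[r])[0] for r in range(s, min(s + slice_height, n_rows))]
        width = max((len(c) for c in rows), default=0)
        cols = np.zeros((len(rows), width), dtype=np.int64)
        vals = np.zeros((len(rows), width), dtype=W.dtype)
        for i, c in enumerate(rows):
            cols[i, :len(c)] = c
            vals[i, :len(c)] = W[s + i, c]  # zero padding contributes nothing
        slices.append((cols, vals))
    return slices

def sell_matvec(slices, x):
    """Multiply a SELL-encoded matrix by a dense vector x."""
    out = []
    for cols, vals in slices:
        # gather x at the stored column indices, multiply, reduce per row
        out.append((vals * x[cols]).sum(axis=1))
    return np.concatenate(out)

W = np.array([[0., 2., 0., 1.],
              [3., 0., 0., 0.],
              [0., 0., 4., 0.],
              [5., 0., 6., 7.]])
x = np.array([1., 2., 3., 4.])
y = sell_matvec(sell_encode(W), x)  # matches the dense product W @ x
```

Per-slice padding is what makes this family attractive for pruned LSTM weight matrices, whose nonzero counts vary widely from row to row.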



Published In

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
June 2019
1378 pages
ISBN:9781450367257
DOI:10.1145/3316781

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Research Grants Council of Hong Kong
  • Croucher Foundation
  • Beijing Natural Science Foundation (No. L172004), Municipal Science and Technology Program

Conference

DAC '19

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

