DOI: 10.1145/2994551.2994564
Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables

Published: 14 November 2016

Abstract

Deep learning has revolutionized the way sensor data are analyzed and interpreted. The accuracy gains these approaches offer make them attractive for the next generation of mobile, wearable and embedded sensory applications. However, state-of-the-art deep learning algorithms typically require a significant amount of device and processor resources, even just for the inference stages that are used to discriminate high-level classes from low-level data. The limited availability of memory, computation, and energy on mobile and embedded platforms thus poses a significant challenge to the adoption of these powerful learning techniques. In this paper, we propose SparseSep, a new approach that leverages the sparsification of fully connected layers and separation of convolutional kernels to reduce the resource requirements of popular deep learning algorithms. As a result, SparseSep allows large-scale DNNs and CNNs to run efficiently on mobile and embedded hardware with only minimal impact on inference accuracy. We experiment using SparseSep across a variety of common processors such as the Qualcomm Snapdragon 400, ARM Cortex M0 and M3, and Nvidia Tegra K1, and show that it allows inference for various deep models to execute more efficiently; for example, on average requiring 11.3 times less memory and running 13.3 times faster on these representative platforms.
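The general idea behind separating a convolutional kernel can be illustrated with a small sketch. It is not the SparseSep algorithm, whose details the abstract does not give; it only assumes a kernel that is exactly the outer product of a column filter and a row filter (rank 1), and shows that two cheap 1-D passes then reproduce the full 2-D result. The function names, the Sobel-like filter values, and the toy image are all hypothetical choices made for illustration.

```python
def conv2d_valid(image, kernel):
    """Direct 2-D 'valid' cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def separable_conv2d_valid(image, col_filter, row_filter):
    """Two 1-D passes, equivalent to a kernel with
    kernel[i][j] = col_filter[i] * row_filter[j] (assumed rank 1)."""
    # Vertical pass: apply the column filter as a (kh x 1) kernel.
    vertical = conv2d_valid(image, [[v] for v in col_filter])
    # Horizontal pass: apply the row filter as a (1 x kw) kernel.
    return conv2d_valid(vertical, [list(row_filter)])


# Hypothetical rank-1 kernel (Sobel-like), built as an outer product.
col_filter = [1, 2, 1]
row_filter = [1, 0, -1]
kernel = [[v * h for h in row_filter] for v in col_filter]

# Toy 5 x 6 integer image so the comparison is exact.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]

full = conv2d_valid(image, kernel)
separated = separable_conv2d_valid(image, col_filter, row_filter)

# Approximate multiplications per output pixel, ignoring border effects.
mults_full = len(col_filter) * len(row_filter)       # kh * kw
mults_separated = len(col_filter) + len(row_filter)  # kh + kw
```

For a kernel of size d x d the per-pixel cost drops from roughly d*d to 2*d multiplications, which is the kind of saving that motivates separation. Kernels in trained networks are generally not exactly rank 1, so a practical method has to approximate them and accept some error.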

Supplementary Material

MOV File (p176.mov)


Published In

SenSys '16: Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM
November 2016
398 pages
ISBN:9781450342636
DOI:10.1145/2994551
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Wearable computing
  2. deep learning
  3. sparse coding
  4. weight factorization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Acceptance Rates

Overall Acceptance Rate 174 of 867 submissions, 20%


