DOI: 10.1145/3243176.3243180
Research Article

A portable, automatic data quantizer for deep neural networks

Published: 01 November 2018

Abstract

With the proliferation of AI-based applications and services, there are strong demands for efficient processing of deep neural networks (DNNs). DNNs are known to be both compute- and memory-intensive as they require a tremendous amount of computation and large memory space. Quantization is a popular technique to boost the efficiency of DNNs by representing a number with fewer bits, hence reducing both computational strength and memory footprint. However, finding an optimal number representation for a DNN is difficult due to a combinatorial explosion in feasible number representations with varying bit widths, which is only exacerbated by layer-wise optimization. Moreover, existing quantization techniques often target a specific DNN framework and/or hardware platform, lacking portability across various execution environments. To address this, we propose libnumber, a portable, automatic quantization framework for DNNs. By introducing the Number abstract data type (ADT), libnumber hides the internal representation of a number from the user. The auto-tuner of libnumber then finds a compact representation (type, bit width, and bias) for the number that minimizes the user-supplied objective function while satisfying the accuracy constraint. Thus, libnumber effectively separates the concern of developing an effective DNN model from the low-level optimization of number representation. Our evaluation using eleven DNN models on two DNN frameworks targeting an FPGA platform demonstrates over 8× (7×) reduction in parameter size on average when up to 7% (1%) loss of relative accuracy is tolerable, with a maximum reduction of 16×, compared to the baseline using 32-bit floating-point numbers. This leads to a geomean speedup of 3.79× with a maximum speedup of 12.77× over the baseline, while requiring only minimal programmer effort.
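
To make the idea concrete, the sketch below shows one plausible reduced-precision representation of the kind such an auto-tuner searches over: a fixed-point value parameterized by a bit width and a bias (power-of-two scaling). The struct name, fields, and encoding scheme here are illustrative assumptions, not libnumber's actual Number ADT or API.

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <iostream>

    // Hypothetical reduced-precision fixed-point representation: a stored
    // integer of `bit_width` bits scaled by 2^bias. Illustration only.
    struct FixedPoint {
        int bit_width;  // total bits, including the sign bit
        int bias;       // real value = stored integer * 2^bias

        // Quantize a 32-bit float into the reduced representation,
        // saturating on overflow.
        std::int64_t encode(float x) const {
            const std::int64_t max_q = (std::int64_t{1} << (bit_width - 1)) - 1;
            const std::int64_t min_q = -(std::int64_t{1} << (bit_width - 1));
            const std::int64_t q = std::llround(x / std::ldexp(1.0, bias));
            return std::clamp(q, min_q, max_q);
        }

        // Recover an approximation of the original value.
        float decode(std::int64_t q) const {
            return static_cast<float>(std::ldexp(static_cast<double>(q), bias));
        }
    };

    int main() {
        FixedPoint fp{8, -5};           // 8-bit storage, resolution 2^-5 = 0.03125
        float w = 0.7431f;              // an example 32-bit weight
        std::int64_t q = fp.encode(w);  // compact stored value (24)
        std::cout << w << " -> " << q << " -> " << fp.decode(q) << "\n";  // ~0.75
    }

An auto-tuner in the spirit described above could then sweep the bit width (and bias) per layer, keep the smallest setting whose measured accuracy stays within the user-specified tolerance, and report the resulting reduction in parameter size.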


    Published In

    PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques
    November 2018
    494 pages
    ISBN:9781450359863
    DOI:10.1145/3243176
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    In-Cooperation

    • IFIP WG 10.3
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 November 2018


    Author Tags

    1. approximate computing
    2. auto-tuning
    3. deep neural networks
    4. optimization
    5. performance
    6. quantization

    Qualifiers

    • Research-article

    Funding Sources

    • the National Research Foundation of Korea
    • Samsung Advanced Institute of Technology

    Conference

    PACT '18

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%
