DOI: 10.1145/3461648.3463846

CHaNAS: coordinated search for network architecture and scheduling policy

Published: 22 June 2021

Abstract

Automatically designing an efficient DNN solution for a given deep learning task on target hardware is mainly determined by two factors, the neural network architecture and the schedule mapping strategy, which are closely coupled and must be tuned jointly to fully exploit the advantages of the underlying hardware. Prior hardware-aware Neural Architecture Search (NAS) methods mostly ignore the impact of different scheduling policies (e.g., graph-level optimization, loop transformations, and parallelization) on the network candidates evaluated during the search. Thus, they may miss the truly optimal architecture, which can only be discovered by trying out different scheduling policies. This work proposes a NAS framework, CHaNAS, that searches not only for the network architecture but also for the dedicated scheduling policy, producing a co-design solution that fully exploits the advantages of the target hardware. We propose a block-based pre-scheduling methodology to reduce the co-design search space and enable the automatic generation of the optimal co-design, including the network architecture and the tensor programs that implement the scheduling policy. We evaluate CHaNAS on ImageNet on different hardware back-ends against the state-of-the-art hardware-aware search baseline MobileNet-v3. Experimental results show that the co-design solutions obtained by CHaNAS deliver up to 1.6x, 1.9x, and 1.7x performance improvements on an NVIDIA P100 GPU, an Intel Xeon 8163 CPU, and a Samsung Note 10 mobile device, respectively, over baselines at the same accuracy level.

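The coordinated search idea can be summarized in a minimal sketch (illustrative only, not the authors' implementation): each architecture candidate is scored by the latency of its best available schedule rather than a single default schedule, so the ranking reflects what the target hardware can actually deliver. The search spaces, cost model (measure_latency), and accuracy predictor (estimate_accuracy) below are hypothetical stand-ins.

```python
import itertools
import random

# Hypothetical, tiny search spaces for illustration only.
ARCH_SPACE = {
    "kernel_size": [3, 5, 7],
    "expand_ratio": [3, 4, 6],
    "depth": [2, 3, 4],
}
SCHEDULE_SPACE = {
    "tile": [8, 16, 32],
    "unroll": [1, 4],
    "parallelize": [False, True],
}

def enumerate_space(space):
    """Yield every combination of the options in a search space."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def measure_latency(arch, schedule):
    """Stand-in for compiling the tensor program and timing it on the device."""
    work = arch["kernel_size"] ** 2 * arch["expand_ratio"] * arch["depth"]
    speedup = (schedule["tile"] / 8.0) * schedule["unroll"] * (2.0 if schedule["parallelize"] else 1.0)
    return work / speedup + random.uniform(0.0, 0.1)

def estimate_accuracy(arch):
    """Stand-in for a supernet- or predictor-based accuracy estimate."""
    return 70.0 + arch["depth"] + 0.5 * arch["expand_ratio"] + 0.1 * arch["kernel_size"]

def coordinated_search(latency_budget):
    """Pick the most accurate architecture whose *best* schedule meets the budget."""
    best = None
    for arch in enumerate_space(ARCH_SPACE):
        # Evaluate the candidate under every scheduling policy and keep the best one;
        # a NAS that fixed a single schedule here could mis-rank the architectures.
        schedule, latency = min(
            ((s, measure_latency(arch, s)) for s in enumerate_space(SCHEDULE_SPACE)),
            key=lambda pair: pair[1],
        )
        if latency > latency_budget:
            continue
        acc = estimate_accuracy(arch)
        if best is None or acc > best[2]:
            best = (arch, schedule, acc, latency)
    return best  # (architecture, schedule, accuracy, latency) or None

if __name__ == "__main__":
    result = coordinated_search(latency_budget=20.0)
    if result is not None:
        arch, schedule, acc, lat = result
        print(f"arch={arch}\nschedule={schedule}\nacc={acc:.1f} lat={lat:.2f}")
```

In practice, the paper's block-based pre-scheduling avoids exhaustively crossing every architecture with every schedule; the nested loop above is only meant to show why ignoring schedule choice during the search can mis-rank network candidates.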
Published In

LCTES 2021: Proceedings of the 22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems. June 2021, 162 pages. ISBN: 9781450384728. DOI: 10.1145/3461648. General Chair: Jörg Henkel; Program Chair: Xu Liu.

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. Compiler optimization
      2. Hardware-aware Neural Architecture Search
      3. NN-Scheduling Co-design
