
DANCE: DAta-Network Co-optimization for Efficient Segmentation Model Training and Inference

Published: 27 September 2022

Abstract

Semantic segmentation for scene understanding is in wide demand, posing significant challenges to algorithmic efficiency, especially for applications on resource-limited platforms. Current segmentation models are trained and evaluated on massive high-resolution scene images (“data level”) and suffer from the expensive computation arising from the required multi-scale aggregation (“network level”). On both fronts, the computational and energy costs of training and inference are notable, owing to the large input resolutions typically desired and the heavy computational burden of segmentation models. To this end, we propose DANCE, a general automated DAta-Network Co-optimization framework for Efficient segmentation model training and inference. Distinct from existing efficient segmentation approaches that focus merely on lightweight network design, DANCE distinguishes itself as an automated, simultaneous data-network co-optimization via both input data manipulation and network architecture slimming. Specifically, DANCE integrates automated data slimming, which adaptively downsamples or drops input images and controls their corresponding contributions to the training loss, guided by the images’ spatial complexity. Such downsampling, in addition to directly reducing the cost associated with the input size, also shrinks the dynamic range of input object and context scales, which motivates us to adaptively slim the network to match the downsampled data. Extensive experiments and ablation studies (on four state-of-the-art segmentation models with three popular segmentation datasets under two training settings) demonstrate that DANCE can achieve an “all-win” for efficient segmentation: reduced training cost, less expensive inference, and better mean Intersection-over-Union (mIoU). Specifically, DANCE reduces energy consumption by 25%–77% in training and 31%–56% in inference, while changing the mIoU by −0.71% to +13.34%.
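The complexity-guided data slimming described above can be sketched in a few lines. This is an illustrative toy, not the paper’s implementation: the spatial-complexity proxy (normalized mean gradient magnitude), the threshold values in `select_scale`, and the loss-weighting rule are all assumptions made for this example.

```python
import numpy as np

def spatial_complexity(img):
    """Proxy for spatial complexity: mean gradient magnitude
    normalized by the maximum, giving a value in [0, 1].
    (Illustrative; the paper's exact measure may differ.)"""
    gy, gx = np.gradient(img.astype(np.float64))
    grad = np.sqrt(gx ** 2 + gy ** 2)
    return float(grad.mean() / (grad.max() + 1e-8))

def select_scale(complexity, scales=(1.0, 0.75, 0.5)):
    """Map complexity to a downsampling scale: simple images are
    downsampled aggressively, complex ones keep full resolution.
    Thresholds here are arbitrary for illustration."""
    if complexity > 0.5:
        return scales[0]
    if complexity > 0.2:
        return scales[1]
    return scales[2]

def downsample(img, scale):
    """Cheap nearest-neighbor downsampling by index striding."""
    if scale >= 1.0:
        return img
    step = int(round(1.0 / scale))
    return img[::step, ::step]

def loss_weight(complexity):
    """Let harder (more complex) images contribute more to the
    training loss; maps complexity in [0, 1] to a weight in [0.5, 1.5]."""
    return 0.5 + complexity
```

A flat image yields complexity 0 and is downsampled to half resolution with a reduced loss weight, while a texture-rich image is kept near full resolution; in the actual framework these decisions are made jointly with network slimming.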



Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 5
September 2022
274 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/3540253

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 27 September 2022
Online AM: 05 May 2022
Accepted: 07 January 2022
Revised: 23 December 2021
Received: 15 July 2021
Published in TODAES Volume 27, Issue 5


Author Tags

  1. Efficient training and inference methods
  2. Semantic segmentation

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • NSF RTML program
