DOI: 10.1145/3489517.3530678

HERO: Hessian-Enhanced Robust Optimization for Unifying and Improving Generalization and Quantization Performance

Published: 23 August 2022

Abstract

With the growing demand for deploying neural network models on mobile and edge devices, it is desirable to improve a model's generalization to unseen test data as well as its robustness under fixed-point quantization for efficient deployment. Minimizing the training loss alone, however, provides few guarantees on generalization or quantization performance. In this work, we improve generalization and quantization performance simultaneously by theoretically unifying them under a single framework: improving the model's robustness against bounded weight perturbations and minimizing the eigenvalues of the Hessian matrix with respect to the model weights. We therefore propose HERO, a Hessian-enhanced robust optimization method, which minimizes the Hessian eigenvalues through a gradient-based training process and thereby improves generalization and quantization performance at the same time. HERO achieves up to a 3.8% gain in test accuracy, up to 30% higher accuracy under 80% training-label perturbation, and the best post-training quantization accuracy across a wide range of precisions, including a more than 10% accuracy improvement over SGD-trained models for common model architectures on various datasets.
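The unification described in the abstract rests on a standard second-order argument: for a weight perturbation δ with ‖δ‖ ≤ ρ (a quantization error or a worst-case weight shift), a Taylor expansion gives L(w + δ) ≈ L(w) + ∇L(w)ᵀδ + ½ δᵀHδ, so the worst-case loss increase is governed by the gradient norm and the largest Hessian eigenvalues, and flattening the Hessian benefits both generalization and quantization. The exact HERO update is not reproduced on this page; purely as a hedged illustration of the underlying min-max idea, the PyTorch sketch below implements a generic sharpness-aware step that ascends to an approximate worst-case point inside a bounded weight ball and then descends using the gradient taken there. The function name, the radius `rho`, and the cross-entropy task are illustrative assumptions, not the paper's API.

```python
import torch
import torch.nn.functional as F

def bounded_perturbation_step(model, optimizer, x, y, rho=0.05):
    """One sharpness-aware step: approximately solve min_w max_{||e|| <= rho} L(w + e)
    by taking a single ascent step e = rho * g / ||g|| and then a descent step
    using the gradient evaluated at the perturbed weights w + e.
    Illustrative sketch only; this is not the paper's exact HERO algorithm."""
    params = [p for p in model.parameters() if p.requires_grad]

    # First forward/backward pass: gradient g at the current weights w.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()

    # Move to the approximate worst-case point w + e inside the rho-ball.
    grad_norm = torch.norm(
        torch.stack([p.grad.norm() for p in params if p.grad is not None]))
    perturbations = []
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append(e)

    # Second forward/backward pass: the gradient at w + e drives the update,
    # penalizing sharp minima (large curvature along the gradient direction).
    optimizer.zero_grad()
    loss_perturbed = F.cross_entropy(model(x), y)
    loss_perturbed.backward()

    # Restore the original weights, then apply the optimizer's descent step.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    return loss.item(), loss_perturbed.item()
```

In this sketch the ball radius `rho` plays the role of the bounded weight perturbation discussed in the abstract; HERO's specific treatment of the Hessian eigenvalues is described in the full paper.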

Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages
ISBN:9781450391429
DOI:10.1145/3489517

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference
July 10 - 14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
