Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach

Tang, Chen; Zhai, Haoyu; Ouyang, Kai; Wang, Zhi; Zhu, Yifei; Zhu, Wenwu

Abstract:Conventional model quantization methods use a fixed quantization scheme to different data samples, which ignores the inherent "recognition difficulty" differences between various samples. We propose to feed different data samples with varying quantization schemes to achieve a data-dependent dynamic inference, at a fine-grained layer level. However, enabling this adaptive inference with changeable layer-wise quantization schemes is challenging because the combination of bit-widths and layers is growing exponentially, making it extremely difficult to train a single model in such a vast searching space and use it in practice. To solve this problem, we present the Arbitrary Bit-width Network (ABN), where the bit-widths of a single deep network can change at runtime for different data samples, with a layer-wise granularity. Specifically, first we build a weight-shared layer-wise quantizable "super-network" in which each layer can be allocated with multiple bit-widths and thus quantized differently on demand. The super-network provides a considerably large number of combinations of bit-widths and layers, each of which can be used during inference without retraining or storing myriad models. Second, based on the well-trained super-network, each layer's runtime bit-width selection decision is modeled as a Markov Decision Process (MDP) and solved by an adaptive inference strategy accordingly. Experiments show that the super-network can be built without accuracy degradation, and the bit-widths allocation of each layer can be adjusted to deal with various inputs on the fly. On ImageNet classification, we achieve 1.1% top1 accuracy improvement while saving 36.2% BitOps.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2204.09992 [cs.CV]
	(or arXiv:2204.09992v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2204.09992

Computer Science > Computer Vision and Pattern Recognition

Title:Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators