RPTQ: Reorder-based Post-training Quantization for Large Language Models

Yuan, Zhihang; Niu, Lin; Liu, Jiawei; Liu, Wenyu; Wang, Xinggang; Shang, Yuzhang; Sun, Guangyu; Wu, Qiang; Wu, Jiaxiang; Wu, Bingzhe

arXiv:2304.01089 (cs)

[Submitted on 3 Apr 2023 (v1), last revised 17 May 2023 (this version, v4)]

Abstract: Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be alleviated through quantization. In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers. To address this challenge, we introduce a quantization method called RPTQ, which utilizes a reorder-based approach. By rearranging the channels and quantizing them in clusters, RPTQ effectively mitigates the impact of range differences between channels. To minimize the overhead of the reorder operation, we fuse it into the layer norm operation and weights in linear layers. In our experiments, RPTQ achieved a significant breakthrough by utilizing 3-bit activation in LLMs for the first time, resulting in a substantial reduction in memory usage. For instance, quantizing OPT-175B can lead to a memory consumption reduction of up to 80%.
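The reorder-then-cluster idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a toy activation matrix with synthetic per-channel scales, and it groups channels by sorting on their (max - min) range and splitting into equal-sized chunks, which is a simplified stand-in for whatever clustering the paper uses. The names `quantize`, `per_tensor_quant`, and `reorder_cluster_quant` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def quantize(x, bits, lo, hi):
    """Asymmetric uniform fake-quantization of x to `bits` bits over [lo, hi]."""
    levels = 2 ** bits - 1
    scale = max(hi - lo, 1e-8) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo  # dequantized values, for error measurement

def per_tensor_quant(x, bits):
    """Baseline: one (min, max) range shared by every channel."""
    return quantize(x, bits, x.min(), x.max())

def reorder_cluster_quant(x, bits, num_clusters):
    """Reorder channels by range, then quantize each contiguous group
    with its own (min, max). Sorting by range is an assumption made
    for this sketch, not necessarily the paper's clustering method."""
    ch_range = x.max(axis=0) - x.min(axis=0)   # per-channel range over tokens
    perm = np.argsort(ch_range)                # channel reorder indices
    x_re = x[:, perm]
    out = np.empty_like(x_re)
    for idx in np.array_split(np.arange(x.shape[1]), num_clusters):
        block = x_re[:, idx]
        out[:, idx] = quantize(block, bits, block.min(), block.max())
    return out, perm

# Synthetic activations whose channels span three orders of magnitude.
rng = np.random.default_rng(0)
tokens, channels, out_features = 64, 32, 8
channel_scale = rng.permutation(np.logspace(-1, 2, channels))
x = rng.standard_normal((tokens, channels)) * channel_scale
w = rng.standard_normal((channels, out_features))

x_pt = per_tensor_quant(x, bits=3)
x_rq, perm = reorder_cluster_quant(x, bits=3, num_clusters=4)

err_per_tensor = np.mean((x_pt - x) ** 2)
err_reordered = np.mean((x_rq - x[:, perm]) ** 2)
print(f"per-tensor MSE: {err_per_tensor:.3f}")
print(f"reordered  MSE: {err_reordered:.3f}")

# Permuting the rows of the next linear layer's weight with the same
# indices leaves the matrix product unchanged, which is why a reorder
# can be folded into weights instead of being run as a separate step.
fused_ok = np.allclose(x[:, perm] @ w[perm, :], x @ w)
print("reorder absorbed by weights:", fused_ok)
```

In this toy setting the per-cluster ranges are much tighter than the single shared range for most channels, so the 3-bit quantization error drops. The final check shows only the weight-side half of the fusion the abstract mentions; folding the reorder into layer norm is not modeled here.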

Comments:	18 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2304.01089 [cs.CL]
	(or arXiv:2304.01089v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.01089

Submission history

From: Zhihang Yuan
[v1] Mon, 3 Apr 2023 15:46:15 UTC (5,741 KB)
[v2] Thu, 6 Apr 2023 15:51:17 UTC (5,112 KB)
[v3] Tue, 25 Apr 2023 06:29:00 UTC (5,112 KB)
[v4] Wed, 17 May 2023 10:07:33 UTC (5,117 KB)
