Merino: Entropy-driven Design for Generative Language Models on IoT Devices

Zhao, Youpeng; Lin, Ming; Tang, Huadong; Wu, Qiang; Wang, Jun

arXiv:2403.07921 (cs)

[Submitted on 28 Feb 2024 (v1), last revised 10 Dec 2024 (this version, v2)]

Abstract: Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial effort and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on a CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against state-of-the-art autoregressive transformer models under the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks compared to the 350M-parameter OPT, while being 4.9x faster on NVIDIA Jetson Nano with a 5.5x reduction in model size.

Comments:	AAAI 2025
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2403.07921 [cs.LG]
	(or arXiv:2403.07921v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.07921

Submission history

From: Youpeng Zhao
[v1] Wed, 28 Feb 2024 03:20:27 UTC (981 KB)
[v2] Tue, 10 Dec 2024 23:01:28 UTC (1,059 KB)
