
Differentiable Slimming for Memory-Efficient Transformers

Published: 01 December 2023

Abstract

Transformer models consistently achieve state-of-the-art performance on a wide range of benchmarks. To meet ever more demanding performance targets, the number of model parameters is continuously increased. As a result, state-of-the-art Transformers require substantial computational resources, prohibiting their deployment on consumer-grade hardware. In the literature, overparameterized Transformers have been successfully reduced in size with the help of pruning strategies. However, existing works cannot optimize the full architecture in a fully differentiable manner without incurring significant overhead. Our work proposes a single-stage approach for training a Transformer for memory-efficient inference and various resource-constrained scenarios. Transformer blocks are extended with trainable gate parameters that attribute importance and control information flow. Integrating these gates into a differentiable pruning-aware training scheme allows extremely sparse subnetworks to be extracted at runtime with minimal performance degradation. Pruning results at the attention-head and layer levels illustrate the memory efficiency of our trained subnetworks under various memory budgets.
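
The abstract does not include implementation details, but the head-gating idea can be sketched roughly as follows. The PyTorch module below is a minimal sketch, not the authors' implementation: the class name GatedMultiHeadAttention, the sigmoid parameterization of the gates, and the pruning threshold are illustrative assumptions. It scales each attention head by a trainable gate and exposes a mask of the heads that would survive pruning.

import torch
import torch.nn as nn


class GatedMultiHeadAttention(nn.Module):
    """Multi-head self-attention with one trainable gate per head (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One trainable gate logit per attention head; sigmoid keeps gates in (0, 1).
        self.gate_logits = nn.Parameter(torch.zeros(n_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, d_head).
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v                          # per-head outputs
        gates = torch.sigmoid(self.gate_logits)   # differentiable head importances
        heads = heads * gates.view(1, -1, 1, 1)   # scale each head by its gate
        return self.out(heads.transpose(1, 2).reshape(b, t, -1))

    @torch.no_grad()
    def head_mask(self, threshold: float = 0.05) -> torch.Tensor:
        # Heads whose gate falls below the threshold can be removed entirely,
        # shrinking the projection matrices and the memory footprint.
        return torch.sigmoid(self.gate_logits) >= threshold


# Usage example:
layer = GatedMultiHeadAttention(d_model=256, n_heads=8)
y = layer(torch.randn(2, 16, 256))   # output shape (2, 16, 256)
print(layer.head_mask())             # heads a pruned subnetwork would keep

During training, such gates are typically driven toward zero with a sparsity penalty (for example, an L1 term on the gate values) so that low-importance heads, or entire layers gated the same way, can be dropped to meet a given memory budget; the exact training objective used in the letter is not reproduced here.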

Published In

IEEE Embedded Systems Letters, Volume 15, Issue 4, Dec. 2023, 73 pages

        Publisher

        IEEE Press

        Qualifiers

        • Research-article
