Multi-Path Transformer is Better: A Case Study on Neural Machine Translation

Ye Lin; Shuhan Zhou; Yanyang Li; Anxiang Ma; Tong Xiao; Jingbo Zhu

doi:10.18653/v1/2022.findings-emnlp.414

Abstract

For years, model performance in machine learning has followed a power-law relationship with model size. For parameter efficiency, recent studies have focused on increasing model depth rather than width to achieve better performance. In this paper, we study how model width affects the Transformer model through a parameter-efficient multi-path structure. To better fuse features extracted from different paths, we add three additional operations to each sublayer: a normalization at the end of each path, a cheap operation to produce more features, and a learnable weighting mechanism to fuse all features flexibly. Extensive experiments on 12 WMT machine translation tasks show that, with the same number of parameters, the shallower multi-path model can achieve similar or even better performance than the deeper model. This suggests that the multi-path structure deserves more attention, and that model depth and width should be balanced to train a better large-scale Transformer.
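The three sublayer operations named in the abstract can be illustrated with a rough forward-pass sketch. This is not the authors' implementation: the abstract does not specify the cheap operation or the form of the fusion weights, so the sketch assumes a parameter-free average of the path outputs as the cheap operation and a softmax over learnable scalars for the fusion. The names `multi_path_sublayer`, `path_params`, and `fusion_logits` are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean and unit variance (no learned gain or bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def multi_path_sublayer(x, path_params, fusion_logits):
    """Forward pass of one multi-path feed-forward sublayer (illustrative only).

    x:             (seq_len, d_model) input
    path_params:   list of (w_in, w_out) weight pairs, one pair per path
    fusion_logits: (n_paths + 1,) scalars, one per fused feature
    """
    # Operation 1: every path ends with a normalization.
    path_outputs = []
    for w_in, w_out in path_params:
        hidden = np.maximum(x @ w_in, 0.0)  # ReLU feed-forward path
        path_outputs.append(layer_norm(hidden @ w_out))

    # Operation 2 (assumption): a cheap, parameter-free operation that derives
    # one extra feature from the existing path outputs by averaging them.
    extra = layer_norm(np.mean(path_outputs, axis=0))
    features = path_outputs + [extra]

    # Operation 3 (assumption): softmax over the scalars gives the fusion weights.
    weights = softmax(np.asarray(fusion_logits, dtype=float))
    fused = sum(w * f for w, f in zip(weights, features))

    # Residual connection around the sublayer.
    return x + fused
```

With all fusion logits set to zero, every feature receives the same weight; in a trained model these scalars would be learned jointly with the path weights.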

Anthology ID: 2022.findings-emnlp.414
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5646–5656
URL: https://aclanthology.org/2022.findings-emnlp.414
DOI: 10.18653/v1/2022.findings-emnlp.414
Cite (ACL): Ye Lin, Shuhan Zhou, Yanyang Li, Anxiang Ma, Tong Xiao, and Jingbo Zhu. 2022. Multi-Path Transformer is Better: A Case Study on Neural Machine Translation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5646–5656, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Multi-Path Transformer is Better: A Case Study on Neural Machine Translation (Lin et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.414.pdf
