Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers

William Held, Diyi Yang


Abstract
Multilingual transformer-based models demonstrate remarkable zero- and few-shot transfer across languages by learning and reusing language-agnostic features. However, as a fixed-size model acquires more languages, its performance across all languages degrades. Those who attribute this interference phenomenon to limited model capacity address the problem by adding additional parameters, despite evidence that transformer-based models are overparameterized. In this work, we show that it is possible to reduce interference by instead identifying and pruning language-specific attention heads. First, we use Shapley Values, a credit allocation metric from coalitional game theory, to identify attention heads that introduce interference. Then, we show that pruning such heads from a fixed model improves performance for a target language on both sentence classification and structured prediction. Finally, we provide insights on language-agnostic and language-specific attention heads using attention visualization.
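The abstract describes attributing credit to individual attention heads with Shapley Values and then pruning heads that hurt a target language. Below is a minimal illustrative sketch of that general idea, not the authors' released code: it estimates per-head Shapley values by permutation sampling and builds a binary pruning mask. The `evaluate` callable, the mask convention (1 = keep, 0 = prune), and the number of sampled permutations are assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the paper's implementation): Monte Carlo
# permutation-sampling estimate of Shapley values for attention heads, followed
# by pruning of heads with negative estimated contribution. `evaluate(model,
# data, head_mask) -> score` is a hypothetical stand-in for a target-language
# validation metric computed with the given binary head mask applied.
import numpy as np


def shapley_head_values(evaluate, model, data, n_layers, n_heads,
                        n_permutations=32, seed=0):
    """Estimate each head's Shapley value as its average marginal contribution
    over random orderings in which heads are added to the active coalition."""
    rng = np.random.default_rng(seed)
    total = n_layers * n_heads
    values = np.zeros(total)
    for _ in range(n_permutations):
        order = rng.permutation(total)
        mask = np.zeros(total)                      # start with every head pruned
        prev = evaluate(model, data, mask.reshape(n_layers, n_heads))
        for head in order:
            mask[head] = 1.0                        # add this head to the coalition
            score = evaluate(model, data, mask.reshape(n_layers, n_heads))
            values[head] += score - prev            # marginal contribution
            prev = score
    return (values / n_permutations).reshape(n_layers, n_heads)


def prune_negative_heads(values):
    """Return a binary head mask keeping only heads with non-negative value."""
    return (values >= 0).astype(float)


if __name__ == "__main__":
    # Toy usage: a synthetic "model" in which some heads help and others interfere.
    true_effect = np.array([[0.3, -0.2], [0.1, -0.4]])   # 2 layers x 2 heads

    def toy_evaluate(model, data, head_mask):
        return float((head_mask * true_effect).sum())

    vals = shapley_head_values(toy_evaluate, model=None, data=None,
                               n_layers=2, n_heads=2)
    print(vals)                         # ~= true_effect for this additive toy game
    print(prune_negative_heads(vals))   # heads with negative contribution pruned
```

In the additive toy game above, the sampled estimates converge to each head's individual effect; with a real model, the mask would instead be passed to the forward pass (for example, via a head-masking mechanism in the encoder) and the score measured on target-language validation data.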
Anthology ID: 2023.eacl-main.177
Volume: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Andreas Vlachos, Isabelle Augenstein
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 2416–2427
URL: https://aclanthology.org/2023.eacl-main.177
DOI: 10.18653/v1/2023.eacl-main.177
Cite (ACL): William Held and Diyi Yang. 2023. Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2416–2427, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers (Held & Yang, EACL 2023)
PDF: https://aclanthology.org/2023.eacl-main.177.pdf
Video: https://aclanthology.org/2023.eacl-main.177.mp4