CIKM Conference Proceedings · Short paper
DOI: 10.1145/3459637.3482176

SCOPA: Soft Code-Switching and Pairwise Alignment for Zero-Shot Cross-lingual Transfer

Published: 30 October 2021

Abstract

The recent advent of cross-lingual embeddings, such as multilingual BERT (mBERT), provides a strong baseline for zero-shot cross-lingual transfer. There is also growing research interest in reducing the alignment discrepancy of cross-lingual embeddings between source and target languages by generating code-switched sentences, i.e., substituting randomly selected words in the source language with their counterparts in the target language. Although these approaches improve performance, naively code-switched sentences have inherent limitations. In this paper, we propose SCOPA, a novel technique to improve zero-shot cross-lingual transfer. Instead of using the embeddings of code-switched sentences directly, SCOPA softly mixes them with the embeddings of the original sentences. In addition, SCOPA employs a pairwise alignment objective that aligns the vector differences of word pairs rather than word-level embeddings, transferring contextualized information across languages while preserving language-specific information. Experiments on the PAWS-X and MLDoc datasets show the effectiveness of SCOPA.
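The abstract describes two mechanisms: soft code-switching (mixing code-switched embeddings with the original embeddings rather than using them directly) and a pairwise alignment objective over vector differences of word pairs. The paper defines the exact formulation; the following is only a minimal PyTorch sketch of one plausible reading, in which the function names, the fixed mixing ratio, and the MSE loss form are illustrative assumptions rather than the authors' implementation.

    import torch
    import torch.nn.functional as F

    def soft_code_switch(orig_emb, cs_emb, mix_ratio=0.5):
        # Hypothetical soft code-switching: interpolate the token embeddings of the
        # code-switched sentence with those of the original sentence (mixup-style),
        # instead of feeding the hard code-switched embeddings to the model directly.
        return mix_ratio * orig_emb + (1.0 - mix_ratio) * cs_emb

    def pairwise_alignment_loss(src_emb, tgt_emb):
        # Hypothetical pairwise alignment objective: align the vector *differences*
        # between word pairs across languages, not the word embeddings themselves,
        # so language-specific offsets can be preserved.
        # src_emb, tgt_emb: (num_words, dim) embeddings of translation-pair words.
        src_diff = src_emb.unsqueeze(1) - src_emb.unsqueeze(0)  # (n, n, dim)
        tgt_diff = tgt_emb.unsqueeze(1) - tgt_emb.unsqueeze(0)  # (n, n, dim)
        return F.mse_loss(src_diff, tgt_diff)

In this reading, the task classifier would be trained on the mixed embeddings while the pairwise loss is added to the task loss; the mixing schedule and loss weighting are details specified by the paper, not by this sketch.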



Information

    Published In

    CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
    October 2021
    4966 pages
    ISBN:9781450384469
    DOI:10.1145/3459637
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. multi-lingual nlp
    2. word embeddings alignment
    3. zero-shot cross-lingual transfer

    Qualifiers

    • Short-paper

    Funding Sources

    • FriendliAI
    • SNU AI Graduate School Program 2021-0-01343
    • ITRC IITP-2021-2020-0-01789

    Conference

    CIKM '21

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


