
Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition

Published: 14 August 2021
DOI: 10.1145/3447548.3467196

Abstract

Named entity recognition (NER) is a fundamental component of many applications, such as web search and voice assistants. Although deep neural networks have greatly improved NER performance, their need for large amounts of training data makes them hard to scale out to many languages in an industry setting. To tackle this challenge, cross-lingual NER transfers knowledge from a rich-resource language to low-resource languages through pre-trained multilingual language models. Instead of using training data in the target languages, cross-lingual NER relies only on training data in the source language, optionally augmented with training data translated from the source language. However, existing cross-lingual NER methods do not make good use of rich unlabeled data in the target languages, which is relatively easy to collect in industry applications. To address this opportunity, in this paper we describe our practice at Microsoft of leveraging such large amounts of unlabeled target-language data in real production settings. To effectively extract weak supervision signals from the unlabeled data, we develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning. An empirical study on three benchmark data sets verifies that our approach establishes new state-of-the-art performance by clear margins. The NER techniques reported in this paper are on their way to becoming a fundamental component of Web ranking, Entity Pane, Answers Triggering, and Question Answering in the Microsoft Bing search engine, and will also serve as part of the Spoken Language Understanding module of a commercial voice assistant. We plan to open-source the code of the prototype framework after deployment.
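
Although this page carries only the abstract, the core recipe it describes (distilling a source-language teacher model into a target-language student on unlabeled text, with a reinforcement-learning policy deciding which unlabeled sentences to distill from) can be illustrated compactly. The snippet below is a minimal PyTorch sketch, not the paper's implementation: the teacher, student, and selector modules, the distillation_step helper, the temperature tau, and the use of the negative per-sentence distillation loss as the policy reward are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def distillation_step(teacher, student, selector, batch, optimizer, tau=2.0):
    # One reinforced distillation step on a batch of unlabeled
    # target-language sentences (here: pre-computed token embeddings).
    with torch.no_grad():
        # The fixed teacher produces soft token-level tag distributions.
        soft_targets = F.softmax(teacher(batch) / tau, dim=-1)     # (B, T, K)

    # The selector acts as the RL policy: it scores each sentence, and a
    # binary keep/drop action is sampled per sentence (REINFORCE-style).
    keep_prob = torch.sigmoid(selector(batch)).squeeze(-1)         # (B,)
    action = torch.bernoulli(keep_prob)                            # (B,), no grad

    # The student mimics the teacher's soft labels on kept sentences only.
    student_log_probs = F.log_softmax(student(batch) / tau, dim=-1)
    per_sentence_kd = F.kl_div(student_log_probs, soft_targets,
                               reduction="none").sum(-1).mean(-1)  # (B,)
    kd_loss = (action * per_sentence_kd).sum() / action.sum().clamp(min=1.0)

    # Policy gradient with the negative distillation loss as the reward,
    # a simple proxy for "teacher and student agree on this sentence"
    # (assumed here; the paper's reward design may differ).
    reward = -per_sentence_kd.detach()
    log_pi = (action * torch.log(keep_prob + 1e-8)
              + (1.0 - action) * torch.log(1.0 - keep_prob + 1e-8))
    rl_loss = -(reward * log_pi).mean()

    optimizer.zero_grad()
    (tau * tau * kd_loss + rl_loss).backward()  # tau^2 rescales KD gradients
    optimizer.step()
    return kd_loss.item()

# Toy usage with stand-in linear models over random token embeddings.
B, T, D, K = 4, 16, 32, 9   # batch, tokens, embedding dim, tag count
teacher = torch.nn.Linear(D, K)
student = torch.nn.Linear(D, K)
selector = torch.nn.Sequential(torch.nn.Flatten(1), torch.nn.Linear(T * D, 1))
optimizer = torch.optim.AdamW(list(student.parameters())
                              + list(selector.parameters()), lr=1e-3)
distillation_step(teacher, student, selector, torch.randn(B, T, D), optimizer)

Since the title describes an iterative procedure, one would presumably promote the trained student to teacher for the next round and repeat on fresh unlabeled batches; a single step is shown here for brevity.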

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross lingual
  2. knowledge distillation
  3. named entity recognition
  4. reinforcement learning

Qualifiers

  • Research-article

Conference

KDD '21

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
