
Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural Cross-Lingual Information Retrieval

Published: 30 October 2021

Abstract

Pre-trained contextualized representations have achieved great success on many downstream tasks, including document ranking. The multilingual versions of such pre-trained representations offer the possibility of jointly learning many languages with the same model. Although large gains are expected from such joint training, in the case of cross-lingual information retrieval (CLIR), models trained under a multilingual setting do not reach the same level of performance as those trained under a monolingual setting. We hypothesize that this performance drop is due to the translation gap between the query and the documents. In the monolingual retrieval task, because query and documents share the same lexical inputs, it is easier for the model to identify the query terms that occur in documents. In multilingual pre-trained models, however, where words in different languages are projected into the same hyperspace, the model tends to "translate" query terms into related terms - i.e., terms that appear in a similar context - in addition to, or sometimes rather than, synonyms in the target language. This property makes it difficult for the model to connect terms that co-occur in both query and document. To address this issue, we propose a novel Mixed Attention Transformer (MAT) that incorporates external word-level knowledge, such as a dictionary or translation table. We design a sandwich-like architecture to embed MAT into recent transformer-based deep neural models. By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on mutually translated words in the input sequence. Experimental results demonstrate the effectiveness of the external knowledge and the significant improvement that the MAT-embedded neural reranking model brings to the CLIR task.
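To make the idea of encoding a dictionary or translation table into an attention matrix concrete, the sketch below is a minimal illustration, not the paper's actual MAT architecture: it builds a binary query-document translation mask from a toy bilingual dictionary and blends it into standard scaled dot-product attention. The function names, toy dictionary, and mixing weight `alpha` are assumptions made purely for illustration.

```python
import torch

def translation_attention_mask(query_tokens, doc_tokens, bilingual_dict):
    """Return a |query| x |doc| matrix with 1.0 where a query token and a
    document token are listed as translations of each other, else 0.0."""
    mask = torch.zeros(len(query_tokens), len(doc_tokens))
    for i, q in enumerate(query_tokens):
        translations = bilingual_dict.get(q, set())
        for j, d in enumerate(doc_tokens):
            if d in translations:
                mask[i, j] = 1.0
    return mask

def mixed_attention(q, k, v, knowledge_mask, alpha=0.5):
    """Blend standard scaled dot-product attention logits with the external
    word-level translation-knowledge matrix before the softmax."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5           # standard attention logits
    scores = (1 - alpha) * scores + alpha * knowledge_mask  # mix in external knowledge
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Toy example: an English query attending over a German document.
bilingual_dict = {"dog": {"hund"}, "house": {"haus"}}   # hypothetical dictionary entries
query_tokens = ["dog", "house"]
doc_tokens = ["der", "hund", "im", "haus"]
mask = translation_attention_mask(query_tokens, doc_tokens, bilingual_dict)
print(mask)  # 1.0 at (dog, hund) and (house, haus), 0.0 elsewhere

# Random embeddings stand in for contextualized representations.
torch.manual_seed(0)
q_emb = torch.randn(len(query_tokens), 8)
d_emb = torch.randn(len(doc_tokens), 8)
out = mixed_attention(q_emb, d_emb, d_emb, mask)
print(out.shape)  # torch.Size([2, 8])
```

In this sketch the knowledge matrix simply biases attention toward mutually translated word pairs; how the real MAT layers are stacked with the pre-trained transformer (the "sandwich" design) is described in the paper itself.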




      Published In

      CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
      October 2021
      4966 pages
      ISBN:9781450384469
      DOI:10.1145/3459637


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 October 2021


      Author Tags

      1. attention mechanism
      2. cross-lingual information retrieval
      3. neural network

      Qualifiers

      • Research-article

      Conference

      CIKM '21

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


      Cited By

      • (2024) Query in Your Tongue: Reinforce Large Language Models with Retrievers for Cross-lingual Search Generative Experience. Proceedings of the ACM Web Conference 2024, 1529-1538. https://doi.org/10.1145/3589334.3645701
      • (2023) Soft Prompt Decoding for Multilingual Dense Retrieval. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1208-1218. https://doi.org/10.1145/3539618.3591769
      • (2023) Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1048-1056. https://doi.org/10.1145/3539597.3570468
      • (2022) C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2507-2512. https://doi.org/10.1145/3477495.3531886
