
Efficient Entity Translation Mining: A Parallelized Graph Alignment Approach

Published: 01 November 2012

Abstract

This article studies the problem of mining entity translations, specifically English and Chinese name pairs. Existing efforts fall into two categories: (a) transliteration-based approaches that leverage phonetic similarity and (b) corpus-based approaches that exploit bilingual cooccurrences. The former suffer from inaccuracy, and the latter from the scarcity of bilingual resources. In contrast, we use an under-leveraged resource: monolingual entity cooccurrences crawled from entity search engines, represented as two entity-relationship graphs, one extracted from each language's corpus. The problem is then abstracted as finding correct mappings across the two graphs. To this end, we propose a holistic approach that exploits both transliteration similarity and monolingual cooccurrences. Because it builds on monolingual corpora, this approach complements existing corpus-based work, which requires scarce parallel or comparable corpora, while significantly boosting the accuracy of transliteration-based work. In addition, by parallelizing the mapping process on multicore architectures, we speed up the computation by more than 10 times per unit accuracy. We validated the effectiveness and efficiency of the proposed approach using real-life datasets.
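
To make the idea of mapping across two graphs concrete, the following is a minimal sketch of one way to combine an initial pairwise score with cooccurrence structure. It is not the article's algorithm: the function names, the blending weight alpha, the iteration count, and the greedy one-to-one matching are all assumptions chosen for illustration, and the initial scores are assumed to come from some transliteration model that is not shown here.

```python
from itertools import product


def propagate_similarity(graph_x, graph_y, init_sim, alpha=0.5, iterations=20):
    """Blend each pair's initial score with how well its neighbors align.

    graph_x, graph_y: dict mapping node -> list of neighboring nodes
                      (hypothetical stand-ins for the two entity graphs).
    init_sim:         dict mapping (x, y) -> initial similarity in [0, 1];
                      missing pairs default to 0.0.
    alpha:            assumed weight of the initial score vs. neighbor support.
    """
    pairs = list(product(graph_x, graph_y))
    base = {p: init_sim.get(p, 0.0) for p in pairs}
    sim = dict(base)
    for _ in range(iterations):
        updated = {}
        for x, y in pairs:
            nbrs_x, nbrs_y = graph_x[x], graph_y[y]
            if nbrs_x and nbrs_y:
                # For every neighbor of x, take its best-scoring
                # counterpart among the neighbors of y, then average.
                support = sum(
                    max(sim[(u, v)] for v in nbrs_y) for u in nbrs_x
                ) / len(nbrs_x)
            else:
                support = 0.0
            updated[(x, y)] = alpha * base[(x, y)] + (1 - alpha) * support
        sim = updated
    return sim


def greedy_match(sim):
    """Pick one-to-one pairs in decreasing order of score."""
    matched, used_x, used_y = [], set(), set()
    for (x, y), _score in sorted(sim.items(), key=lambda kv: -kv[1]):
        if x not in used_x and y not in used_y:
            matched.append((x, y))
            used_x.add(x)
            used_y.add(y)
    return matched


# Toy graphs with placeholder node ids (not real entity data).
graph_x = {"x1": ["x2"], "x2": ["x1"], "x3": ["x4"], "x4": ["x3"]}
graph_y = {"y1": ["y2"], "y2": ["y1"], "y3": ["y4"], "y4": ["y3"]}

# x2 and x4 are each equally similar to y2 and y4 at the outset, so the
# initial scores alone cannot decide between the two candidates.
init_sim = {
    ("x1", "y1"): 1.0, ("x3", "y3"): 1.0,
    ("x2", "y2"): 0.5, ("x2", "y4"): 0.5,
    ("x4", "y2"): 0.5, ("x4", "y4"): 0.5,
}

scores = propagate_similarity(graph_x, graph_y, init_sim)
print(greedy_match(scores))
```

In this toy input, the initial scores tie for x2 and x4, and the neighbor term is what separates the candidates: x2 cooccurs with x1, which already aligns well with y1, the neighbor of y2. Within one iteration each pair's update reads only the previous iteration's scores, so in a sketch like this the pairs could be divided among cores; how the article actually parallelizes its mapping process is not described in the abstract.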


      Published In

      ACM Transactions on Information Systems, Volume 30, Issue 4
      November 2012
      216 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/2382438

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 November 2012
      Accepted: 01 June 2012
      Revised: 01 April 2012
      Received: 01 September 2011
      Published in TOIS Volume 30, Issue 4

      Author Tags

      1. Entity mining
      2. graph alignment
      3. parallelization
      4. translation

      Qualifiers

      • Research-article
      • Research
      • Refereed
