DOI: 10.3115/1220575.1220661

A translation model for sentence retrieval

Published: 06 October 2005

Abstract

In this work we propose a translation model for monolingual sentence retrieval, together with four methods for constructing a parallel corpus. Of the four, a lexicon learned from a bilingual Arabic-English corpus aligned at the sentence level performs best, significantly improving results over the query likelihood baseline. We further demonstrate that smoothing from the local context of the sentence improves retrieval over the query likelihood baseline.
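
The abstract does not spell out the scoring function, so the following is only a minimal sketch of how a translation model can extend a query likelihood baseline. It assumes a Model-1-style formulation in the spirit of Berger and Lafferty (1999), with Jelinek-Mercer smoothing against the collection; the function name, the `translation` lexicon, the toy data, and the weight `lam` are all hypothetical and are not taken from the paper. The smoothing from local sentence context mentioned in the abstract is not covered here.

```python
import math
from collections import Counter


def translation_log_likelihood(query, sentence, translation, collection, lam=0.5):
    """Log P(query | sentence) under an assumed Model-1-style translation model.

    translation[(q, w)] is a hypothetical lexicon entry t(q | w). Missing
    pairs fall back to 1.0 when q == w and 0.0 otherwise, so an empty
    lexicon reduces the score to smoothed query likelihood.
    """
    coll_len = sum(collection.values())
    if coll_len == 0:
        raise ValueError("collection statistics must not be empty")
    sent_counts = Counter(sentence)
    sent_len = len(sentence)
    log_p = 0.0
    for q in query:
        # Translation-weighted probability of generating q from the sentence.
        p_sent = sum(
            translation.get((q, w), 1.0 if q == w else 0.0) * c / sent_len
            for w, c in sent_counts.items()
        )
        # Jelinek-Mercer smoothing with the collection language model.
        p_coll = collection.get(q, 0) / coll_len
        p = lam * p_sent + (1.0 - lam) * p_coll
        if p == 0.0:
            return float("-inf")
        log_p += math.log(p)
    return log_p


# Toy data, invented purely for illustration.
collection = Counter("the car was fast the automobile was red the cat sat".split())
lexicon = {("car", "automobile"): 0.4}
query = ["car"]
relevant = "the automobile was red".split()
unrelated = "the cat sat".split()

baseline_relevant = translation_log_likelihood(query, relevant, {}, collection)
baseline_unrelated = translation_log_likelihood(query, unrelated, {}, collection)
translated_relevant = translation_log_likelihood(query, relevant, lexicon, collection)
translated_unrelated = translation_log_likelihood(query, unrelated, lexicon, collection)
```

In this toy setup neither sentence contains the query term, so the baseline cannot separate them, whereas the lexicon entry linking "car" to "automobile" lifts the sentence that mentions an automobile. This is the vocabulary-mismatch effect a translation model is meant to address; the numbers themselves carry no meaning.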

Published In

HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
October 2005
1054 pages

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

HLT '05 paper acceptance rate: 127 of 402 submissions, 32%
Overall acceptance rate: 240 of 768 submissions, 31%
