Query modeling for entity search based on terms, categories, and examples

Published: 08 December 2011

Abstract

Users often search for entities rather than documents and, in this setting, are willing to provide extra input beyond a series of query terms, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate, and provide insight into, the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation; we also demonstrate the effectiveness of category-based expansion using example entities. Our best-performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.

Published In

ACM Transactions on Information Systems, Volume 29, Issue 4
December 2011
172 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/2037661

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 08 December 2011
Accepted: 01 September 2011
Revised: 01 October 2010
Received: 01 April 2010
Published in TOIS Volume 29, Issue 4

Author Tags

1. Entity retrieval
2. generative probabilistic model
3. query expansion
4. query modeling

Qualifiers

• Research-article
• Research
• Refereed
