research-article

Query modeling for entity search based on terms, categories, and examples

Authors:

Krisztian Balog,

Maarten De Rijke

ACM Transactions on Information Systems (TOIS), Volume 29, Issue 4

Article No.: 22, Pages 1 - 31

https://doi.org/10.1145/2037661.2037667

Published: 08 December 2011

Abstract

Users often search for entities rather than documents and, in this setting, are willing to provide extra input beyond a series of query terms, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate, and provide insights into, the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation; we also demonstrate the effectiveness of category-based expansion using example entities. Our best-performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.
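To make the three kinds of input concrete, the following is a minimal sketch and not the model from the article, whose estimation details are not given on this page. It assumes that queries and entities are each represented by a term distribution and a category distribution, that categories of example entities are mixed into the query's category distribution, and that the two components are combined linearly. The function names, the dot-product similarity, the weights `mu` and `lam`, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter


def to_distribution(items):
    """Turn a list of terms or category labels into a probability distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()} if total else {}


def expand_categories(query_cats, example_entities, entity_cats, mu=0.5):
    """Mix user-supplied categories with the categories of example entities.

    mu is an assumed interpolation weight, not a value from the article.
    """
    base = to_distribution(query_cats)
    pooled = [c for e in example_entities for c in entity_cats.get(e, [])]
    from_examples = to_distribution(pooled)
    if not from_examples:
        return base
    if not base:
        return from_examples
    keys = set(base) | set(from_examples)
    return {
        k: (1 - mu) * base.get(k, 0.0) + mu * from_examples.get(k, 0.0)
        for k in keys
    }


def similarity(query_dist, entity_dist):
    """Dot product of two distributions (an assumed stand-in similarity)."""
    return sum(p * entity_dist.get(k, 0.0) for k, p in query_dist.items())


def score(query_terms, query_cat_dist, entity_terms, entity_categories, lam=0.5):
    """Linear mix of a term-based and a category-based component."""
    term_part = similarity(to_distribution(query_terms),
                           to_distribution(entity_terms))
    cat_part = similarity(query_cat_dist, to_distribution(entity_categories))
    return (1 - lam) * term_part + lam * cat_part


# Hypothetical toy collection: terms and categories per entity.
entity_terms = {
    "Paris": ["capital", "france", "europe"],
    "Berlin": ["capital", "germany", "europe"],
    "Lyon": ["city", "france", "europe"],
}
entity_cats = {
    "Paris": ["capitals", "cities in france"],
    "Berlin": ["capitals", "cities in germany"],
    "Lyon": ["cities in france"],
}

query_terms = ["capital", "europe"]
# One category from the user, expanded with the categories of one example entity.
query_cat_dist = expand_categories(["capitals"], ["Paris"], entity_cats)

candidates = ["Berlin", "Lyon"]
ranking = sorted(
    candidates,
    key=lambda e: score(query_terms, query_cat_dist,
                        entity_terms[e], entity_cats[e]),
    reverse=True,
)
print(ranking)
```

In this toy setup the example entity adds weight to a second category, while the user-supplied category still dominates the expanded distribution. A probabilistic framework would typically replace the dot product with a principled comparison of smoothed distributions; that choice is left open here.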

Index Terms

Query modeling for entity search based on terms, categories, and examples
1. Information systems

Published In

ACM Transactions on Information Systems Volume 29, Issue 4

December 2011

172 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/2037661

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2011

Accepted: 01 September 2011

Revised: 01 October 2010

Received: 01 April 2010

Published in TOIS Volume 29, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 66

Total Downloads: 718

Downloads (Last 12 months): 14

Downloads (Last 6 weeks): 3

Reflects downloads up to 17 Jan 2025

Citations

Cited By

Jafarzadeh P, Amirmahani Z, Ensan F (2024). Learning contextual representations for entity retrieval. Applied Intelligence 54:19, pp. 8820-8840. DOI: 10.1007/s10489-024-05430-0. Online publication date: 1-Oct-2024.
https://dl.acm.org/doi/10.1007/s10489-024-05430-0
Garigliotti D (2024). Entity Examples for Explainable Query Target Type Identification with LLMs. Intelligent Data Engineering and Automated Learning – IDEAL 2024, pp. 253-259. DOI: 10.1007/978-3-031-77738-7_21. Online publication date: 19-Nov-2024.
https://dl.acm.org/doi/10.1007/978-3-031-77738-7_21
Chatterjee S, Mackie I, Dalton J (2024). DREQ: Document Re-ranking Using Entity-Based Query Understanding. Advances in Information Retrieval, pp. 210-229. DOI: 10.1007/978-3-031-56027-9_13. Online publication date: 24-Mar-2024.
https://dl.acm.org/doi/10.1007/978-3-031-56027-9_13
Guo M, Zhou Z, Gotz D, Wang Y (2023). GRAFS: Graphical Faceted Search System to Support Conceptual Understanding in Exploratory Search. ACM Transactions on Interactive Intelligent Systems 13:2, pp. 1-36. DOI: 10.1145/3588319. Online publication date: 31-Mar-2023.
https://dl.acm.org/doi/10.1145/3588319
Ma D, Chen-Chuan Chang K, Chen Y, Lv X, Shen L, Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R (2023). A Principled Decomposition of Pointwise Mutual Information for Intention Template Discovery. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 1746-1755. DOI: 10.1145/3583780.3614767. Online publication date: 21-Oct-2023.
https://dl.acm.org/doi/10.1145/3583780.3614767
Oza P, Dietz L (2023). Entity Embeddings for Entity Ranking: A Replicability Study. Advances in Information Retrieval, pp. 117-131. DOI: 10.1007/978-3-031-28241-6_8. Online publication date: 2-Apr-2023.
https://dl.acm.org/doi/10.1007/978-3-031-28241-6_8
Chatterjee S, Dietz L, Al Hasan M, Xiong L (2022). Predicting Guiding Entities for Entity Aspect Linking. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 3848-3852. DOI: 10.1145/3511808.3557671. Online publication date: 17-Oct-2022.
https://dl.acm.org/doi/10.1145/3511808.3557671
Chatterjee S, Dietz L, Amigo E, Castells P, Gonzalo J, Carterette B, Culpepper J, Kazai G (2022). BERT-ER. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1466-1477. DOI: 10.1145/3477495.3531944. Online publication date: 6-Jul-2022.
https://dl.acm.org/doi/10.1145/3477495.3531944
Chatterjee S (2022). An Entity-Oriented Approach for Answering Topical Information Needs. Advances in Information Retrieval, pp. 463-472. DOI: 10.1007/978-3-030-99739-7_57. Online publication date: 10-Apr-2022.
https://dl.acm.org/doi/10.1007/978-3-030-99739-7_57
Chatterjee S, Dietz L, Diaz F, Shah C, Suel T, Castells P, Jones R, Sakai T (2021). Entity Retrieval Using Fine-Grained Entity Aspects. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1662-1666. DOI: 10.1145/3404835.3463035. Online publication date: 11-Jul-2021.
https://dl.acm.org/doi/10.1145/3404835.3463035