A semi-supervised incremental algorithm to automatically formulate topical queries

Published: 01 May 2009

Abstract

The quality of the material collected by context-based Web search systems is highly dependent on the vocabulary used to generate the search queries. This paper proposes a semi-supervised algorithm that incrementally learns terms that can help bridge the terminology gap between the user's information needs and the vocabulary of the relevant documents. The learning strategy uses an incrementally retrieved, topic-dependent selection of Web documents for term-weight reinforcement, reflecting how apt the terms are at describing and discriminating the topic of the user context. The algorithm learns new descriptors by searching for terms that tend to occur often in relevant documents, and learns good discriminators by identifying terms that tend to occur only in the context of the given topic. The enriched vocabulary allows the formulation of search queries that are more effective than queries generated directly from the terms in the initial topic description. An evaluation on a large collection of topics, using one standard and two ad hoc performance evaluation metrics, suggests that the proposed technique is superior to a baseline and to other existing query reformulation techniques.
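The distinction between descriptors and discriminators can be illustrated with a small sketch. This is not the paper's weighting scheme, which the abstract does not spell out. It is a simplified stand-in that assumes documents are already tokenized and labelled as on-topic or off-topic, and it uses two hypothetical helper functions, `descriptor_scores` and `discriminator_scores`, with toy data invented for illustration.

```python
from collections import Counter


def descriptor_scores(relevant_docs):
    """Fraction of on-topic documents that contain each term.

    Assumption: a term is a good descriptor if it occurs in many
    relevant documents (a simplification of the abstract's wording).
    """
    n = len(relevant_docs)
    doc_freq = Counter(t for doc in relevant_docs for t in set(doc))
    return {t: count / n for t, count in doc_freq.items()}


def discriminator_scores(relevant_docs, other_docs):
    """Share of a term's document occurrences that fall in on-topic documents.

    Assumption: a term is a good discriminator if it occurs almost
    exclusively in documents about the topic.
    """
    rel_freq = Counter(t for doc in relevant_docs for t in set(doc))
    all_freq = Counter(t for doc in relevant_docs + other_docs for t in set(doc))
    return {t: rel_freq[t] / all_freq[t] for t in rel_freq}


# Toy, made-up data: the topic is "Java, the island".
relevant = [
    ["java", "island", "indonesia", "travel"],
    ["java", "island", "volcano"],
    ["java", "indonesia", "jakarta"],
]
off_topic = [
    ["java", "programming", "language"],
    ["java", "virtual", "machine"],
    ["travel", "insurance"],
]

desc = descriptor_scores(relevant)
disc = discriminator_scores(relevant, off_topic)

# Rank candidate query terms by each criterion.
top_descriptors = sorted(desc, key=desc.get, reverse=True)[:3]
top_discriminators = sorted(disc, key=disc.get, reverse=True)[:3]
```

In this toy collection, "java" appears in every on-topic document, so it scores highest as a descriptor, but it also appears in off-topic documents, so it is a weaker discriminator than "indonesia", which occurs only within the topic. An incremental variant would recompute such scores after each retrieval round and use the top-ranked terms to form the next queries.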

Published In

Information Sciences: an International Journal, Volume 179, Issue 12
May, 2009
258 pages

Publisher

Elsevier Science Inc.
United States

Author Tags

1. Context
2. Query formulation
3. Topical queries
4. Web search

Qualifiers

• Article
