skip to main content
article

Query enrichment for web-query classification

Published: 01 July 2006 Publication History

Abstract

Web-search queries are typically short and ambiguous. To classify these queries into certain target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (precision, F1 measure, which combines precision and recall, and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this article, we show that, despite the difficulty of an abundance of ambiguous queries and lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best result for F1 and precision is 42.4% and 44.4%, respectively, which is 9.6% and 24.3% higher than those from the runner-ups, respectively.

References

[1]
Alpaydin, E. 2004. Introduction to Machine Learning. MIT Press, Cambridge, MA.]]
[2]
Bauer, E. and Kohavi, R. 1999. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. J. Mach. Learn. 36, 1--2, 105--139.]]
[3]
Beeferman, D. and Berger, A. 2000. Agglomerative clustering of a search engine query log. In KDD '00: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, 407--416.]]
[4]
Beitzel, S. M., Jensen, E. C., Frieder, O., Grossman, D., Lewis, D. D., Chowdhury, A., and Kolcz, A. 2005. Automatic web query classification using labeled and unlabeled training data. In SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 581--582.]]
[5]
Cann, A. J. 2003. Maths from Scratch for Biologists. John Wiley & Sons, New York, NY.]]
[6]
Caruana, R., Niculescu-Mizil, A., Crew, G., and Ksikes, A. 2004. Ensemble selection from libraries of models. In ICML '04: Proceedings of the 21st International Conference on Machine Learning. ACM Press, New York, 18.]]
[7]
Chang, C.-H. and Hsu, C.-C. 1998. Integrating query expansion and conceptual relevance feedback for personalized web information retrieval. In WWW7: Proceedings of the 7th International Conference on World Wide Web 7. Elsevier Science Publishers B. V., Amsterdam, 621--623.]]
[8]
Chekuri, C., Goldwasser, M., Raghavan, P., and Upfal, E. 1997. Web search using automated classification. 6th International World Wide Web Conference (WWW6). Poster presentation.]]
[9]
Chen, H. and Dumais, S. 2000. Bringing order to the web: Automatically categorizing search results. In CHI '00: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press, New York, 145--152.]]
[10]
Dietterich, T. G. 2000. Ensemble methods in machine learning. In Multiple Classifier Systems. 1--15.]]
[11]
Fan, W., Stolfo, S. J., and Zhang, J. 1999. The application of adaboost for distributed, scalable and on-line learning. In KDD '99: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, 362--366.]]
[12]
Hansen, L. K. and Salamon, P. 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, 10, 993--1001.]]
[13]
HITEC. http://categorizer.tmit.bmr.hu.]]
[14]
Hoel, P. G. 1966. Elementary Statistics, 2nd ed. Wiley, New York, NY.]]
[15]
Howe, A. E. and Dreilinger, D. 1997. SAVVYSEARCH: A metasearch engine that learns which search engines to query. AI Mag. 18, 2, 19--25.]]
[16]
Jansen, B. J. 2000. The effect of query complexity on web searching results. Inf. Res. 6, 1.]]
[17]
Joachims, T. 1998. Text categorization with suport vector machines: Learning with many relevant features. In ECML'98 Proceedings of the 10th European Conference on Machine Learning, 137--142.]]
[18]
Joachims, T. 1999. Transductive inference for text classification using support vector machines. In ICML '99: Proceedings of the 16th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, 200--209.]]
[19]
Jones, K. S. 1971. Automatic Keyword Classifications for Information Retrieval. Butterworth, London, UK.]]
[20]
Kang, I.-H. and Kim, G. 2003. Query type classification for web document retrieval. In SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval. ACM Press, New York, 64--71.]]
[21]
Kardkovács, Z. T., Tikk, D., and Bánsághi, Z. 2005. The ferrety algorithm for the kdd cup 2005 problem. SIGKDD Explor. Newsl. 7, 2, 111--116.]]
[22]
Kittler, J., Hatef, M., Duin, R. P. W., and Matas, J. 1998. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20, 3, 226--239.]]
[23]
Lewis, D. D. and Gale, W. A. 1994. A sequential algorithm for training text classifiers. In SIGIR, W. B. Croft and C. J. van Rijsbergen, Eds. Springer Verlag, Berlin, Germany, 3--12.]]
[24]
Li, Y., Zheng, Z., and Dai, H. K. 2005. Kdd cup-2005 report: Facing a great challenge. SIGKDD Explor. Newsl. 7, 2, 91--99.]]
[25]
McCallum, A. and Nigam, K. 1998. A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization.]]
[26]
Meyer, D. A. and Brown, T. A. 1998. Statistical mechanics of voting. Phys. Revei. Lett. 81, 8, 1718--1721.]]
[27]
Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. 1990. Introduction to wordnet: An on-line lexical database. Int. J. Lexicography 3, 4, 23--244.]]
[28]
Page, L., Brin, S., Motwani, R., and Winograd, T. 1998. The pagerank citation ranking: Bringing order to the web. Tech. rep., Stanford Digital Library Technologies Project.]]
[29]
Selberg, E. and Etzioni, O. 1995. Multi-service search and comparison using the MetaCrawler. In Proceedings of the 4th International World-Wide web Conference. Darmstadt, Germany.]]
[30]
Shen, D., Pan, R., Sun, J.-T., Pan, J. J., Wu, K., Yin, J., and Yang, Q. 2005. Q2c@ust: Our winning solution to query classification in kddcup 2005. SIGKDD Explor. Newsl. 7, 2, 100--110.]]
[31]
Shen, D., Sun, J.-T., Yang, Q., and Chen, Z. 2006. Building bridges for web query classification. In SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.]]
[32]
Silverstein, C., Marais, H., Henzinger, M., and Moricz, M. 1999. Analysis of a very large web search engine query log. SIGIR Forum 33, 1, 6--12.]]
[33]
Tikk, D., Biró, Gy., and Yang, J. D. 2005. Experiments with a hierarchial text categorization method on WIPO patent collections. In Applied Research in Uncertainty Modelling and Analysis, N. O. Attok-Okine and B. M. Ayyub, Eds. International Series in Intelligent Technologies, vol. 20, Springer-Verlag, 283--302.]]
[34]
Van, R. C. 1979. Information Retrieval, 2nd ed. Butterworth, London, UK.]]
[35]
Vogel, D., Bickel, S., Haider, P., Schimpfky, R., Siemen, P., Bridges, S., and Scheffer, T. 2005. Classifying search engine queries using the web as background knowledge. SIGKDD Explor. Newsl. 7, 2, 117--122.]]
[36]
Voorhees, E. M. 1994. Query expansion using lexical-semantic relations. In SIGIR '94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Springer Verlag, Berlin, Germany. 61--69.]]
[37]
Wen, J.-R., Nie, J.-Y., and Zhang, H.-J. 2002. Query clustering using user logs. ACM Trans. Inf. Syst. 20, 1, 59--81.]]
[38]
Yang, Y. 1999. An evaluation of statistical approaches to text categorization. Inf. Retr. 1, 1--2, 69--90.]]
[39]
Yang, Y. and Pedersen, J. O. 1997. A comparative study on feature selection in text categorization. In ICML '97: Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA, 412--420.]]

Cited By

View all

Recommendations

Reviews

George Pallis

Web query classification is a rather interesting area of research, since Web users' queries are typically short, noisy, and ambiguous. This paper presents a new approach to classifying Web search queries, made up of two phases. First, the classifiers are learned by including synonym-based and statistics-based techniques. Then, the queries are classified into one or more categories, based on the classifiers that were previously learned. The proposed approach is quite detailed, and is supported by a wide range of examples, which help in understanding the underlying issues. Furthermore, the paper is quite strong in its experimental presentation, which shows that the proposed approach works well in practice, and has proven useful for classifying Web queries. As a computer science paper, however, this work is lacking in coverage of computational aspects. Although the authors provide a running time analysis, it would be useful to see a further discussion of the complexity of the tasks included in their approach. To sum up, this is an interesting, well-written, and easy-to-follow paper; readers will not need to be tech-savvy to understand its contents. This work is well worth reading by anyone interested in the area of information retrieval. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Information Systems
ACM Transactions on Information Systems  Volume 24, Issue 3
July 2006
110 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/1165774
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 2006
Published in TOIS Volume 24, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. KDDCUP2005
  2. Query classification
  3. ensemble learning
  4. query enrichment
  5. synonym-based classifier

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)25
  • Downloads (Last 6 weeks)6
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media