skip to main content
10.1145/1390334.1390356acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Query dependent ranking using K-nearest neighbor

Published: 20 July 2008 Publication History

Abstract

Many ranking models have been proposed in information retrieval, and recently machine learning techniques have also been applied to ranking model construction. Most of the existing methods do not take into consideration the fact that significant differences exist between queries, and only resort to a single function in ranking of documents. In this paper, we argue that it is necessary to employ different ranking models for different queries and onduct what we call query-dependent ranking. As the first such attempt, we propose a K-Nearest Neighbor (KNN) method for query-dependent ranking. We first consider an online method which creates a ranking model for a given query by using the labeled neighbors of the query in the query feature space and then rank the documents with respect to the query using the created model. Next, we give two offline approximations of the method, which create the ranking models in advance to enhance the efficiency of ranking. And we prove a theory which indicates that the approximations are accurate in terms of difference in loss of prediction, if the learning algorithm used is stable with respect to minor changes in training examples. Our experimental results show that the proposed online and offline methods both outperform the baseline method of using a single ranking function.

References

[1]
S. Agarwal and P. Niyogi. Stability and generalization of bipartite ranking algorithms. In Proceedings of COLT 2005, pages 32--47, 2005.
[2]
R. Baeza--Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, May 1999.
[3]
S. M. Beitzel, E. C. Jensen, A. Chowdhury, and O. Frieder. Varying approaches to topical web query classification. In SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 783--784, New York, NY, USA, 2007. ACM.
[4]
S. M. Beitzel, E. C. Jensen, O. Frieder, D. Grossman, D. D. Lewis, A. Chowdhury, and A. Kolcz. Automatic web query classification using labeled and unlabeled training data. In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 581--582, New York, NY, USA, 2005. ACM.
[5]
C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 89--96, New York, NY, USA, 2005. ACM.
[6]
Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML '07, volume 227 of ACM International Conference Proceeding Series, pages 129--136. ACM, 2007.
[7]
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res., 4:933--969, 2003.
[8]
D. S. Guru and H. S. Nagendraswamy. Clustering of interval-valued symbolic patterns based on mutual similarity value and the concept of -mutual nearest neighborhood. In ACCV (2), pages 234--243, 2006.
[9]
K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 20(4):422--446, 2002.
[10]
T. Joachims. Making large-scale support vector machine learning practical. In Advances in Kernel Methods: Support Vector Machines.
[11]
T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD). ACM, 2002.
[12]
I. Kang and G. Kim. Query type classification for web document retrieval. In SIGIR '03: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, 2003.
[13]
J. Lafferty and C. Zhai. Document language models, query models, and risk minimization for information retrieval. In SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 111--119, New York, NY, USA, 2001. ACM.
[14]
U. Lee, Z. Liu, and J. Cho. Automatic identification of user goals in web search. In WWW '05: Proceedings of the 14th international conference on World Wide Web, pages 391--400, New York, NY, USA, 2005. ACM.
[15]
T. Y. Liu, J. Xu, T. Qin, W. Xiong, and H. Li. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In SIGIR '07: Proceedings of the Learning to Rank workshop in the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007.
[16]
T.-Y. Liu, Y. Yang, H. Wan, H.-J. Zeng, Z. Chen, and W.-Y. Ma. Support vector machines classification with a very large-scale taxonomy. SIGKDD Explor. Newsl., 7(1):36--43, 2005.
[17]
R. Nallapati. Discriminative models for information retrieval. In SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 64--71, New York, NY, USA, 2004. ACM.
[18]
J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In Research and Development in Information Retrieval, pages 275--281, 1998.
[19]
F. P. Preparata and M. I. Shamos. Computational Geometry:Discriminative models An Introduction (Monographs in Computer Science). Springer, August 1985.
[20]
M. Richardson, A. Prakash, and E. Brill. Beyond PageRank: machine learning for static ranking. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 707--715, New York, NY, USA, 2006. ACM.
[21]
S. Robertson. Overview of the okapi projects. In Journal of Documentation, pages 275--281, 1998.
[22]
D. E. Rose and D. Levinson. Understanding user goals in web search. In WWW '04: Proceedings of the 13th international conference on World Wide Web, pages 13--19, New York, NY, USA, 2004. ACM.
[23]
G. Salton. The SMART Retrieval System-Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1971.
[24]
G. Salton and M. E. Lesk. Computer evaluation of indexing and text processing. J. ACM, 15(1):8--36, 1968.
[25]
D. Shen, J.-T. Sun, Q. Yang, and Z. Chen. Building bridges for web query classification. In SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 131--138, New York, NY, USA, 2006. ACM.
[26]
R. Song, J.-R. Wen, S. Shi, G. Xin, T.-Y. Liu, T. Qin, X. Zheng, J. Zhang, G. Xue, and W.-Y. Ma. Microsoft research asia at web track and terabyte track of trec 2004. In Proceedings of the Thirteenth Text REtrieval Conference Proceedings (TREC-2004), 2004.
[27]
E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In Advances in NIPS, number vol. 15, 2003.
[28]
J. Xu and H. Li. Adarank: a boosting algorithm for information retrieval. In SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 391--398, New York, NY, USA, 2007. ACM.
[29]
Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 271--278, New York, NY, USA, 2007. ACM.

Cited By

View all

Index Terms

  1. Query dependent ranking using K-nearest neighbor

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
    July 2008
    934 pages
    ISBN:9781605581644
    DOI:10.1145/1390334
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 July 2008

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. k-nearest neighbor
    2. query dependent ranking
    3. stability

    Qualifiers

    • Research-article

    Conference

    SIGIR '08
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)33
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 14 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media