DOI: 10.1145/1401890.1401906

Structured learning for non-smooth ranking losses

Published: 24 August 2008

Abstract

Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has recently been applied to optimize important non-decomposable ranking criteria such as AUC (area under the ROC curve) and MAP (mean average precision). Within the max-margin structured learning framework, we propose new, almost-linear-time algorithms to optimize two other criteria widely used to evaluate search systems: MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain). We also demonstrate that different ranking criteria may call for different feature maps. Search applications should not be optimized in favor of a single criterion, because they must cater to a variety of queries: for example, MRR is best suited to navigational queries, while NDCG is best suited to informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a single multi-criteria max-margin optimization. The result is a single, robust ranking model whose accuracy is close to the best of learners trained on individual criteria. In fact, experiments on the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with that same criterion.
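
For reference, the two criteria the paper targets have the following standard definitions, reproduced here from common IR usage rather than from the paper's own notation (one popular DCG gain/discount variant is shown):

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\mathrm{rank}_q}, \qquad \mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}, \qquad \mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i + 1)},$$

where $\mathrm{rank}_q$ is the position of the first relevant result for query $q$, $\mathrm{rel}_i$ is the graded relevance of the document at rank $i$, and $\mathrm{IDCG@}k$ is the $\mathrm{DCG@}k$ of the ideal, relevance-sorted ordering. Both measures depend on sorted ranks, which is what makes them non-smooth, non-decomposable functions of the model scores.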
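
The max-margin structured learning framework referred to above is, in its generic margin-rescaled form (following Tsochantaridis et al., JMLR 2005), sketched below; the paper's specific feature maps and loss-dependent inference routines are not reproduced here:

$$\min_{w,\ \xi \ge 0} \ \frac{1}{2}\|w\|^2 + C \sum_{q} \xi_q \quad \text{s.t.} \quad w^{\top}\big(\phi(x_q, y_q) - \phi(x_q, y)\big) \ \ge\ \Delta(y_q, y) - \xi_q \quad \forall q,\ \forall y \ne y_q,$$

where $x_q$ is the document list for query $q$, $y_q$ is its reference ranking, $\phi$ is a joint feature map, and $\Delta$ is the target ranking loss (e.g., $1 - \mathrm{NDCG}$ or $1 - \mathrm{MRR}$). One natural reading of the multi-criteria combination described in the abstract is to replace $\Delta$ with a weighted sum $\sum_k \lambda_k \Delta_k$ of individual losses; the paper's exact construction may differ.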
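
As a concrete complement, here is a minimal Python sketch of both metrics; the function names and toy data are illustrative rather than the paper's, and the DCG variant matches the formula above.

```python
import math

def mrr(first_relevant_ranks):
    """Mean reciprocal rank over a set of queries.
    Each entry is the 1-based rank of the first relevant result
    for one query, or None if nothing relevant was retrieved."""
    return sum(0.0 if r is None else 1.0 / r
               for r in first_relevant_ranks) / len(first_relevant_ranks)

def dcg(relevances, k):
    """DCG@k with the (2^rel - 1) gain and log2(rank + 1) discount."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """NDCG@k: DCG of the given ordering over DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Toy example: graded relevances of the top-5 results for one query,
# and first-relevant ranks for three queries.
print(round(ndcg([3, 2, 3, 0, 1], k=5), 3))  # ~0.957: near-ideal ordering
print(round(mrr([1, None, 3]), 3))           # (1/1 + 0 + 1/3) / 3 = 0.444
```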

Published In

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN: 9781605581934
DOI: 10.1145/1401890
General Chair: Ying Li
Program Chairs: Bing Liu, Sunita Sarawagi

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. max-margin structured learning to rank
  2. non-decomposable loss functions

Qualifiers

  • Research-article

Conference

KDD '08

Acceptance Rates

KDD '08 paper acceptance rate: 118 of 593 submissions (20%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)
