
On the choice of effectiveness measures for learning to rank

Published: 01 June 2010

Abstract

Most current machine learning methods for building search engines are based on the assumption that there is a target evaluation metric that measures the quality of the search engine with respect to an end user, and that the engine should be trained to optimize that metric. Treating the target evaluation metric as given, many different approaches (e.g., LambdaRank, SoftRank, RankingSVM) have been proposed for optimizing retrieval metrics directly. The target metric used in optimization acts as a bottleneck that summarizes the training data, and it is known that some evaluation metrics are more informative than others. In this paper, we consider the effect of the target evaluation metric on learning to rank. In particular, we question the current assumption that retrieval systems should be designed to directly optimize the metric that is assumed to measure user satisfaction. We show that even if user satisfaction can be measured by a metric X, optimizing the engine on a training set for a more informative metric Y may result in better test performance according to X than optimizing the engine directly for X on the training set. We analyze when the difference between the two approaches is significant, in terms of the amount of available training data and the dimensionality of the feature space.
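
To make the comparison concrete, the sketch below (not the authors' code; the synthetic data, the toy linear ranker, and the crude random-search learner are all illustrative assumptions) follows the protocol the abstract describes: one ranker is trained to maximize the target metric X (here Precision@10) directly on a small training set, another is trained to maximize a more informative metric Y (here NDCG@10), and both are then evaluated with X on held-out queries.

# Illustrative sketch only: a toy linear ranker on synthetic data, trained by
# crude random search to maximize either P@10 or NDCG@10 on the training
# queries, then evaluated with P@10 on held-out queries. Everything here is
# an assumption for illustration, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 5
W_TRUE = rng.normal(size=N_FEATURES)  # hidden "true" relevance model

def dcg_at_k(labels, scores, k=10):
    """DCG@k of graded labels when documents are sorted by descending score."""
    order = np.argsort(-scores)[:k]
    gains = 2.0 ** labels[order] - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(order) + 2))
    return float(np.sum(gains * discounts))

def ndcg_at_k(labels, scores, k=10):
    ideal = dcg_at_k(labels, labels, k)  # ideal ordering ranks by the labels themselves
    return dcg_at_k(labels, scores, k) / ideal if ideal > 0 else 0.0

def precision_at_k(labels, scores, k=10):
    """Fraction of relevant documents (label > 0) among the top k."""
    order = np.argsort(-scores)[:k]
    return float(np.mean(labels[order] > 0))

def make_queries(n_queries, n_docs=50):
    """Synthetic queries: graded relevance is a noisy function of W_TRUE."""
    queries = []
    for _ in range(n_queries):
        X = rng.normal(size=(n_docs, N_FEATURES))
        noisy = X @ W_TRUE + rng.normal(scale=2.0, size=n_docs)
        y = np.digitize(noisy, np.quantile(noisy, [0.6, 0.85, 0.95]))  # grades 0..3
        queries.append((X, y.astype(float)))
    return queries

def mean_metric(metric, queries, w):
    """Average of a per-query metric over a query set, for weight vector w."""
    return float(np.mean([metric(y, X @ w) for X, y in queries]))

def optimize_for(metric, train_queries, n_trials=300):
    """Crude random-search stand-in for a metric-optimizing learner."""
    best_w, best_val = None, -np.inf
    for _ in range(n_trials):
        w = rng.normal(size=N_FEATURES)
        val = mean_metric(metric, train_queries, w)
        if val > best_val:
            best_w, best_val = w, val
    return best_w

train, test = make_queries(20), make_queries(200)

w_x = optimize_for(precision_at_k, train)  # optimize the target metric X directly
w_y = optimize_for(ndcg_at_k, train)       # optimize the more informative metric Y

print("test P@10 after training on P@10 :", round(mean_metric(precision_at_k, test, w_x), 3))
print("test P@10 after training on NDCG :", round(mean_metric(precision_at_k, test, w_y), 3))

Whether the NDCG-trained weights actually win on test P@10 depends on the seed and on how little training data is available; the sketch is meant only to show the evaluation protocol, i.e. that the training-time metric and the test-time metric need not be the same.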

References

[1]
Aslam, J. A., Yilmaz, E., & Pavlu, V. (2005). The maximum entropy method for analyzing retrieval measures. In Marchionini, G., Moffat, A., Tait, J., Baeza-Yates, R., & Ziviani, N. (Eds.), Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 27–34). ACM Press.
[2]
Burges, C. J. C., Ragno, R., & Le, Q. V. (2006). Learning to rank with nonsmooth cost functions. In Schölkopf, B., Platt, J. C., & Hoffman, T. (Eds.), Advances in Neural Information Processing Systems (NIPS 2006) (pp. 193–200). Cambridge, MA: MIT Press.
[3]
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., et al. (2005). Learning to rank using gradient descent. In ICML ’05: Proceedings of the 22nd international conference on machine learning (pp. 89–96). New York, NY, USA: ACM.
[4]
Donmez, P., Svore, K., & Burges, C. J. (2008). On the optimality of LambdaRank. Technical report, Microsoft Research.
[5]
Järvelin, K., & Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. In SIGIR ’00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval (pp. 41–48). New York, NY, USA: ACM.
[6]
Joachims, T. (2005). A support vector method for multivariate performance measures. In ICML ’05: Proceedings of the 22nd international conference on machine learning (pp. 377–384). New York, NY, USA: ACM.
[7]
Le, Q. V., & Smola, A. J. (2007). Direct optimization of ranking measures. CoRR, abs/0704.3359.
[8]
Ling, C. X., Huang, J., & Zhang, H. (2003). AUC: A statistically consistent and more discriminating measure than accuracy. In IJCAI ’03: Proceedings of the 18th international joint conference on artificial intelligence (pp. 329–341).
[9]
Liu, T.-Y., & He, Y. (2008). Are algorithms directly optimizing IR measures really direct? Technical report, Microsoft Research.
[10]
Robertson, S. (2008). A new interpretation of average precision. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 689–690). New York, NY, USA: ACM.
[11]
Robertson, S., & Zaragoza, H. (2007). On rank-based effectiveness measures and optimization. Information Retrieval, 10(3), 321–339.
[12]
Sanderson, M., & Zobel, J. (2005). Information retrieval system evaluation: Effort, sensitivity, and reliability. In Baeza-Yates, R. A., Ziviani, N., Marchionini, G., Moffat, A., & Tait, J. (Eds.), SIGIR (pp. 162–169). ACM.
[13]
Taylor, M., Guiver, J., Robertson, S., & Minka, T. (2008). SoftRank: Optimizing non-smooth rank metrics. In WSDM ’08: Proceedings of the international conference on Web search and web data mining (pp. 77–86). New York, NY, USA: ACM.
[14]
Taylor, M., Zaragoza, H., Craswell, N., Robertson, S., & Burges, C. (2006). Optimisation methods for ranking functions with multiple parameters. In CIKM ’06: Proceedings of the 15th ACM international conference on Information and knowledge management (pp. 585–593). New York, NY, USA: ACM.
[15]
Webber, W., Moffat, A., Zobel, J., & Sakai, T. (2008). Precision-at-ten considered redundant. In SIGIR (pp. 695–696). New York, NY, USA: ACM.
[16]
Xu, J., & Li, H. (2007). AdaRank: A boosting algorithm for information retrieval. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 391–398). New York, NY, USA: ACM.
[17]
Yilmaz, E., & Aslam, J. A. (2008). Estimating average precision when judgments are incomplete. Knowledge and Information Systems, 16(2), 173–211.
[18]
Yue, Y., Finley, T., Radlinski, F., & Joachims, T. (2007). A support vector method for optimizing average precision. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 271–278). New York, NY, USA: ACM.

        Published In

Information Retrieval, Volume 13, Issue 3
        Jun 2010
        118 pages

        Publisher

Kluwer Academic Publishers, United States

        Publication History

        Published: 01 June 2010
        Accepted: 28 August 2009
        Received: 24 April 2009

        Author Tags

        1. Evaluation
        2. Evaluation metrics
        3. Learning to rank
        4. Training
        5. Empirical risk minimization

