DOI: 10.5555/3692070.3694128

Adaptively learning to select-rank in online platforms

Published: 03 January 2025

Abstract

Ranking algorithms are fundamental to online platforms ranging from e-commerce sites to content streaming services. Our research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users, a key component in personalizing the user experience. We develop a user response model that accounts for diverse user preferences and the varying effects of item positions, aiming to optimize overall user satisfaction with the ranked list. We frame this problem within a contextual bandits framework, with each ranked list as an action. Our approach incorporates an upper confidence bound to adjust predicted user satisfaction scores and selects the ranking action that maximizes these adjusted scores, solved efficiently via maximum-weight imperfect matching. We show that our algorithm achieves a cumulative regret bound of O(d√(NKT)) for ranking K out of N items in a d-dimensional context space over T rounds, under the assumption that user responses follow a generalized linear model. This bound alleviates dependence on the ambient action space, whose cardinality grows exponentially in N and K and thus renders direct application of existing adaptive learning algorithms, such as UCB or Thompson sampling, infeasible. Experiments conducted on both simulated and real-world datasets demonstrate that our algorithm outperforms the baseline.
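To make the selection step concrete, here is a minimal Python sketch of the idea, under simplifying assumptions that are not taken from the paper: a linear (rather than general GLM) response model, known multiplicative position weights, and SciPy's rectangular linear_sum_assignment standing in for the maximum-weight imperfect matching solver. All function and variable names (select_rank, pos_weights, and so on) are illustrative.

```python
# Minimal sketch of a UCB select-rank step (illustrative, not the paper's code).
# Assumes a linear response model and known multiplicative position weights.
import numpy as np
from scipy.optimize import linear_sum_assignment

def select_rank(theta_hat, V_inv, item_feats, pos_weights, alpha=1.0):
    """Choose and order K of N items via maximum-weight imperfect matching.

    theta_hat   : (d,)  estimated model parameter
    V_inv       : (d,d) inverse Gram matrix of past contexts
    item_feats  : (N,d) context feature vector for each candidate item
    pos_weights : (K,)  assumed position-effect multipliers, K <= N
    """
    # Optimistic (UCB-adjusted) satisfaction score per item:
    # point estimate plus an elliptical confidence width.
    means = item_feats @ theta_hat
    widths = np.sqrt(np.einsum("nd,dk,nk->n", item_feats, V_inv, item_feats))
    ucb = means + alpha * widths
    # Score of placing item n at position k; rows = positions, cols = items.
    scores = pos_weights[:, None] * ucb[None, :]          # shape (K, N)
    # A rectangular assignment is a maximum-weight imperfect matching:
    # each position gets a distinct item, and N - K items stay unmatched.
    rows, cols = linear_sum_assignment(scores, maximize=True)
    return cols[np.argsort(rows)]                         # item index per position

# Toy usage with random data.
rng = np.random.default_rng(0)
d, N, K = 5, 8, 3
ranking = select_rank(
    theta_hat=rng.normal(size=d),
    V_inv=0.1 * np.eye(d),
    item_feats=rng.normal(size=(N, d)),
    pos_weights=np.array([1.0, 0.8, 0.6]),  # assumed decaying position effects
)
print(ranking)  # indices of the K chosen items, ordered by position
```

Note that when scores factor into a position weight times an item score, as in this simplified sketch, greedily sorting items by their UCB values would already be optimal; the matching formulation is what carries over to more general position-item interactions of the kind the user response model allows.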



Information

Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
