DOI: 10.5555/3692070.3694128

Adaptively learning to select-rank in online platforms

Published: 03 January 2025

Abstract

Ranking algorithms are fundamental to online platforms ranging from e-commerce sites to content streaming services. Our research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users, a key component in personalizing the user experience. We develop a user response model that accounts for diverse user preferences and the varying effects of item positions, aiming to optimize overall user satisfaction with the ranked list. We frame this problem within a contextual bandits framework, with each ranked list as an action. Our approach incorporates an upper confidence bound to adjust predicted user satisfaction scores and selects the ranking action that maximizes these adjusted scores, solved efficiently via maximum-weight imperfect matching. We show that our algorithm achieves a cumulative regret bound of O(d√(NKT)) for ranking K out of N items in a d-dimensional context space over T rounds, under the assumption that user responses follow a generalized linear model. This bound alleviates dependence on the ambient action space, whose cardinality grows exponentially in N and K and thus renders direct application of existing adaptive learning algorithms, such as UCB or Thompson sampling, infeasible. Experiments conducted on both simulated and real-world datasets demonstrate that our algorithm outperforms the baseline.
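To make the selection step concrete, here is a minimal Python sketch of the idea, under simplifying assumptions that are not taken from the paper: a linear (rather than general GLM) response model, known multiplicative position weights, and SciPy's rectangular linear_sum_assignment standing in for the maximum-weight imperfect matching solver. All function and variable names (select_rank, pos_weights, and so on) are illustrative.

```python
# Minimal sketch of a UCB select-rank step (illustrative, not the paper's code).
# Assumes a linear response model and known multiplicative position weights.
import numpy as np
from scipy.optimize import linear_sum_assignment

def select_rank(theta_hat, V_inv, item_feats, pos_weights, alpha=1.0):
    """Choose and order K of N items via maximum-weight imperfect matching.

    theta_hat   : (d,)  estimated model parameter
    V_inv       : (d,d) inverse Gram matrix of past contexts
    item_feats  : (N,d) context feature vector for each candidate item
    pos_weights : (K,)  assumed position-effect multipliers, K <= N
    """
    # Optimistic (UCB-adjusted) satisfaction score per item:
    # point estimate plus an elliptical confidence width.
    means = item_feats @ theta_hat
    widths = np.sqrt(np.einsum("nd,dk,nk->n", item_feats, V_inv, item_feats))
    ucb = means + alpha * widths
    # Score of placing item n at position k; rows = positions, cols = items.
    scores = pos_weights[:, None] * ucb[None, :]          # shape (K, N)
    # A rectangular assignment is a maximum-weight imperfect matching:
    # each position gets a distinct item, and N - K items stay unmatched.
    rows, cols = linear_sum_assignment(scores, maximize=True)
    return cols[np.argsort(rows)]                         # item index per position

# Toy usage with random data.
rng = np.random.default_rng(0)
d, N, K = 5, 8, 3
ranking = select_rank(
    theta_hat=rng.normal(size=d),
    V_inv=0.1 * np.eye(d),
    item_feats=rng.normal(size=(N, d)),
    pos_weights=np.array([1.0, 0.8, 0.6]),  # assumed decaying position effects
)
print(ranking)  # indices of the K chosen items, ordered by position
```

Note that when scores factor into a position weight times an item score, as in this simplified sketch, greedily sorting items by their UCB values would already be optimal; the matching formulation is what carries over to more general position-item interactions of the kind the user response model allows.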



Information

Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
