Personalized Recommendation via Parameter-Free Contextual Bandits

Research article. Published: 09 August 2015. DOI: 10.1145/2766462.2767707

Abstract

Personalized recommendation services have gained increasing popularity and attention in recent years, as most useful information can now be accessed online in real time. Most online recommender systems address users' information needs by exploiting both user and content information. Despite extensive recent advances, personalized recommendation remains challenging for at least two reasons. First, the user and item repositories change frequently, which renders traditional recommendation algorithms ineffective. Second, the so-called cold-start problem is hard to address, because little information is available for learning a recommendation model for new items or new users. Both challenges reduce to the dilemma of exploration versus exploitation. In this paper, we formulate personalized recommendation as a contextual bandit problem to resolve this dilemma. Specifically, we propose a parameter-free bandit strategy that employs a principled resampling approach, the online bootstrap, to derive the distribution of the estimated models in an online manner. Following the paradigm of probability matching, the proposed algorithm randomly samples a model from this distribution for every recommendation. Extensive experiments on two real-world collections of web data (online advertising and news recommendation) demonstrate the effectiveness of the proposed algorithm in terms of click-through rate. The results also show that the algorithm is robust in cold-start situations, where there is insufficient data or knowledge to tune hyper-parameters.
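
The full algorithm appears in the body of the paper; purely as an illustration of the idea sketched in the abstract, the Python fragment below shows one way an online-bootstrap bandit can be organized. The linear per-arm reward model, the Poisson(1) replicate weighting (in the style of Oza and Russell's online bagging), and all names here are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

class OnlineBootstrapBandit:
    """Illustrative sketch: contextual bandit via online bootstrap.

    Keeps B bootstrap replicates of a linear reward model per arm.
    Each observation updates every replicate a Poisson(1) number of
    times (online bagging), so the replicates approximate the sampling
    distribution of the estimator without storing past data.
    """

    def __init__(self, n_arms, dim, n_replicates=100):
        self.B = n_replicates
        # Per (replicate, arm) ridge statistics: A accumulates x x^T,
        # b accumulates reward * x; A starts at the identity.
        self.A = np.tile(np.eye(dim), (n_replicates, n_arms, 1, 1))
        self.b = np.zeros((n_replicates, n_arms, dim))

    def recommend(self, x):
        # Probability matching: draw one replicate uniformly at random
        # and act greedily under its estimated models.
        j = np.random.randint(self.B)
        theta = np.linalg.solve(self.A[j], self.b[j][..., None])[..., 0]
        return int(np.argmax(theta @ x))

    def update(self, arm, x, reward):
        # Online bootstrap: this observation is counted a
        # Poisson(1)-distributed number of times in each replicate.
        k = np.random.poisson(1.0, size=self.B)
        self.A[:, arm] += k[:, None, None] * np.outer(x, x)
        self.b[:, arm] += k[:, None] * (reward * x)
```

In such a scheme, exploration arises from the spread among the bootstrap replicates rather than from a tuned prior or confidence width, which is the sense in which the strategy needs no exploration hyper-parameter.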

Published In

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2015, 1198 pages
ISBN: 9781450336215
DOI: 10.1145/2766462

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. bootstrapping
  2. contextual bandit
  3. personalization
  4. probability matching
  5. recommender systems

Acceptance Rates

SIGIR '15 paper acceptance rate: 70 of 351 submissions (20%)
Overall acceptance rate: 792 of 3,983 submissions (20%)
