Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

Dylan Foster, Alexander Rakhlin
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3199-3210, 2020.

Abstract

A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-foster20a, title = {Beyond {UCB}: Optimal and Efficient Contextual Bandits with Regression Oracles}, author = {Foster, Dylan and Rakhlin, Alexander}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {3199--3210}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/foster20a/foster20a.pdf}, url = {https://proceedings.mlr.press/v119/foster20a.html}, abstract = {A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.} }
Endnote
%0 Conference Paper %T Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles %A Dylan Foster %A Alexander Rakhlin %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-foster20a %I PMLR %P 3199--3210 %U https://proceedings.mlr.press/v119/foster20a.html %V 119 %X A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
APA
Foster, D. & Rakhlin, A.. (2020). Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3199-3210 Available from https://proceedings.mlr.press/v119/foster20a.html.

Related Material