PAC-Bayesian Offline Contextual Bandits With Guarantees

Sakhi, Otmane; Alquier, Pierre; Chopin, Nicolas

arXiv:2210.13132 (stat)

[Submitted on 24 Oct 2022 (v1), last revised 27 May 2023 (this version, v2)]

Abstract: This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios.

Comments:	Accepted to ICML 2023
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2210.13132 [stat.ML]
	(or arXiv:2210.13132v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2210.13132

Submission history

From: Otmane Sakhi
[v1] Mon, 24 Oct 2022 11:38:34 UTC (135 KB)
[v2] Sat, 27 May 2023 07:30:17 UTC (165 KB)
