Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting

Chen, Haoyu; Lu, Wenbin; Song, Rui

doi:10.1080/01621459.2020.1770098


arXiv:2010.07283 (stat)

[Submitted on 14 Oct 2020]



Abstract: Online decision-making problems require us to make a sequence of decisions based on incremental information. Common solutions often need to learn a reward model of different actions given the contextual information and then maximize the long-term reward. It is meaningful to know whether the posited model is reasonable and how the model performs in the asymptotic sense. We study this problem under the contextual bandit framework with a linear reward model. The $\varepsilon$-greedy policy is adopted to address the classic exploration-and-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose an online weighted least squares estimator based on inverse propensity score weighting and also establish its asymptotic normality. Based on the properties of the parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal. We illustrate our results using simulations and an application to a news article recommendation dataset from Yahoo!.
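The setup described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the two-arm design, the true parameters `beta`, the Gaussian noise, the tiny ridge term, and the function name `simulate_eps_greedy` are all assumptions made for illustration. The value estimate uses one common inverse-propensity-weighted form, which may differ in detail from the estimator analyzed in the paper.

```python
import numpy as np

def simulate_eps_greedy(T=5000, eps=0.1, seed=0):
    """Two-arm linear contextual bandit under an epsilon-greedy policy.

    Hypothetical setup: reward for arm a is x @ beta[a] + N(0, 1) noise.
    Returns the true parameters, the online OLS estimates, and an
    in-sample IPW estimate of the value of the greedy policy.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([[1.0, 0.5, -0.5],
                     [0.5, 1.0, 0.5]])  # made-up true parameters
    n_arms, d = beta.shape

    # Per-arm sufficient statistics for least squares. The tiny ridge
    # term is an assumption so that solve() works before an arm has
    # been pulled d times.
    gram = np.stack([1e-6 * np.eye(d) for _ in range(n_arms)])
    xty = np.zeros((n_arms, d))
    beta_hat = np.zeros((n_arms, d))
    ipw_sum = 0.0

    for _ in range(T):
        # Context: intercept plus standard normal covariates.
        x = np.concatenate(([1.0], rng.normal(size=d - 1)))
        greedy = int(np.argmax(beta_hat @ x))

        # Epsilon-greedy: with probability eps pick an arm uniformly
        # at random, otherwise pick the currently estimated best arm.
        if rng.random() < eps:
            a = int(rng.integers(n_arms))
        else:
            a = greedy

        # Probability that the greedy arm is chosen under this policy.
        prop_greedy = (1.0 - eps) + eps / n_arms
        r = x @ beta[a] + rng.normal()

        # IPW term: keep the reward only when the greedy arm was
        # played, reweighted by its selection probability.
        if a == greedy:
            ipw_sum += r / prop_greedy

        # Online OLS update for the arm that was played.
        gram[a] += np.outer(x, x)
        xty[a] += r * x
        beta_hat[a] = np.linalg.solve(gram[a], xty[a])

    return beta, beta_hat, ipw_sum / T

if __name__ == "__main__":
    beta, beta_hat, value = simulate_eps_greedy()
    print("max abs estimation error:", np.max(np.abs(beta_hat - beta)))
    print("IPW value estimate:", value)
```

Repeating the simulation over many seeds and plotting the standardized estimation errors is one way to check informally for the asymptotic normality the abstract describes.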

Comments:	Accepted by the Journal of the American Statistical Association
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2010.07283 [stat.ML]
	(or arXiv:2010.07283v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2010.07283
Related DOI:	https://doi.org/10.1080/01621459.2020.1770098

Submission history

From: Haoyu Chen
[v1] Wed, 14 Oct 2020 17:57:14 UTC (8,422 KB)
