Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts

Romano, Giulia; Agostini, Andrea; Trovò, Francesco; Gatti, Nicola; Restelli, Marcello

arXiv:2206.00586 (cs)

[Submitted on 1 Jun 2022]

Abstract: There is a rising interest in industrial online applications where data becomes available sequentially. Inspired by the recommendation of playlists to users, whose preferences can be collected while they listen to the entire playlist, we study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB), in which the stochastic reward associated with the pull of an arm is partitioned over a finite number of consecutive rounds following the pull. This setting, unexplored so far to the best of our knowledge, is a natural extension of delayed-feedback bandits to the case in which rewards may be spread over a finite time span after the pull instead of being fully disclosed in a single, potentially delayed round. We provide two algorithms to address TP-MAB problems, namely TP-UCB-FR and TP-UCB-EW, which exploit the partial information disclosed by the reward collected over time. We show that our algorithms provide better asymptotic regret upper bounds than delayed-feedback bandit algorithms when alpha-smoothness holds, a property characterizing a broad set of reward structures of practical interest. We also empirically evaluate their performance across a wide range of settings, both synthetically generated and from a real-world media recommendation problem.
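To make the setting concrete, the following is a minimal Python sketch of a toy environment in which the reward of each pull is partitioned over a fixed number of consecutive rounds. It illustrates only the feedback structure described in the abstract; it is not the authors' TP-UCB-FR or TP-UCB-EW algorithm. The class name `TPBanditEnv`, the Bernoulli total reward, the random split of that total, and the choice that the first part is revealed in the same round as the pull are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class TPBanditEnv:
    """Toy bandit whose per-pull reward is split over `span` consecutive rounds.

    Hypothetical illustration: names and distributions are assumptions,
    not taken from the paper.
    """

    def __init__(self, means, span, seed=0):
        self.means = means                # assumed mean of the total reward per arm, in [0, 1]
        self.span = span                  # number of rounds a reward is partitioned over
        self.rng = random.Random(seed)
        self.t = 0                        # index of the current round
        self.pending = defaultdict(list)  # round -> [(arm, pull_round, partial_reward)]

    def step(self, arm):
        """Pull `arm` and return every partial reward revealed in this round."""
        # Assumption: the total reward of a pull is Bernoulli with the arm's mean.
        total = 1.0 if self.rng.random() < self.means[arm] else 0.0
        # Assumption: the total is split by random positive weights (each in (0, 1]).
        weights = [1.0 - self.rng.random() for _ in range(self.span)]
        norm = sum(weights)
        for k, w in enumerate(weights):
            # Part k of this pull is revealed at round t + k.
            self.pending[self.t + k].append((arm, self.t, total * w / norm))
        observed = self.pending.pop(self.t, [])
        self.t += 1
        return observed


if __name__ == "__main__":
    env = TPBanditEnv(means=[0.3, 0.7], span=3)
    for _ in range(5):
        # Each observation is (arm, round of the original pull, partial reward).
        print(env.step(1))
```

In this sketch a learner sees, at every round, partial rewards belonging to several earlier pulls. A delayed-feedback learner would wait until all parts of a pull have arrived before updating, whereas the algorithms in the paper are designed to use the partial information as it is disclosed.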

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2206.00586 [cs.LG] (or arXiv:2206.00586v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2206.00586

Submission history

From: Dr. Francesco Trovò
[v1] Wed, 1 Jun 2022 15:56:59 UTC (2,378 KB)
