My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

Ilai Bistritz, Tavor Baharav, Amir Leshem, Nicholas Bambos
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:930-940, 2020.

Abstract

Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an NxM matrix. These utilities are unknown to the players. In each turn players receive noisy observations of their utility for their selected arm. However, if any other players selected the same arm that turn, they will all receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a \log\log T factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-bistritz20a, title = {My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits}, author = {Bistritz, Ilai and Baharav, Tavor and Leshem, Amir and Bambos, Nicholas}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {930--940}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/bistritz20a/bistritz20a.pdf}, url = {https://proceedings.mlr.press/v119/bistritz20a.html}, abstract = {Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an NxM matrix. These utilities are unknown to the players. In each turn players receive noisy observations of their utility for their selected arm. However, if any other players selected the same arm that turn, they will all receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a \log\log T factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.} }
Endnote
%0 Conference Paper %T My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits %A Ilai Bistritz %A Tavor Baharav %A Amir Leshem %A Nicholas Bambos %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-bistritz20a %I PMLR %P 930--940 %U https://proceedings.mlr.press/v119/bistritz20a.html %V 119 %X Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an NxM matrix. These utilities are unknown to the players. In each turn players receive noisy observations of their utility for their selected arm. However, if any other players selected the same arm that turn, they will all receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a \log\log T factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.
APA
Bistritz, I., Baharav, T., Leshem, A. & Bambos, N.. (2020). My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:930-940 Available from https://proceedings.mlr.press/v119/bistritz20a.html.

Related Material