
Follow the perturbed approximate leader for solving semi-bandit combinatorial optimization

Published: 01 October 2021

Abstract

Combinatorial optimization in the face of uncertainty is a challenge in both operational research and machine learning. In this paper, we consider a special and important class of problems called adversarial online combinatorial optimization with semi-bandit feedback, in which a player repeatedly makes combinatorial decisions and receives the corresponding feedback. While existing algorithms focus on the regret guarantee or assume access to an efficient offline oracle, solving this problem efficiently remains a challenge when the offline counterpart is NP-hard. In this paper, we propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike the existing FPL approach, our method employs an approximation algorithm as an offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, it guarantees a sublinear (1 + ε)-scaled regret of order O(T^{2/3}) for any small ε > 0 for an important class of combinatorial optimization problems that admit an FPTAS (fully polynomial time approximation scheme), where T is the number of rounds of the learning process. In addition to the theoretical analysis, we also conduct a series of experiments to demonstrate the performance of our algorithm.
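The sketch below illustrates the kind of learning loop the abstract describes: in each round, nonnegative random perturbations are added to the accumulated reward estimates, an approximate offline oracle selects a feasible combinatorial action, and feedback is observed only on the played arms. The helper names (approx_oracle, observe), the exponential noise, and the naive update rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def fpl_with_approx_oracle(n_arms, T, approx_oracle, observe, eta=0.1, rng=None):
    """Illustrative FPL-style loop with an approximation oracle and
    semi-bandit feedback. All names and defaults here are assumptions,
    not the paper's exact algorithm.

    approx_oracle(weights) -> 0/1 numpy array of length n_arms: a feasible
        combinatorial action whose total weight is within the oracle's
        approximation guarantee of the maximum.
    observe(action, t)     -> per-arm rewards, revealed only on played arms.
    """
    rng = rng or np.random.default_rng()
    cumulative_reward_est = np.zeros(n_arms)

    for t in range(T):
        # Add nonnegative random perturbations (exponential noise is a common
        # FPL choice; the paper's distribution and scaling may differ).
        perturbed = cumulative_reward_est + rng.exponential(1.0 / eta, n_arms)

        # The approximate offline oracle returns a feasible combinatorial action.
        action = approx_oracle(perturbed)

        # Semi-bandit feedback: rewards are observed only for played arms.
        rewards = observe(action, t)

        # Naive update on played arms; a full semi-bandit method would use
        # properly scaled (e.g., importance-weighted) estimates to control bias.
        cumulative_reward_est += action * rewards

    return cumulative_reward_est
```

Because the oracle is only an approximation algorithm, the natural benchmark is the (1 + ε)-scaled regret mentioned in the abstract rather than regret against the exact offline optimum.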



        Published In

Frontiers of Computer Science: Selected Publications from Chinese Universities, Volume 15, Issue 5
Oct 2021
187 pages
ISSN: 2095-2228
EISSN: 2095-2236

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Publication History

        Published: 01 October 2021
        Accepted: 27 July 2020
        Received: 25 December 2019

        Author Tags

        1. online learning
        2. online combinatorial optimization
        3. semi-bandit
        4. follow-the-perturbed-leader

        Qualifiers

        • Research-article
