
Follow the perturbed approximate leader for solving semi-bandit combinatorial optimization

Published: 01 October 2021

Abstract

Combinatorial optimization in the face of uncertainty is a challenge in both operational research and machine learning. In this paper, we consider a special and important class of problems called adversarial online combinatorial optimization with semi-bandit feedback, in which a player repeatedly makes combinatorial decisions and receives the corresponding feedback. While existing algorithms focus on the regret guarantee or assume access to an efficient offline oracle, solving this problem efficiently remains a challenge when the offline counterpart is NP-hard. In this paper, we propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike the existing FPL approach, our method employs an approximation algorithm as an offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, it guarantees a sublinear (1 + ε)-scaled regret of order O(T^{2/3}) for any small ε > 0 for an important class of combinatorial optimization problems that admit an FPTAS (fully polynomial time approximation scheme), where T is the number of rounds of the learning process. In addition to the theoretical analysis, we also conduct a series of experiments to demonstrate the performance of our algorithm.
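The sketch below illustrates the kind of learning loop the abstract describes: in each round, nonnegative random perturbations are added to the accumulated reward estimates, an approximate offline oracle selects a feasible combinatorial action, and feedback is observed only on the played arms. The helper names (approx_oracle, observe), the exponential noise, and the naive update rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def fpl_with_approx_oracle(n_arms, T, approx_oracle, observe, eta=0.1, rng=None):
    """Illustrative FPL-style loop with an approximation oracle and
    semi-bandit feedback. All names and defaults here are assumptions,
    not the paper's exact algorithm.

    approx_oracle(weights) -> 0/1 numpy array of length n_arms: a feasible
        combinatorial action whose total weight is within the oracle's
        approximation guarantee of the maximum.
    observe(action, t)     -> per-arm rewards, revealed only on played arms.
    """
    rng = rng or np.random.default_rng()
    cumulative_reward_est = np.zeros(n_arms)

    for t in range(T):
        # Add nonnegative random perturbations (exponential noise is a common
        # FPL choice; the paper's distribution and scaling may differ).
        perturbed = cumulative_reward_est + rng.exponential(1.0 / eta, n_arms)

        # The approximate offline oracle returns a feasible combinatorial action.
        action = approx_oracle(perturbed)

        # Semi-bandit feedback: rewards are observed only for played arms.
        rewards = observe(action, t)

        # Naive update on played arms; a full semi-bandit method would use
        # properly scaled (e.g., importance-weighted) estimates to control bias.
        cumulative_reward_est += action * rewards

    return cumulative_reward_est
```

Because the oracle is only an approximation algorithm, the natural benchmark is the (1 + ε)-scaled regret mentioned in the abstract rather than regret against the exact offline optimum.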



        Published In

Frontiers of Computer Science: Selected Publications from Chinese Universities, Volume 15, Issue 5
Oct 2021
187 pages
ISSN: 2095-2228
EISSN: 2095-2236

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Publication History

        Published: 01 October 2021
        Accepted: 27 July 2020
        Received: 25 December 2019

        Author Tags

        1. online learning
        2. online combinatorial optimization
        3. semi-bandit
        4. follow-the-perturbed-leader

        Qualifiers

        • Research-article
