DOI: 10.5555/3600270.3600417

Maximizing and satisficing in multi-armed bandits with graph information

Published: 03 April 2024

Abstract

Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options, and even obtaining one observation per option may be too costly, rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to similarity relationships among the options that can be leveraged. In this paper, we consider the pure exploration problem in stochastic multi-armed bandits where the similarities between the arms are captured by a graph and the rewards may be represented as a smooth signal on this graph. In particular, we consider the problem of finding the arm with the maximum reward (i.e., the maximizing problem) or one with a sufficiently high reward (i.e., the satisficing problem) under this model. We propose novel algorithms GRUB (GRaph based UcB) and ξ-GRUB for these problems and provide a theoretical characterization of their performance that specifically elicits the benefit of the graph side information. We also prove a lower bound on the data requirement, revealing a large class of problems for which these algorithms are near-optimal. We complement our theory with experimental results that show the benefit of capitalizing on such side information.
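The central idea the abstract describes, that rewards form a smooth signal on a similarity graph, so observing one arm informs estimates of its neighbors, can be illustrated with a small sketch. This is not the paper's GRUB pseudocode; the toy graph, the regularization weight `rho`, and the helper functions below are invented for illustration, using the standard Laplacian-regularized least-squares form common in graph bandit work:

```python
import numpy as np

def laplacian(adjacency):
    """Combinatorial graph Laplacian L = D - A."""
    degrees = np.diag(adjacency.sum(axis=1))
    return degrees - adjacency

def regularized_means(pulls, reward_sums, L, rho=1.0):
    """Jointly estimate arm means by solving (N + rho * L) mu = s,
    where N = diag(pull counts) and s = per-arm summed rewards.
    The graph coupling gives estimates even for never-pulled arms."""
    N = np.diag(pulls)
    return np.linalg.solve(N + rho * L, reward_sums)

# Toy example: path graph 0-1-2; arm 1 is never pulled.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = laplacian(A)
pulls = np.array([3.0, 0.0, 3.0])
reward_sums = np.array([3.0, 0.0, 6.0])  # empirical means 1.0 and 2.0
mu_hat = regularized_means(pulls, reward_sums, L, rho=0.5)
# mu_hat[1] is interpolated between its neighbors' estimates.
```

In this toy run the unpulled middle arm receives an estimate exactly halfway between its neighbors, which is the sense in which graph side information reduces the number of observations needed per option.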

Supplementary Material

Additional material (3600270.3600417_supp.pdf)
Supplemental material.



Published In

NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, November 2022, 39114 pages.

Publisher: Curran Associates Inc., Red Hook, NY, United States.
