Efficient bandit algorithms for online multiclass prediction

Authors:

Sham M. Kakade,

Shai Shalev-Shwartz,

Ambuj Tewari

ICML '08: Proceedings of the 25th international conference on Machine learning

Pages 440 - 447

https://doi.org/10.1145/1390156.1390212

Published: 05 July 2008

Abstract

This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multiclass bandit setting. The multiclass bandit setting models a wide range of practical supervised learning applications in which the learner receives only partial feedback about the true label (referred to as "bandit" feedback, in the spirit of multi-armed bandit models). For example, in many web applications users provide only positive "click" feedback, which does not necessarily fully disclose a true label. The Banditron can learn in a multiclass classification setting with bandit feedback that reveals only whether the algorithm's prediction was correct, and does not necessarily reveal the true label. We provide (relative) mistake bounds showing that the Banditron enjoys favorable performance, and our experiments demonstrate the practicality of the algorithm. Furthermore, this paper pays close attention to the important special case in which the data is linearly separable, a problem that has been exhaustively studied in the full-information setting yet is novel in the bandit setting.
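The abstract describes the feedback model but not the update rule. As a minimal sketch, one round of a Banditron-style learner can be written as below. It follows the update as it is commonly presented for this algorithm: predict greedily, explore uniformly with probability `gamma`, then apply an importance-weighted Perceptron update. The function name `banditron_step`, the `feedback` callback, and the list-of-lists weight matrix are illustrative assumptions and are not taken from the paper's own code.

```python
import random


def banditron_step(W, x, feedback, gamma, rng):
    """One round of a Banditron-style update (illustrative sketch).

    W        : list of k weight vectors, one per class; updated in place.
    x        : feature vector (list of floats).
    feedback : hypothetical callback; feedback(label) returns True only if
               the label shown to the user was the correct one.
    gamma    : exploration rate in [0, 1].
    rng      : random.Random instance used for sampling.
    """
    k = len(W)
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    y_hat = max(range(k), key=lambda r: scores[r])  # greedy prediction

    # Exploration distribution: mass (1 - gamma) on y_hat, gamma spread uniformly.
    probs = [gamma / k + (1.0 - gamma) * (r == y_hat) for r in range(k)]
    y_tilde = rng.choices(range(k), weights=probs)[0]

    # Bandit feedback: a single bit, the true label itself is never observed.
    correct = bool(feedback(y_tilde))

    # Importance-weighted promotion of the sampled label when it was correct.
    if correct:
        scale = 1.0 / probs[y_tilde]
        W[y_tilde] = [w_j + scale * x_j for w_j, x_j in zip(W[y_tilde], x)]
    # Demotion of the greedy label, applied on every round.
    W[y_hat] = [w_j - x_j for w_j, x_j in zip(W[y_hat], x)]
    return y_tilde, correct
```

Dividing by `probs[y_tilde]` is what makes the update, in expectation over the sampled label, match the full-information multiclass Perceptron update. With `gamma = 0` the learner never explores and can only learn from rounds in which its greedy guess is wrong.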

References

[1] R. I. Arriaga and S. Vempala. An algorithmic theory of learning: Robust concepts and random projection. Machine Learning, 63(2), 2006.

[2] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th Annual FOCS, 1998.

[3] K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951--991, 2003.

[4] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive aggressive algorithms. Journal of Machine Learning Research, 7:551--585, Mar 2006.

[5] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.

[6] A. Elisseeff and J. Weston. A kernel method for multi-labeled classification. In Advances in Neural Information Processing Systems 14, 2001.

[7] M. Fink, S. Shalev-Shwartz, Y. Singer, and S. Ullman. Online multiclass learning by interclass hypothesis sharing. In Proceedings of the 23rd International Conference on Machine Learning, 2006.

[8] A. Flaxman, A. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385--394, 2005.

[9] Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277--296, 1999.

[10] J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--64, January 1997.

[11] R. D. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. NIPS, 2004.

[12] J. Langford and T. Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. NIPS, 2007.

[13] N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285--318, 1988.

[14] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386--407, 1958. (Reprinted in Neurocomputing, MIT Press, 1988.)

[15] S. Shalev-Shwartz and Y. Singer. A primal-dual perspective of online learning algorithms. Machine Learning Journal, 2007.

[16] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.

[17] J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks, April 1999.

Index Terms

Efficient bandit algorithms for online multiclass prediction
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
  2. Modeling and simulation
    1. Model development and analysis
      1. Model verification and validation
      2. Modeling methodologies

Published In

ICML '08: Proceedings of the 25th international conference on Machine learning

July 2008

1310 pages

ISBN:9781605582054

DOI:10.1145/1390156

General Chair: William Cohen (Carnegie Mellon University)

Program Chairs: Andrew McCallum (University of Massachusetts Amherst), Sam Roweis (University of Toronto and Google)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Pascal
University of Helsinki
Xerox
Federation of Finnish Learned Societies
Google Inc.
NSF
Machine Learning Journal/Springer
Microsoft Research
Intel
Yahoo!
Helsinki Institute for Information Technology
IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICML '08

ICML '08: The 25th Annual International Conference on Machine Learning held in conjunction with the 2007 International Conference on Inductive Logic Programming

July 5 - 9, 2008

Helsinki, Finland

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Bibliometrics

Total Citations: 66
Total Downloads: 673
Downloads (Last 12 months): 74
Downloads (Last 6 weeks): 4

Reflects downloads up to 09 Jan 2025

Cited By

Wang Z, Qiao X, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024). Efficient online set-valued classification with bandit feedback. Proceedings of the 41st International Conference on Machine Learning, pages 51328-51347. DOI: 10.5555/3692070.3694175. Online publication date: 21-Jul-2024.
https://dl.acm.org/doi/10.5555/3692070.3694175

Pasteris S, Hicks C, Mavroudis V, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). Nearest neighbour with bandit feedback. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 20320-20351. DOI: 10.5555/3666122.3667014. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3667014

Brereton M, Ambe A, Lovell D, Sitbon L, Capel T, Soro A, Xu Y, Moreira C, Favre B, Bradley A (2023). Designing Interaction with AI for Human Learning: Towards Human-Machine Teaming in Radiology Training. Proceedings of the 35th Australian Computer-Human Interaction Conference, pages 639-647. DOI: 10.1145/3638380.3638435. Online publication date: 2-Dec-2023.
https://dl.acm.org/doi/10.1145/3638380.3638435

Gu S, Luo T, He M, Hou C (2023). Online Learning With Incremental Feature Space and Bandit Feedback. IEEE Transactions on Knowledge and Data Engineering, 35:12, pages 12902-12916. DOI: 10.1109/TKDE.2023.3272313. Online publication date: 1-Dec-2023.
https://doi.org/10.1109/TKDE.2023.3272313

Manwani N, Agarwal M (2023). Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks. 2023 International Joint Conference on Neural Networks (IJCNN), pages 1-10. DOI: 10.1109/IJCNN54540.2023.10191245. Online publication date: 18-Jun-2023.
https://doi.org/10.1109/IJCNN54540.2023.10191245

Feng W, Shi H, Zhao P, Gao X (2023). Mixtron: Bandit Online Multiclass Prediction with Implicit Feedback. 2023 IEEE International Conference on Data Mining (ICDM), pages 1004-1012. DOI: 10.1109/ICDM58522.2023.00115. Online publication date: 1-Dec-2023.
https://doi.org/10.1109/ICDM58522.2023.00115

Qian W, Ing C, Liu J (2023). Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates. Journal of the American Statistical Association, 119:546, pages 970-982. DOI: 10.1080/01621459.2022.2152343. Online publication date: 11-Jan-2023.
https://doi.org/10.1080/01621459.2022.2152343

Kang Z, Nielsen M, Yang B, Deng L, Lorenzen S (2023). Online transfer learning with partial feedback. Expert Systems with Applications, 212, article 118738. DOI: 10.1016/j.eswa.2022.118738. Online publication date: Feb-2023.
https://doi.org/10.1016/j.eswa.2022.118738

Agarwal M, Manwani N (2022). ALBIF: Active Learning with BandIt Feedbacks. Advances in Knowledge Discovery and Data Mining, pages 353-364. DOI: 10.1007/978-3-031-05981-0_28. Online publication date: 10-May-2022.
https://doi.org/10.1007/978-3-031-05981-0_28

Guruganesh G, Liu A, Schneider J, Wang J, Ranzato M, Beygelzimer A, Dauphin Y, Liang P, Vaughan J (2021). Margin-independent online multiclass learning via convex geometry. Proceedings of the 35th International Conference on Neural Information Processing Systems, pages 29156-29167. DOI: 10.5555/3540261.3542494. Online publication date: 6-Dec-2021.
https://dl.acm.org/doi/10.5555/3540261.3542494
