DOI: 10.1145/1143844.1143970

Totally corrective boosting algorithms that maximize the margin

Published: 25 June 2006

Abstract

We consider boosting algorithms that maintain a distribution over a set of examples. At each iteration a weak hypothesis is received and the distribution is updated. We motivate these updates as minimizing the relative entropy subject to linear constraints. For example, AdaBoost constrains the edge of the last hypothesis w.r.t. the updated distribution to be at most γ = 0. In this sense, AdaBoost is "corrective" w.r.t. the last hypothesis. A cleaner boosting method is to be "totally corrective": the edges of all past hypotheses are constrained to be at most γ, where γ is suitably adapted. Using new techniques, we prove the same iteration bounds for the totally corrective algorithms as for their corrective versions. Moreover, with adaptive γ, the algorithms provably maximize the margin. Experimentally, the totally corrective versions return smaller convex combinations of weak hypotheses than the corrective ones, and they are competitive with LPBoost, a totally corrective boosting algorithm with no regularization for which no iteration bound is known.
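
The update described in the abstract is a relative-entropy projection onto linear constraints. The following display is a reconstruction from the abstract's own description; the precise notation and the adaptive choice of γ are the paper's and are not reproduced here:

\[
\mathbf{d}^{t+1} \;=\; \arg\min_{\mathbf{d}\,\in\,\Delta_N} \sum_{i=1}^{N} d_i \ln\frac{d_i}{d^1_i}
\quad\text{subject to}\quad
\sum_{i=1}^{N} d_i\, u^q_i \;\le\; \gamma \quad\text{for } q = 1, \dots, t,
\]

where Δ_N is the probability simplex over the N examples, d^1 is the initial (uniform) distribution, and u^q_i = y_i h_q(x_i), so that Σ_i d_i u^q_i is the edge of hypothesis h_q. The corrective (AdaBoost-style) update keeps only the single constraint q = t.

For concreteness, here is a minimal numerical sketch of one totally corrective update, using a generic convex solver (SciPy's SLSQP) in place of the paper's own optimization machinery; the function name, `edge_matrix`, `gamma_hat`, and the toy data are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.optimize import minimize

def totally_corrective_update(edge_matrix, gamma_hat):
    """One totally corrective update: project the uniform distribution,
    in relative entropy, onto the set of distributions under which every
    past hypothesis has edge at most gamma_hat.

    edge_matrix : (t, N) array; row q holds u^q_i = y_i * h_q(x_i)
    gamma_hat   : float; common bound on all past edges
    """
    t, n = edge_matrix.shape
    uniform = np.full(n, 1.0 / n)

    # Relative entropy to the uniform distribution: sum_i d_i ln(n * d_i).
    def rel_entropy(d):
        d = np.clip(d, 1e-12, None)  # keep the log well defined
        return float(np.sum(d * np.log(n * d)))

    constraints = [
        # d must be a probability distribution ...
        {"type": "eq", "fun": lambda d: np.sum(d) - 1.0},
        # ... and every past hypothesis' edge is capped at gamma_hat.
        {"type": "ineq", "fun": lambda d: gamma_hat - edge_matrix @ d},
    ]
    result = minimize(rel_entropy, uniform, method="SLSQP",
                      bounds=[(0.0, 1.0)] * n, constraints=constraints)
    return result.x

# Toy usage: 5 examples, the +/-1 edge vectors of 2 past hypotheses.
U = np.array([[1., -1., 1., 1., -1.],
              [1., 1., -1., 1., -1.]])
d = totally_corrective_update(U, gamma_hat=0.1)
print(d, U @ d)  # both entries of U @ d are <= 0.1

On the toy data the uniform distribution gives both past hypotheses an edge of 0.2; the projected distribution shifts weight toward the examples the hypotheses get wrong until every edge is at most 0.1, which is exactly the totally corrective property.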

Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN:1595933832
DOI:10.1145/1143844

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions, 26%
Overall acceptance rate: 140 of 548 submissions, 26%
