DOI: 10.5555/2997189.2997322

Self-paced learning for latent variable models

Published: 06 December 2010

Abstract

Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state-of-the-art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition.
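In symbols, a minimal formulation consistent with the abstract's description (the notation here is illustrative, not necessarily the paper's: r(w) is a regularizer, f(x_i, y_i; w) the loss of sample i under parameters w, and v_i a binary selection variable) has each iteration jointly solve:

```latex
\min_{w,\; v \in \{0,1\}^n} \quad r(w) \;+\; \sum_{i=1}^{n} v_i \, f(x_i, y_i; w) \;-\; \frac{1}{K} \sum_{i=1}^{n} v_i
```

For fixed w, the optimal selection has a closed form: v_i = 1 exactly when f(x_i, y_i; w) < 1/K, i.e., when the sample is "easy" under the current model. Annealing the weight K downward by a constant factor each iteration raises the threshold 1/K, so harder samples are admitted over time until the entire training set is considered.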

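Below is a minimal, self-contained Python sketch of this alternating scheme. The squared-loss linear model and least-squares refit are stand-ins of ours for the paper's latent structural SVM and its CCCP subproblem, and the parameter names (K, mu) are assumptions for illustration only.

```python
import numpy as np

def self_paced_learning(X, y, K=0.5, mu=1.3, n_iters=20):
    """Sketch of self-paced learning with a squared-loss linear model as a
    stand-in for the paper's latent structural SVM.

    Alternates between (a) selecting "easy" samples, i.e. those whose loss
    under the current parameters falls below the threshold 1/K, and
    (b) refitting the parameters on the selected samples only. K is annealed
    (divided by mu > 1) each round, so the threshold grows until the whole
    training set is eventually considered.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        losses = (X @ w - y) ** 2          # per-sample loss under current w
        v = losses < 1.0 / K               # closed-form easy-sample selection
        if not v.any():                    # keep at least one sample early on
            v[np.argmin(losses)] = True
        # Refit on the easy subset (plain least squares here; the paper would
        # instead solve a latent SSVM subproblem).
        w = np.linalg.lstsq(X[v], y[v], rcond=None)[0]
        K /= mu                            # anneal: the threshold 1/K grows
    return w

# Toy usage: recover a linear model in the presence of a few hard outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)
y[:10] += 5.0                              # "hard" (outlier) samples
print(self_paced_learning(X, y))
```

In the toy usage, the outlier samples carry large losses early on, so the annealed threshold defers them to later iterations, mirroring the easy-to-hard ordering the abstract advocates.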

Published In

NIPS'10: Proceedings of the 24th International Conference on Neural Information Processing Systems - Volume 1
December 2010
2630 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States
