DOI: 10.5555/2999325.2999432

A stochastic gradient method with an exponential convergence rate for finite training sets

Published: 03 December 2012

Abstract

We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in optimizing the training error and in quickly reducing the test error.
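The "memory of previous gradient values" idea can be sketched as follows for an l2-regularized least-squares objective, which is a strongly convex finite sum. This is an illustrative sketch of the general scheme the abstract describes, not the authors' exact algorithm: the step-size rule, initialization of the gradient table at zero, and the least-squares loss are simplifying assumptions made here.

```python
import numpy as np

def sag_least_squares(A, b, lam=0.1, step=None, n_iter=5000, seed=0):
    """Sketch of a stored-gradient method for the strongly convex finite sum
    f(x) = (1/n) sum_i (1/2)(a_i^T x - b_i)^2 + (lam/2)||x||^2.

    A table holds the most recent gradient seen for each term; each iteration
    refreshes one randomly chosen entry and steps along the average of the
    stored gradients, rather than the single fresh gradient used by SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    grads = np.zeros((n, d))   # stored per-example gradients
    g_sum = np.zeros(d)        # running sum of the stored gradients
    if step is None:
        # A conservative constant step size based on a bound on the
        # per-term gradient Lipschitz constant (an assumption of this
        # sketch; the paper's analysis prescribes its own step sizes).
        L = np.max(np.sum(A * A, axis=1)) + lam
        step = 1.0 / L
    for _ in range(n_iter):
        i = rng.integers(n)
        g_new = (A[i] @ x - b[i]) * A[i] + lam * x
        g_sum += g_new - grads[i]  # replace the old stored gradient for i
        grads[i] = g_new
        x -= step * g_sum / n      # step along the average stored gradient
    return x
```

Because the averaged direction incorporates one fresh gradient per iteration while retaining stale but gradually refreshed information about every other term, each iteration costs the same as an SGD step yet the method can tolerate a constant step size, which is what enables the linear (geometric) convergence rate on finite training sets.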



Published In

NIPS'12: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2
December 2012
3328 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

