DOI: 10.5555/2999325.2999432

A stochastic gradient method with an exponential convergence rate for finite training sets

Published: 03 December 2012

Abstract

We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in optimizing the training error and in quickly reducing the test error.
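The "memory of previous gradient values" idea can be sketched as follows for an l2-regularized least-squares objective, which is a strongly convex finite sum. This is an illustrative sketch of the general scheme the abstract describes, not the authors' exact algorithm: the step-size rule, initialization of the gradient table at zero, and the least-squares loss are simplifying assumptions made here.

```python
import numpy as np

def sag_least_squares(A, b, lam=0.1, step=None, n_iter=5000, seed=0):
    """Sketch of a stored-gradient method for the strongly convex finite sum
    f(x) = (1/n) sum_i (1/2)(a_i^T x - b_i)^2 + (lam/2)||x||^2.

    A table holds the most recent gradient seen for each term; each iteration
    refreshes one randomly chosen entry and steps along the average of the
    stored gradients, rather than the single fresh gradient used by SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    grads = np.zeros((n, d))   # stored per-example gradients
    g_sum = np.zeros(d)        # running sum of the stored gradients
    if step is None:
        # A conservative constant step size based on a bound on the
        # per-term gradient Lipschitz constant (an assumption of this
        # sketch; the paper's analysis prescribes its own step sizes).
        L = np.max(np.sum(A * A, axis=1)) + lam
        step = 1.0 / L
    for _ in range(n_iter):
        i = rng.integers(n)
        g_new = (A[i] @ x - b[i]) * A[i] + lam * x
        g_sum += g_new - grads[i]  # replace the old stored gradient for i
        grads[i] = g_new
        x -= step * g_sum / n      # step along the average stored gradient
    return x
```

Because the averaged direction incorporates one fresh gradient per iteration while retaining stale but gradually refreshed information about every other term, each iteration costs the same as an SGD step yet the method can tolerate a constant step size, which is what enables the linear (geometric) convergence rate on finite training sets.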



Published In

NIPS'12: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2
December 2012
3328 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

