
Finite-sum smooth optimization with SARAH

Published: 01 July 2022

Abstract

We introduce NC-SARAH, a practical modification of the original SARAH algorithm (developed for convex optimization) to the non-convex setting. NC-SARAH is the first method to combine two crucial practical properties: it allows flexible minibatch sizes and large step sizes, which yields fast convergence in practice, as our experiments verify. Its asymptotic convergence rate is close to optimal and matches that of the earlier SARAH variants SPIDER and SpiderBoost, which either use a step size an order of magnitude smaller or require a fixed minibatch size. For convex optimization we propose SARAH++, which attains sublinear convergence for general convex problems and linear convergence for strongly convex problems; we also provide a practical version for which numerical experiments on various datasets show improved performance.
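For readers unfamiliar with the underlying estimator, the sketch below illustrates the recursive gradient update of the original SARAH algorithm [18] on a simple least-squares finite sum. It is a minimal illustration only: the step size, inner-loop length, and quadratic objective are arbitrary choices for demonstration, not the NC-SARAH or SARAH++ settings analyzed in this paper.

```python
# Minimal sketch of the SARAH recursive gradient estimator (Nguyen et al. [18])
# on f(w) = (1/n) * sum_i 0.5 * (a_i^T w - b_i)^2. Illustrative only; the
# hyperparameters below are not the settings proposed or analyzed in the paper.
import numpy as np

def sarah(A, b, step_size=0.05, inner_loop=None, epochs=10, seed=None):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = inner_loop if inner_loop is not None else n   # inner-loop length (m in [18])
    w = np.zeros(d)
    for _ in range(epochs):
        # Outer step: full gradient at the current outer iterate, one gradient step.
        w_prev = w
        v = A.T @ (A @ w_prev - b) / n
        w = w_prev - step_size * v
        # Inner loop: recursive variance-reduced estimator.
        for _ in range(m):
            i = rng.integers(n)
            g_new = (A[i] @ w - b[i]) * A[i]        # gradient of the i-th term at w_t
            g_old = (A[i] @ w_prev - b[i]) * A[i]   # gradient of the i-th term at w_{t-1}
            v = g_new - g_old + v                   # SARAH recursion: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
            w_prev, w = w, w - step_size * v
    return w

# Usage: recover a planted solution of a synthetic least-squares problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
w_star = rng.standard_normal(20)
b = A @ w_star
print(np.linalg.norm(sarah(A, b, epochs=20, seed=1) - w_star))   # should be close to 0
```

The inner loop reuses the previous estimate v, so each iteration needs only two component gradients; this recursive estimator, rather than anchoring to a stored full gradient as in SVRG, is the common ingredient of SARAH, SPIDER, and SpiderBoost.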

References

[1]
Allen-Zhu, Z.: Natasha: faster non-convex stochastic optimization via strongly non-convex parameter. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 89–97 (2017)
[2]
Allen-Zhu, Z.: Natasha 2: faster non-convex optimization than SGD. In: Advances in Neural Information Processing Systems, pp. 2675–2686 (2018)
[3]
Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: ICML, pp. 1080–1089 (2016)
[4]
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
[5]
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
[6]
Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)
[7]
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
[8]
Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: near-optimal non-convex optimization via stochastic path-integrated differential estimator. In: Advances in Neural Information Processing Systems, pp. 689–699 (2018)
[9]
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315–323 (2013)
[10]
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR arXiv:1412.6980 (2014)
[11]
Konečný, J., Richtárik, P.: Semi-stochastic gradient descent methods. Front. Appl. Math. Stat. 3, 9 (2017)
[12]
Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 2348–2358. Curran Associates, Inc. (2017)
[13]
Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: a simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295. PMLR (2021)
[14]
Liu, Y., Feng, F., Yin, W.: Acceleration of SVRG and Katyusha X by inexact preconditioning. In: International Conference on Machine Learning, pp. 4003–4012. PMLR (2019)
[15]
Mairal, J.: Optimization with first-order surrogate functions. In: International Conference on Machine Learning, pp. 783–791 (2013)
[16]
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization. Kluwer Academic Publishers, Boston (2004)
[17]
Nguyen, L., Nguyen, P.H., van Dijk, M., Richtárik, P., Scheinberg, K., Takáč, M.: SGD and Hogwild! Convergence without the bounded gradients assumption. In: Proceedings of the 35th International Conference on Machine Learning, vol. 80, pp. 3747–3755 (2018)
[18]
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 2613–2621 (2017)
[19]
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. CoRR arXiv:1705.07261 (2017)
[20]
Nguyen, L.M., Nguyen, P.H., Richtárik, P., Scheinberg, K., Takáč, M., van Dijk, M.: New convergence aspects of stochastic gradient algorithms. J. Mach. Learn. Res. 20(176), 1–49 (2019)
[21]
Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optim. Methods Softw. (2020)
[22]
Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: an efficient algorithmic framework for stochastic composite nonconvex optimization. J. Mach. Learn. Res. 21(110) (2020)
[23]
Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016)
[24]
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
[25]
Roux, N.L., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, pp. 2663–2671 (2012)
[26]
Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1), 83–112 (2017)
[27]
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
[28]
Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost: a class of faster variance-reduced algorithms for nonconvex optimization. In: Advances in Neural Information Processing Systems (2019)
[29]
Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduced gradient descent for nonconvex optimization. In: Advances in Neural Information Processing Systems, pp. 3921–3932 (2018)

Cited By

  • (2024) SILVER. In: Proceedings of the 41st International Conference on Machine Learning, pp. 38683–38739. DOI: 10.5555/3692070.3693639. Online publication date: 21-Jul-2024
  • (2023) Improving the privacy and practicality of objective perturbation for differentially private linear learners. In: Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 13819–13853. DOI: 10.5555/3666122.3666731. Online publication date: 10-Dec-2023
  • (2022) Escaping saddle points with bias-variance reduced local perturbed SGD for communication efficient nonconvex distributed learning. In: Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 5039–5051. DOI: 10.5555/3600270.3600634. Online publication date: 28-Nov-2022


Published In

Computational Optimization and Applications, Volume 82, Issue 3
Jul 2022
228 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 July 2022
Accepted: 16 April 2022
Received: 19 August 2020

Author Tags

  1. Finite-sum
  2. Smooth
  3. Non-convex
  4. Convex
  5. Stochastic algorithm
  6. Variance reduction

Qualifiers

  • Research-article
