
Finite-sum smooth optimization with SARAH

Published: 01 July 2022

Abstract

We introduce NC-SARAH, a practical modification of the original SARAH algorithm (developed for convex optimization) to the non-convex setting. NC-SARAH is the first method to combine two crucial practical properties: it allows flexible minibatch sizes and large step sizes, which yields fast convergence in practice, as our experiments verify. Its asymptotic convergence rate is close to optimal and matches that of the earlier SARAH variants SPIDER and SpiderBoost, which either use a step size an order of magnitude smaller or require a fixed minibatch size. For convex optimization we propose SARAH++, which attains sublinear convergence for general convex problems and linear convergence for strongly convex problems; we also provide a practical version for which numerical experiments on various datasets show improved performance.
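For readers unfamiliar with the underlying estimator, the sketch below illustrates the recursive gradient update of the original SARAH algorithm [18] on a simple least-squares finite sum. It is a minimal illustration only: the step size, inner-loop length, and quadratic objective are arbitrary choices for demonstration, not the NC-SARAH or SARAH++ settings analyzed in this paper.

```python
# Minimal sketch of the SARAH recursive gradient estimator (Nguyen et al. [18])
# on f(w) = (1/n) * sum_i 0.5 * (a_i^T w - b_i)^2. Illustrative only; the
# hyperparameters below are not the settings proposed or analyzed in the paper.
import numpy as np

def sarah(A, b, step_size=0.05, inner_loop=None, epochs=10, seed=None):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = inner_loop if inner_loop is not None else n   # inner-loop length (m in [18])
    w = np.zeros(d)
    for _ in range(epochs):
        # Outer step: full gradient at the current outer iterate, one gradient step.
        w_prev = w
        v = A.T @ (A @ w_prev - b) / n
        w = w_prev - step_size * v
        # Inner loop: recursive variance-reduced estimator.
        for _ in range(m):
            i = rng.integers(n)
            g_new = (A[i] @ w - b[i]) * A[i]        # gradient of the i-th term at w_t
            g_old = (A[i] @ w_prev - b[i]) * A[i]   # gradient of the i-th term at w_{t-1}
            v = g_new - g_old + v                   # SARAH recursion: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
            w_prev, w = w, w - step_size * v
    return w

# Usage: recover a planted solution of a synthetic least-squares problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
w_star = rng.standard_normal(20)
b = A @ w_star
print(np.linalg.norm(sarah(A, b, epochs=20, seed=1) - w_star))   # should be close to 0
```

The inner loop reuses the previous estimate v, so each iteration needs only two component gradients; this recursive estimator, rather than anchoring to a stored full gradient as in SVRG, is the common ingredient of SARAH, SPIDER, and SpiderBoost.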

References

[1]
Allen-Zhu, Z.: Natasha: faster non-convex stochastic optimization via strongly non-convex parameter. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 89–97 (2017)
[2]
Allen-Zhu, Z.: Natasha 2: faster non-convex optimization than SGD. In: Advances in Neural Information Processing Systems, pp. 2675–2686 (2018)
[3]
Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: ICML, pp. 1080–1089 (2016)
[4]
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
[5]
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
[6]
Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)
[7]
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
[8]
Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: near-optimal non-convex optimization via stochastic path-integrated differential estimator. In: Advances in Neural Information Processing Systems, pp. 689–699 (2018)
[9]
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315–323 (2013)
[10]
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR arXiv:1412.6980 (2014)
[11]
Konečný, J., Richtárik, P.: Semi-stochastic gradient descent methods. Front. Appl. Math. Stat. 3, 9 (2017)
[12]
Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 2348–2358. Curran Associates, Inc. (2017)
[13]
Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: a simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295. PMLR (2021)
[14]
Liu, Y., Feng, F., Yin, W.: Acceleration of SVRG and Katyusha X by inexact preconditioning. In: International Conference on Machine Learning, pp. 4003–4012. PMLR (2019)
[15]
Mairal, J.: Optimization with first-order surrogate functions. In: International Conference on Machine Learning, pp. 783–791 (2013)
[16]
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization. Kluwer Academic Publishers, Boston (2004)
[17]
Nguyen, L., Nguyen, P.H., van Dijk, M., Richtárik, P., Scheinberg, K., Takáč, M.: SGD and Hogwild! Convergence without the bounded gradients assumption. In: Proceedings of the 35th International Conference on Machine Learning, vol. 80, pp. 3747–3755 (2018)
[18]
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 2613–2621 (2017)
[19]
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. CoRR arXiv:1705.07261 (2017)
[20]
Nguyen, L.M., Nguyen, P.H., Richtárik, P., Scheinberg, K., Takáč, M., van Dijk, M.: New convergence aspects of stochastic gradient algorithms. J. Mach. Learn. Res. 20(176), 1–49 (2019)
[21]
Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optim. Methods Softw. (2020)
[22]
Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: an efficient algorithmic framework for stochastic composite nonconvex optimization. J. Mach. Learn. Res. 21(110) (2020)
[23]
Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016)
[24]
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
[25]
Roux, N.L., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, pp. 2663–2671 (2012)
[26]
Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1), 83–112 (2017)
[27]
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
[28]
Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost: a class of faster variance-reduced algorithms for nonconvex optimization. In: Advances in Neural Information Processing Systems (2019)
[29]
Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduced gradient descent for nonconvex optimization. In: Advances in Neural Information Processing Systems, pp. 3921–3932 (2018)

Cited By

  • (2024) SILVER. In: Proceedings of the 41st International Conference on Machine Learning, pp. 38683–38739. DOI: 10.5555/3692070.3693639. Online publication date: 21-Jul-2024
  • (2023) Improving the privacy and practicality of objective perturbation for differentially private linear learners. In: Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 13819–13853. DOI: 10.5555/3666122.3666731. Online publication date: 10-Dec-2023
  • (2022) Escaping saddle points with bias-variance reduced local perturbed SGD for communication efficient nonconvex distributed learning. In: Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 5039–5051. DOI: 10.5555/3600270.3600634. Online publication date: 28-Nov-2022


Published In

Computational Optimization and Applications, Volume 82, Issue 3
Jul 2022
228 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 July 2022
Accepted: 16 April 2022
Received: 19 August 2020

Author Tags

  1. Finite-sum
  2. Smooth
  3. Non-convex
  4. Convex
  5. Stochastic algorithm
  6. Variance reduction

Qualifiers

  • Research-article
