AdaGrad avoids saddle points

K Antonakopoulos, P Mertikopoulos… - International …, 2022 - proceedings.mlr.press
… The problem we address in this article is: Do ADAGRAD algorithms provably avoid saddle points? This saddle-avoidance question stimulates the study of non-convex optimization, …
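For context, a minimal sketch of the diagonal AdaGrad update that the question concerns, written from the standard textbook form rather than from the paper itself (the step size eta and the constant eps are illustrative choices, not values used by the authors):

    import numpy as np

    def adagrad_step(x, grad, accum, eta=0.1, eps=1e-8):
        # Accumulate squared gradients coordinate-wise, then take a step
        # whose per-coordinate size shrinks as that accumulator grows.
        accum = accum + grad ** 2
        x = x - eta * grad / (np.sqrt(accum) + eps)
        return x, accum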

Local saddle point optimization: A curvature exploitation approach

L Adolphs, H Daneshmand, A Lucchi… - The 22nd …, 2019 - proceedings.mlr.press
avoids convergence to undesired saddles, albeit without guaranteeing convergence on a general
non-convex-concave saddle point … We compare Cesp to the vanilla Adagrad optimizer. …

AdaGrad under Anisotropic Smoothness

Y Liu, R Pan, T Zhang - arXiv preprint arXiv:2406.15244, 2024 - arxiv.org
… the real anisotropic curvature; (3) Technically, it is difficult to avoid an explicit dependence on d
in the final convergence bound of AdaGrad if Eqn. (10) is employed, for instance, even the …

A simple convergence proof of Adam and Adagrad

A Défossez, L Bottou, F Bach, N Usunier - arXiv preprint arXiv:2003.02395, 2020 - arxiv.org
… In this paper, we present a simplified and unified proof of convergence to a critical point
for Adagrad and Adam for stochastic non-convex smooth optimization. We assume that the …
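As a rough companion to this snippet, here is a sketch of the Adam update in the form such unified analyses typically consider; the remark that a small first-moment coefficient and a second-moment decay near 1 make the accumulator behave like AdaGrad's running sum is a standard observation, hedged here and not quoted from the paper (all constants are common defaults, not the authors' settings):

    import numpy as np

    def adam_step(x, grad, m, v, t, eta=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # One Adam step. With b1 = 0 and b2 close to 1, the second-moment
        # accumulator behaves much like AdaGrad's running sum of squared
        # gradients, which is what a unified analysis can exploit.
        m = b1 * m + (1 - b1) * grad          # first-moment EMA
        v = b2 * v + (1 - b2) * grad ** 2     # second-moment EMA
        m_hat = m / (1 - b1 ** t)             # bias corrections
        v_hat = v / (1 - b2 ** t)
        x = x - eta * m_hat / (np.sqrt(v_hat) + eps)
        return x, m, v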

On scaled methods for saddle point problems

A Beznosikov, A Alanov, D Kovalev, M Takáč… - arXiv preprint arXiv …, 2022 - arxiv.org
… Methods with adaptive scaling of different features play a key role in solving saddle point …
is it avoided. At the moment this assumption can be found in the literature on saddle point …

InterAdam: Interpolating Dull Intervention to Adaptive Gradient Method

C Sun, Y Gan, J Zhang, J Dong - 2021 6th International …, 2021 - ieeexplore.ieee.org
… of escaping the saddle points and avoiding overfitting, while … of the simple average
one in AdaGrad. In particular, Adam is … proof of the existence of critical points under non-convexity …

Non convex-concave saddle point optimization

L Adolphs - 2018 - research-collection.ethz.ch
… To experimentally show that the cross-dependency term can either improve or worsen the
convergence rate of Adagrad, we design specific problems based on the knowledge about the …

On the Convergence of AdaGrad-Norm for Non-Convex Optimization

R Jin, X Wang - openreview.net
… However, as a fundamental adaptive optimizer, the theoretical analysis of AdaGrad-Norm is …
AdaGrad-Norm. We propose novel techniques that avoid the assumption of no saddle points …
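For readers unfamiliar with the variant, a minimal sketch of the AdaGrad-Norm update: a single scalar accumulator of squared gradient norms scales the whole step, in contrast to diagonal AdaGrad's per-coordinate scaling (parameter names and constants are illustrative, not the paper's):

    import numpy as np

    def adagrad_norm_step(x, grad, b_sq, eta=1.0, eps=1e-8):
        # One scalar accumulator of squared gradient norms scales the
        # entire step, rather than one accumulator per coordinate.
        b_sq = b_sq + float(np.dot(grad, grad))
        x = x - eta * grad / (np.sqrt(b_sq) + eps)
        return x, b_sq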

Two sides of one coin: the limits of untuned SGD and the power of adaptive methods

J Yang, X Li, I Fatkhullin, N He - Advances in Neural …, 2024 - proceedings.neurips.cc
… escape from saddle points by adaptive methods [41… SGD, AMSGrad-norm, and AdaGrad-norm
– to establish a universal observation that they avoid exponential dependency in ℓ without …

Simba: A Scalable Bilevel Preconditioned Gradient Method for Fast Evasion of Flat Areas and Saddle Points

N Tsipinakis, P Parpas - arXiv preprint arXiv:2309.05309, 2023 - arxiv.org
… to alleviate the limitations of AdaGrad. RMSprop scales the gradient … Yogi has updates similar
to AdaGrad's but allows for the … First-order methods almost always avoid saddle points: The …
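For comparison with the AdaGrad accumulation sketched earlier, a hedged sketch of the RMSprop and Yogi second-moment updates this snippet alludes to; the decay constants are commonly cited defaults, not values taken from this paper:

    import numpy as np

    def rmsprop_second_moment(v, grad, rho=0.9):
        # Exponential moving average of squared gradients, so old
        # curvature information is gradually forgotten.
        return rho * v + (1 - rho) * grad ** 2

    def yogi_second_moment(v, grad, b2=0.999):
        # Additive, sign-controlled move of v toward grad**2, which
        # bounds how fast the effective step size can change.
        g2 = grad ** 2
        return v - (1 - b2) * np.sign(v - g2) * g2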