Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

Meng, Si Yi; Orvieto, Antonio; Cao, Daniel Yiming; De Sa, Christopher

arXiv:2406.05033 (cs)

[Submitted on 7 Jun 2024 (v1), last revised 4 Nov 2024 (this version, v2)]

Abstract: We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex: a sequence of period-doubling bifurcations begins at the critical step size $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. Using a smaller-than-critical step size guarantees convergence if initialized near the solution, but does this suffice globally? In one dimension, we show that a step size less than $1/\lambda$ suffices for global convergence. However, for all step sizes between $1/\lambda$ and the critical step size $2/\lambda$, one can construct a dataset such that GD converges to a stable cycle. In higher dimensions, this is actually possible even for step sizes less than $1/\lambda$. Our results show that although local convergence is guaranteed for all step sizes less than the critical step size, global convergence is not, and GD may instead converge to a cycle depending on the initialization.
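The role of the critical step size $2/\lambda$ can be seen in a minimal sketch. The two-point dataset, the starting point `w0`, the step-size multipliers, and all function names below are assumptions chosen for illustration; they are not taken from the paper, and the example does not reproduce the paper's constructed datasets that cycle below $2/\lambda$. It only shows that, on one small non-separable problem, GD converges just below the critical step size and settles into a period-2 cycle just above it.

```python
import math

# Toy 1-D dataset (an illustrative assumption, not from the paper):
# the same feature value carries both labels, so the data is not separable.
X = [1.0, 1.0]
Y = [1.0, -1.0]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def grad(w):
    # Derivative of the mean logistic loss (1/n) * sum_i log(1 + exp(-y_i x_i w)).
    return sum(-y * x * sigmoid(-y * x * w) for x, y in zip(X, Y)) / len(X)


def hessian(w):
    # Second derivative of the same loss; labels drop out because y_i^2 = 1.
    return sum(x * x * sigmoid(x * w) * sigmoid(-x * w) for x in X) / len(X)


def run_gd(step_size, w0=1.0, num_steps=2000):
    # Plain gradient descent with a constant step size; returns all iterates.
    w = w0
    iterates = [w]
    for _ in range(num_steps):
        w = w - step_size * grad(w)
        iterates.append(w)
    return iterates


W_STAR = 0.0                 # minimizer of this symmetric toy problem
LAM = hessian(W_STAR)        # curvature at the solution
CRITICAL = 2.0 / LAM         # critical step size 2 / lambda

below = run_gd(0.9 * CRITICAL)   # slightly below the critical step size
above = run_gd(1.25 * CRITICAL)  # slightly above the critical step size

print("lambda:", LAM, "critical step size:", CRITICAL)
print("below critical, last iterates:", below[-2:])
print("above critical, last iterates:", above[-2:])
```

On this particular toy problem the run below the critical step size approaches the minimizer at zero, while the run above it alternates between two points of opposite sign. Because the dataset is symmetric, it does not exhibit the cycles for step sizes between $1/\lambda$ and $2/\lambda$ that the abstract describes; those require datasets constructed for that purpose.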

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as: arXiv:2406.05033 [cs.LG] (or arXiv:2406.05033v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2406.05033

Submission history

From: Si Yi Meng
[v1] Fri, 7 Jun 2024 15:53:06 UTC (9,544 KB)
[v2] Mon, 4 Nov 2024 15:23:23 UTC (10,815 KB)
