Nov 19, 2022 · It provides two avenues for better understanding the generalization behaviour of SGD through its SDE approximation.
Jun 8, 2024 · Firstly, viewing SGD as full-batch gradient descent with Gaussian gradient noise allows us to obtain a trajectory-based generalization bound using ...
Stochastic differential equations (SDEs) have recently been shown to characterize well the dynamics of training machine learning models with SGD.
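For reference, a commonly used form of this SDE approximation of SGD (the precise noise covariance and scaling used in the paper are not shown in these snippets, so the constants below are illustrative):

$$ d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\eta/b}\;\Sigma(\theta_t)^{1/2}\,dW_t, $$

where $L$ is the full-batch training loss, $\eta$ the learning rate, $b$ the batch size, $\Sigma(\theta)$ the covariance of the per-sample gradient noise, and $W_t$ a standard Wiener process. The corresponding discrete picture, $\theta_{k+1} = \theta_k - \eta\,(\nabla L(\theta_k) + \xi_k)$ with Gaussian $\xi_k$, is the "full-batch gradient descent with Gaussian gradient noise" view mentioned above.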
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States. Ziqiao Wang · Yongyi Mao.
This work views SGD as full-batch gradient descent with Gaussian gradient noise, estimates the steady-state weight distribution of the SDE, and uses ...
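A toy numerical sketch of that modeling step (the least-squares example, variable names, and constants are assumptions for illustration, not the paper's setup): it runs mini-batch SGD and the "full-batch gradient plus Gaussian noise" surrogate on the same problem and compares the empirical spread of the weights once they reach a rough steady state.

import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 2                       # toy dataset size and parameter dimension
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)

eta, b, steps = 0.05, 32, 20000     # learning rate, batch size, iterations

def full_grad(w):
    # gradient of the full-batch loss L(w) = ||Xw - y||^2 / (2n)
    return X.T @ (X @ w - y) / n

def run(mode):
    w = np.zeros(d)
    tail = []
    for t in range(steps):
        if mode == "minibatch":
            idx = rng.choice(n, size=b, replace=False)
            g = X[idx].T @ (X[idx] @ w - y[idx]) / b
        else:
            # Gaussian surrogate: full-batch gradient plus Gaussian noise whose
            # covariance is the per-sample gradient covariance scaled by 1/b
            G = X * (X @ w - y)[:, None]          # per-sample gradients, shape (n, d)
            cov = np.cov(G, rowvar=False) / b
            g = full_grad(w) + rng.multivariate_normal(np.zeros(d), cov)
        w = w - eta * g
        if t > steps // 2:                        # second half ≈ steady-state samples
            tail.append(w.copy())
    return np.array(tail)

for mode in ("minibatch", "gaussian"):
    W = run(mode)
    print(mode, "mean:", W.mean(axis=0).round(3), "std:", W.std(axis=0).round(4))

With matched learning rate and batch size, the two runs should show a similar steady-state spread around the minimizer, which is the intuition behind using the stationary distribution of the SDE to reason about the terminal state of SGD.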
Jun 10, 2024 · The paper provides a theoretically grounded approach to understanding the generalization behavior of SGD through its SDE approximation. The ...
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States. Z Wang, Y Mao. The Fortieth ...
May 14, 2024 · Ziqiao Wang and Yongyi Mao, “Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States,” ...