Computer Science > Machine Learning
[Submitted on 20 Nov 2019 (v1), last revised 5 Dec 2020 (this version, v2)]
Title: Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates
Abstract: When scaling distributed training, the communication overhead is often the bottleneck. In this paper, we propose a novel SGD variant with reduced communication and adaptive learning rates. We prove the convergence of the proposed algorithm for smooth but non-convex problems. Empirical results show that the proposed algorithm significantly reduces the communication overhead, which, in turn, reduces the training time by up to 30% for the 1B word dataset.
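The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea it points to: workers take several local adaptive (AdaGrad-style) steps between synchronizations, so communication happens once every H steps rather than every step. All names and parameters here (H, stochastic_grad, local_adaptive_sgd, the toy objective) are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch (assumed structure, not the paper's exact algorithm):
# each worker runs H local AdaGrad-style steps, then workers average their
# parameters and accumulators, so synchronization happens every H steps.
import numpy as np

def stochastic_grad(x, rng):
    # Toy objective f(x) = 0.5 * ||x||^2 with additive gradient noise.
    return x + 0.1 * rng.standard_normal(x.shape)

def local_adaptive_sgd(num_workers=4, dim=10, rounds=50, H=8,
                       lr=0.1, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)                        # synchronized model
    params = [x.copy() for _ in range(num_workers)]
    accums = [np.zeros(dim) for _ in range(num_workers)]
    for _ in range(rounds):
        for w in range(num_workers):         # local phase: no communication
            for _ in range(H):
                g = stochastic_grad(params[w], rng)
                accums[w] += g * g           # AdaGrad-style accumulator
                params[w] -= lr * g / (np.sqrt(accums[w]) + eps)
        # communication phase: average parameters and accumulators
        x = np.mean(params, axis=0)
        v = np.mean(accums, axis=0)
        params = [x.copy() for _ in range(num_workers)]
        accums = [v.copy() for _ in range(num_workers)]
    return x

if __name__ == "__main__":
    print("final ||x||:", np.linalg.norm(local_adaptive_sgd()))
```

Compared with fully synchronous adaptive SGD, this kind of scheme trades one all-reduce per step for one per H steps, which is the source of the communication savings the abstract reports.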
Submission history
From: Cong Xie
[v1] Wed, 20 Nov 2019 16:58:40 UTC (364 KB)
[v2] Sat, 5 Dec 2020 00:26:57 UTC (885 KB)