DOI: 10.5555/2999611.2999654
Article

Accelerated mini-batch stochastic dual coordinate ascent

Published: 05 December 2013

Abstract

Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the mini-batch setting that is often used in practice. Our main contribution is to introduce an accelerated mini-batch version of SDCA and prove a fast convergence rate for this method. We discuss an implementation of our method over a parallel computing system, and compare the results to both the vanilla stochastic dual coordinate ascent and to the accelerated deterministic gradient descent method of Nesterov [2007].
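The abstract describes mini-batch dual coordinate updates only at a high level; the sketch below is a minimal, non-accelerated mini-batch SDCA loop for the L2-regularized SVM (hinge loss), written to make the setting concrete. It is not the paper's accelerated method, which adds a Nesterov-style momentum sequence and a tuned batch step size on top of such updates; the conservative 1/b averaging of the simultaneous updates, and every function and variable name, are assumptions made for this illustration.

import numpy as np

def minibatch_sdca_hinge(X, y, lam=0.01, batch_size=8, epochs=20, seed=0):
    """X: (n, d) feature matrix, y: (n,) labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                      # dual variables, one per example
    w = np.zeros(d)                          # primal vector, kept equal to X^T alpha / (lam * n)
    sq_norms = np.einsum("ij,ij->i", X, X)   # precomputed ||x_i||^2

    for _ in range(epochs):
        for _ in range(max(1, n // batch_size)):
            idx = rng.choice(n, size=batch_size, replace=False)
            delta = np.zeros(batch_size)
            for k, i in enumerate(idx):
                # Closed-form SDCA step for the hinge loss phi_i(z) = max(0, 1 - y_i z),
                # evaluated against the same w for every example in the batch.
                residual = 1.0 - y[i] * (X[i] @ w)
                proj = np.clip(alpha[i] * y[i] + lam * n * residual / max(sq_norms[i], 1e-12),
                               0.0, 1.0)
                delta[k] = proj * y[i] - alpha[i]
            # Conservative aggregation: average the simultaneous updates (scale by 1/b)
            # so the batch step cannot overshoot. The paper instead analyzes a tuned
            # batch step size together with an acceleration (momentum) sequence.
            alpha[idx] += delta / batch_size
            w += X[idx].T @ (delta / batch_size) / (lam * n)
    return w, alpha

# Toy usage on synthetic, roughly separable data (hypothetical example).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]))
w, _ = minibatch_sdca_hinge(X, y, lam=0.1)
print("training accuracy:", np.mean(np.sign(X @ w) == y))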

References

[1]
Alekh Agarwal and John C Duchi. Distributed delayed stochastic optimization. In 51st IEEE Conference on Decision and Control (CDC), pages 5451-5452, 2012.
[2]
Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, and John Langford. A reliable effective terascale linear learning system. arXiv preprint arXiv:1110.4198, 2011.
[3]
Maria-Florina Balcan, Avrim Blum, Shai Fine, and Yishay Mansour. Distributed learning, communication complexity and privacy. arXiv preprint arXiv:1204.3514, 2012.
[4]
Joseph K Bradley, Aapo Kyrola, Danny Bickson, and Carlos Guestrin. Parallel coordinate descent for l1-regularized loss minimization. In ICML, 2011.
[5]
Andrew Cotter, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Better mini-batch algorithms via accelerated gradient methods. arXiv preprint arXiv:1106.4574, 2011.
[6]
Hal Daume III, Jeff M Phillips, Avishek Saha, and Suresh Venkatasubramanian. Protocols for learning classifiers on distributed data. arXiv preprint arXiv:1202.6078, 2012.
[7]
Ofer Dekel. Distribution-calibrated hierarchical classification. In NIPS, 2010.
[8]
Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, and Lin Xiao. Optimal distributed online prediction using mini-batches. The Journal of Machine Learning Research, 13:165-202, 2012.
[9]
John Duchi, Alekh Agarwal, and Martin J Wainwright. Distributed dual averaging in networks. Advances in Neural Information Processing Systems, 23, 2010.
[10]
Olivier Fercoq and Peter Richtárik. Smooth minimization of nonsmooth functions with parallel coordinate descent methods. arXiv preprint arXiv:1309.5885, 2013.
[11]
Nicolas Le Roux, Mark Schmidt, and Francis Bach. A stochastic gradient method with an exponential convergence rate for strongly-convex optimization with finite training sets. arXiv preprint arXiv:1202.6258, 2012.
[12]
Phil Long and Rocco Servedio. Algorithms and hardness results for parallel large margin learning. In NIPS, 2011.
[13]
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M Hellerstein. Graphlab: A new framework for parallel machine learning. arXiv preprint arXiv:1006.4990, 2010.
[14]
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 5(8):716-727, 2012.
[15]
Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
[16]
Yurii Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper, 2007.
[17]
Feng Niu, Benjamin Recht, Christopher Ré, and Stephen J Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. arXiv preprint arXiv:1106.5730, 2011.
[18]
Peter Richtárik and Martin Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, pages 1-38, 2012a.
[19]
Peter Richtárik and Martin Takáč. Parallel coordinate descent methods for big data optimization. arXiv preprint arXiv:1212.0873, 2012b.
[20]
Peter Richtárik and Martin Takáč. Distributed coordinate descent method for learning with big data. arXiv preprint arXiv:1310.2059, 2013.
[21]
Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14:567-599, Feb 2013a.
[22]
Shai Shalev-Shwartz and Tong Zhang. Accelerated mini-batch stochastic dual coordinate ascent. arXiv preprint, 2013b.
[23]
Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In ICML, pages 807-814, 2007.
[24]
Martin Takáč, Avleen Bijral, Peter Richtárik, and Nathan Srebro. Mini-batch primal and dual methods for SVMs. arXiv preprint, 2013.



Published In

NIPS'13: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1
December 2013
3236 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

