DOI: 10.1145/2487575.2487635

Constrained stochastic gradient descent for large-scale least squares problem

Published: 11 August 2013

Abstract

The least squares problem is one of the most important regression problems in statistics, machine learning and data mining. In this paper, we present the Constrained Stochastic Gradient Descent (CSGD) algorithm for solving the large-scale least squares problem. CSGD improves on Stochastic Gradient Descent (SGD) by imposing a provable constraint: the linear regression line must pass through the mean point of all the data points. This yields the best regret bound, $O(\log{T})$, and the fastest convergence speed among all first-order approaches. Empirical studies demonstrate the effectiveness of CSGD by comparing it with SGD and other state-of-the-art approaches. An example is also given to show how CSGD can be used to optimize SGD-based least squares problems for better performance.
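
The mean-point constraint described in the abstract can be enforced cheaply inside an ordinary SGD loop: since the fitted line must pass through (x_bar, y_bar), the intercept is fully determined by the weight vector and the data mean and can be re-derived after every weight update. The Python sketch below illustrates this idea; it is not the authors' implementation, and the decaying step size eta0 / (1 + t) as well as the use of batch-computed means (rather than running averages in a streaming setting) are assumptions made only for this example.

    import numpy as np

    def constrained_sgd_least_squares(X, y, n_epochs=5, eta0=0.1, seed=0):
        """Sketch of SGD for least squares with the mean-point constraint.

        After every stochastic gradient step on the weights w, the intercept b
        is reset to y_bar - w @ x_bar, so the fitted line always passes through
        the mean point (x_bar, y_bar). The step-size schedule eta0 / (1 + t)
        is an assumption, not taken from the paper.
        """
        rng = np.random.default_rng(seed)
        n, d = X.shape
        x_bar, y_bar = X.mean(axis=0), y.mean()
        w = np.zeros(d)
        b = y_bar - w @ x_bar              # constraint holds at initialization
        t = 0
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                t += 1
                eta = eta0 / (1.0 + t)     # assumed decaying step size
                err = w @ X[i] + b - y[i]  # residual on the sampled point
                w -= eta * err * X[i]      # stochastic gradient step on w
                b = y_bar - w @ x_bar      # project back onto the constraint
        return w, b

    # Toy usage: recover a known linear model from noisy samples.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 + 0.01 * rng.normal(size=2000)
    w_hat, b_hat = constrained_sgd_least_squares(X, y)

Because the intercept is eliminated from the optimization, the stochastic updates act only on the weight vector; in a purely streaming setting the batch means would be replaced by running averages, which is again an assumption beyond what the abstract states.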


Published In

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
1534 pages
ISBN:9781450321747
DOI:10.1145/2487575

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. large-scale least squares
  2. online learning
  3. stochastic optimization

Qualifiers

  • Poster

Conference

KDD '13

Acceptance Rates

KDD '13 paper acceptance rate: 125 of 726 submissions, 17%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
