DOI: 10.1145/2487575.2487635

Constrained stochastic gradient descent for large-scale least squares problem

Published: 11 August 2013

Abstract

The least squares problem is one of the most important regression problems in statistics, machine learning and data mining. In this paper, we present the Constrained Stochastic Gradient Descent (CSGD) algorithm for solving the large-scale least squares problem. CSGD improves on Stochastic Gradient Descent (SGD) by imposing a provable constraint: the linear regression line must pass through the mean point of all the data points. This yields the best regret bound, $O(\log{T})$, and the fastest convergence speed among all first-order approaches. Empirical studies demonstrate the effectiveness of CSGD by comparing it with SGD and other state-of-the-art approaches. An example is also given to show how CSGD can be used to optimize SGD-based least squares problems for better performance.
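
The mean-point constraint described in the abstract can be enforced cheaply inside an ordinary SGD loop: since the fitted line must pass through (x_bar, y_bar), the intercept is fully determined by the weight vector and the data mean and can be re-derived after every weight update. The Python sketch below illustrates this idea; it is not the authors' implementation, and the decaying step size eta0 / (1 + t) as well as the use of batch-computed means (rather than running averages in a streaming setting) are assumptions made only for this example.

    import numpy as np

    def constrained_sgd_least_squares(X, y, n_epochs=5, eta0=0.1, seed=0):
        """Sketch of SGD for least squares with the mean-point constraint.

        After every stochastic gradient step on the weights w, the intercept b
        is reset to y_bar - w @ x_bar, so the fitted line always passes through
        the mean point (x_bar, y_bar). The step-size schedule eta0 / (1 + t)
        is an assumption, not taken from the paper.
        """
        rng = np.random.default_rng(seed)
        n, d = X.shape
        x_bar, y_bar = X.mean(axis=0), y.mean()
        w = np.zeros(d)
        b = y_bar - w @ x_bar              # constraint holds at initialization
        t = 0
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                t += 1
                eta = eta0 / (1.0 + t)     # assumed decaying step size
                err = w @ X[i] + b - y[i]  # residual on the sampled point
                w -= eta * err * X[i]      # stochastic gradient step on w
                b = y_bar - w @ x_bar      # project back onto the constraint
        return w, b

    # Toy usage: recover a known linear model from noisy samples.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 + 0.01 * rng.normal(size=2000)
    w_hat, b_hat = constrained_sgd_least_squares(X, y)

Because the intercept is eliminated from the optimization, the stochastic updates act only on the weight vector; in a purely streaming setting the batch means would be replaced by running averages, which is again an assumption beyond what the abstract states.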


Published In

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
1534 pages
ISBN:9781450321747
DOI:10.1145/2487575

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. large-scale least squares
  2. online learning
  3. stochastic optimization

Qualifiers

  • Poster

Conference

KDD '13

Acceptance Rates

KDD '13 paper acceptance rate: 125 of 726 submissions, 17%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
