DOI: 10.1145/180139.181018

Rigorous learning curve bounds from statistical mechanics

Published: 16 July 1994

Abstract

In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.
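To make the comparison in the abstract concrete, the following sketch (not from the paper) simulates a learning curve for an explicitly enumerated finite-cardinality function class under a known input distribution, and contrasts the simulated error with the standard finite-class (cardinality) bound ln(|F|/delta)/m. The input space, the distribution p, the use of all 2^10 Boolean labelings as the class, and the uniform-over-version-space (Gibbs-style) learner are hypothetical choices made only for illustration; they are not the constructions used in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: ten inputs with a known, non-uniform distribution, and
# the finite class of all 2^10 Boolean labelings of those inputs (so the
# target is realizable and |F| = 1024).
n_inputs = 10
p = np.array([0.25, 0.15, 0.12, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.03])
functions = np.array([[(k >> i) & 1 for i in range(n_inputs)]
                      for k in range(2 ** n_inputs)])
target = functions[rng.integers(len(functions))]

# Distribution-dependent generalization error of every function in the class.
errors = (functions != target) @ p

def gibbs_error(m, trials=200):
    # Average error of a hypothesis drawn uniformly from the version space
    # (all functions consistent with m random labeled examples).
    total = 0.0
    for _ in range(trials):
        xs = rng.choice(n_inputs, size=m, p=p)
        consistent = np.all(functions[:, xs] == target[xs], axis=1)
        total += rng.choice(errors[consistent])
    return total / trials

delta = 0.05
for m in [5, 10, 20, 40, 80]:
    # Classical finite-class (cardinality) bound: ln(|F|/delta) / m.
    bound = np.log(len(functions) / delta) / m
    print(f"m={m:3d}  simulated error={gibbs_error(m):.3f}  "
          f"cardinality bound={min(1.0, bound):.3f}")

Even in a toy setting like this, the simulated curve typically lies well below the worst-case cardinality bound at small sample sizes; capturing such gaps rigorously, including phase transitions and power-law decay, is the kind of behavior the theory in the paper is designed to address.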

Published In

COLT '94: Proceedings of the seventh annual conference on Computational learning theory
July 1994
376 pages
ISBN: 0897916557
DOI: 10.1145/180139


Publisher

Association for Computing Machinery

New York, NY, United States



Conference

COLT '94: 7th Annual Conference on Computational Learning Theory
July 12-15, 1994
New Brunswick, New Jersey, USA

Acceptance Rates

Overall acceptance rate: 35 of 71 submissions (49%)
