DOI: 10.1145/180139.181018

Rigorous learning curve bounds from statistical mechanics

Published: 16 July 1994

Abstract

In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.
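To make the comparison in the abstract concrete, the following sketch (not from the paper) simulates a learning curve for an explicitly enumerated finite-cardinality function class under a known input distribution, and contrasts the simulated error with the standard finite-class (cardinality) bound ln(|F|/delta)/m. The input space, the distribution p, the use of all 2^10 Boolean labelings as the class, and the uniform-over-version-space (Gibbs-style) learner are hypothetical choices made only for illustration; they are not the constructions used in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: ten inputs with a known, non-uniform distribution, and
# the finite class of all 2^10 Boolean labelings of those inputs (so the
# target is realizable and |F| = 1024).
n_inputs = 10
p = np.array([0.25, 0.15, 0.12, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.03])
functions = np.array([[(k >> i) & 1 for i in range(n_inputs)]
                      for k in range(2 ** n_inputs)])
target = functions[rng.integers(len(functions))]

# Distribution-dependent generalization error of every function in the class.
errors = (functions != target) @ p

def gibbs_error(m, trials=200):
    # Average error of a hypothesis drawn uniformly from the version space
    # (all functions consistent with m random labeled examples).
    total = 0.0
    for _ in range(trials):
        xs = rng.choice(n_inputs, size=m, p=p)
        consistent = np.all(functions[:, xs] == target[xs], axis=1)
        total += rng.choice(errors[consistent])
    return total / trials

delta = 0.05
for m in [5, 10, 20, 40, 80]:
    # Classical finite-class (cardinality) bound: ln(|F|/delta) / m.
    bound = np.log(len(functions) / delta) / m
    print(f"m={m:3d}  simulated error={gibbs_error(m):.3f}  "
          f"cardinality bound={min(1.0, bound):.3f}")

Even in a toy setting like this, the simulated curve typically lies well below the worst-case cardinality bound at small sample sizes; capturing such gaps rigorously, including phase transitions and power-law decay, is the kind of behavior the theory in the paper is designed to address.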

Published In

COLT '94: Proceedings of the seventh annual conference on Computational learning theory
July 1994
376 pages
ISBN: 0897916557
DOI: 10.1145/180139


Publisher

Association for Computing Machinery

New York, NY, United States



Conference

COLT '94: 7th Annual Conference on Computational Learning Theory
July 12-15, 1994
New Brunswick, New Jersey, USA

Acceptance Rates

Overall acceptance rate: 35 of 71 submissions (49%)
