ABSTRACT
In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.