ACM Home Page
Please provide us with feedback. Feedback
Fat-shattering and the learnability of real-valued functions
Full text PdfPdf (1.15 MB)
Source Annual Workshop on Computational Learning Theory archive
Proceedings of the seventh annual conference on Computational learning theory table of contents
New Brunswick, New Jersey, United States
Pages: 299 - 310  
Year of Publication: 1994
ISBN:0-89791-655-7
Authors
Peter L. Bartlett  Department of Systems Engineering, RSISE, Australian National University, Canberra, 0200 Australia
Philip M. Long  Computer Science Department, Duke University, P.O. Box 90129, Durham, NC
Robert C. Williamson  Department of Engineering, Australian National University, Canberra, 0200 Australia
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 33,   Citation Count: 12
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/180139.181158
What is a DOI?

ABSTRACT

We consider the problem of learning real-valued functions from random examples when the function values are corrupted with noise. With mild conditions on independent observation noise, we provide characterizations of the learnability of a real-valued function class in terms of a generalization of the Vapnik-Chervonenkis dimension, the fat shattering function, introduced by Kearns and Schapire. We show that, given some restrictions on the noise, a function class is learnable in our model if and only if its fat-shattering function is finite. With different (also quite mild) restrictions, satisfied for example by gaussian noise, we show that a function class is learnable from polynomially many examples if and only if its fat-shattering function grows polynomially. We prove analogous results in an agnostic setting, where there is no assumption of an underlying function class.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
N. ALON, S. BEN-DAVID, N. CESA-BIANCHI AND D. HAUS- SLER, Scale-sensitive dimensions, uniform convergence, and learnability, Symposium on Foundations of Computer Science, 1993.
 
2
D. ANGLUIN AND L. G. VALIANT, Fast probabilistic algorithms for Hamiltonian circuits and matchings, Journal of Computer and System Sciences, 18 (1979), pp. 155-193.
 
3
4
5
6
7
 
8
9
 
10
S. VAN DE GEER, Regression Analysis and Empirical Processes, Centrnm voor Wiskunde en Informatica, Amsterdam, 1988.
 
11
 
12
M.J. KEARNS AND R. E. SCHAPIRE, Efficient distrbutionfree learning of probabilistic concepts (extended abstracO, Proceedings of the 31 st Annual Symposium on the Foundations of Computer Science, 1990.
13
 
14
N. MERHAVANDM. FEDER, Universal schemes for sequential decision from individual data sequences, IEEE Transaction on Information Theory, 39 (1993), pp. 1280-1292.
 
15
16
 
17
J.K. PATEL AND C. B. READ, The BigBook of Facts About the Normal Distribution, Marcel Dekker, New York, 1982.
 
18
D. POLLARD, Convergence of Stochastic Processes, Springer, New York, 1984.
 
19
H.L. ROYDEN, Real Analysis, Macmillan, New York, 1988.
 
20
 
21
H.U. S~ON, Bounds on the number of examples needed for learning functions, FB Informatik, LS II, Universitat Dortmund, Forschungsbericht Nr. 501, 1993.
22
 
23
V.N. VAPNIK AND A. Y. CHERVONENKIS, On the uniform convergence of relative frequencies of events to theirprobabilities, Theory of Probability and its Applications, XVI (1971), pp. 264-280.

CITED BY  12

Collaborative Colleagues:
Peter L. Bartlett: colleagues
Philip M. Long: colleagues
Robert C. Williamson: colleagues