ACM Home Page
Please provide us with feedback. Feedback
Probably almost discriminative learning
Full text PdfPdf (863 KB)
Source Annual Workshop on Computational Learning Theory archive
Proceedings of the fifth annual workshop on Computational learning theory table of contents
Pittsburgh, Pennsylvania, United States
Pages: 164 - 171  
Year of Publication: 1992
ISBN:0-89791-497-X
Author
Kenji Yamanishi  C&C Information Technology Research Laboratories., NEC Corporation, 1-1 Miyazaki 4-chome, Miyamae-ku, Kawasaki, Kanagawa, Japan
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 24,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/130385.130404
What is a DOI?

ABSTRACT

This paper develops a new computational model for learning stochastic rules (i.e. conditional probabilities over the set of labels for given instances) on the basis of statistical hypothesis testing theory, and derives bounds on sample complexities required for learning. The model deals with the problem of determining whether or not a given class of stochastic rules is probably almost discriminatively (PAD) learnable in the sense that one can discriminate, with low computational complexity and with high probability, between any pair of stochastic rules in that class by testing from which member of the pair a given test sequence has originated. In the proposed model, we construct new discrimination functions on the basis of the minimum description length (MDL) principle. We then derive upper bounds on the smallest training sample size and test sample size required by those discrimination functions in order to guarantee that for any pair of rules in a given class, two types of error probabilities are respectively less than &dgr;1 and &dgr;2 provided the distance between the two rules to be discriminated is not less than &egr;. As corollaries, we derive sample size bounds for PAD-learning of stochastic decision lists with at most k literals in each term and of stochastic decision trees with at most k log n depth (k is fixed). A sufficient condition for polynomial-time PAD learnability of any given class is also given in terms of the existence of an algorithm approximating the minimum description length.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
BC91
A.R. Barton and T. Cover. Minimum complexity density estimation. IEEE Trans. on IT, 37:1034- 1054, 1991.
 
CT91
 
Gut88
M. Gutman. Asymptotically optimal classification for multiple tests with empirically observed statistics. IEEE Trans. on IT, 35:401-408, 1988.
 
Han81
D.J. Hand. Discrimination and Classification. Wiley: New York, 1981.
 
Hoe65
W. Hoeffding. Asymptotically optimal test for multinomial distributions. Annals of Mathematical Statistics, 36:369-400, 1965.
 
KS90
M. Kearns and R. Schapire. Efficient distributionfree learning of probabilistic concepts. In Proceed. ings of IEEE the Slst FOGS, 1990.
 
Ris78
J. Rissanen. Modeling by shortest data description. Automatica, 14:465-471, 1978.
 
Ris89
 
Yam90
 
Yam91
 
Ziv88
J. Ziv. On classification with empirically observed statistics and universal data compression. IEEE Trans. IT, 34:278-286, 1988.
 
ZL78
J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. IT, 24:530-536, 1978.