| Probably almost discriminative learning |
| Full text |
Pdf
(863 KB)
|
| Source
|
Annual Workshop on Computational Learning Theory
archive
Proceedings of the fifth annual workshop on Computational learning theory
table of contents
Pittsburgh, Pennsylvania, United States
Pages: 164 - 171
Year of Publication: 1992
ISBN:0-89791-497-X
|
|
Author
|
|
Kenji Yamanishi
|
C&C Information Technology Research Laboratories., NEC Corporation, 1-1 Miyazaki 4-chome, Miyamae-ku, Kawasaki, Kanagawa, Japan
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 13, Downloads (12 Months): 24, Citation Count: 3
|
|
|
ABSTRACT
This paper develops a new computational model for learning stochastic rules (i.e. conditional probabilities over the set of labels for given instances) on the basis of statistical hypothesis testing theory, and derives bounds on sample complexities required for learning. The model deals with the problem of determining whether or not a given class of stochastic rules is probably almost discriminatively (PAD) learnable in the sense that one can discriminate, with low computational complexity and with high probability, between any pair of stochastic rules in that class by testing from which member of the pair a given test sequence has originated.
In the proposed model, we construct new discrimination functions on the basis of the minimum description length (MDL) principle. We then derive upper bounds on the smallest training sample size and test sample size required by those discrimination functions in order to guarantee that for any pair of rules in a given class, two types of error probabilities are respectively less than &dgr;1 and &dgr;2 provided the distance between the two rules to be discriminated is not less than &egr;. As corollaries, we derive sample size bounds for PAD-learning of stochastic decision lists with at most k literals in each term and of stochastic decision trees with at most k log n depth (k is fixed). A sufficient condition for polynomial-time PAD learnability of any given class is also given in terms of the existence of an algorithm approximating the minimum description length.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
BC91
|
A.R. Barton and T. Cover. Minimum complexity density estimation. IEEE Trans. on IT, 37:1034- 1054, 1991.
|
| |
CT91
|
|
| |
Gut88
|
M. Gutman. Asymptotically optimal classification for multiple tests with empirically observed statistics. IEEE Trans. on IT, 35:401-408, 1988.
|
| |
Han81
|
D.J. Hand. Discrimination and Classification. Wiley: New York, 1981.
|
| |
Hoe65
|
W. Hoeffding. Asymptotically optimal test for multinomial distributions. Annals of Mathematical Statistics, 36:369-400, 1965.
|
| |
KS90
|
M. Kearns and R. Schapire. Efficient distributionfree learning of probabilistic concepts. In Proceed. ings of IEEE the Slst FOGS, 1990.
|
| |
Ris78
|
J. Rissanen. Modeling by shortest data description. Automatica, 14:465-471, 1978.
|
| |
Ris89
|
|
| |
Yam90
|
|
| |
Yam91
|
|
| |
Ziv88
|
J. Ziv. On classification with empirically observed statistics and universal data compression. IEEE Trans. IT, 34:278-286, 1988.
|
| |
ZL78
|
J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. IT, 24:530-536, 1978.
|
|