| The minimum L-complexity algorithm and its applications to learning non-parametric rules |
| Full text |
Pdf
(1.13 MB)
|
| Source
|
Annual Workshop on Computational Learning Theory
archive
Proceedings of the seventh annual conference on Computational learning theory
table of contents
New Brunswick, New Jersey, United States
Pages: 173 - 182
Year of Publication: 1994
ISBN:0-89791-655-7
|
|
Author
|
|
Kenji Yamanishi
|
NEC Research Institute, Inc., 4 Independence Way, Princeton, NJ
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 18, Downloads (12 Months): 24, Citation Count: 0
|
|
|
ABSTRACT
This paper proposes the minimum L-complexity algorithm (MLC), which can be thought of as an extension of the minimum description length (MDL) principle-based algorithm to the case where general real-valued functions are used as hypotheses and general loss functions are used as distortion measures. MLC is also closely related to Barron's complexity regularization algorithm and Vapnik's structural risk minimization. We demonstrate the effectiveness of MLC in terms of sample complexity within the decision theoretic PAC learning model. Specifically using MLC, we develop a unifying method of deriving upper bounds on target-dependent (non-uniform) sample complexity both for parametric and non-parametric settings. We further introduce a method for evaluating average-case sample complexity where the average is taken with respect to a prior probability over the parametric target class. These target-dependent and average-case sample complexity bounds offer a new view of sample complexity analysis, while most of previous work focusing on the worst-case sample complexity. As applications of MLC, we consider the issues of learning and non-parametric rules in terms of 1) stochastic rules with finite partitioning, 2) finite Hermite series, and 3) finite Fourier series. We use MLC to improve the previously-known best results on sample complexity for these issues.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Barron, A.R.(1991). Complexity regularization with application to artificial neural networks. Nonparametric Functional Estimation and Related Topics, G.Roussas ed., 561-576, Kluwer Academic Publishers.
|
| |
3
|
Barron, A.R. and Cover T.M.(1991). Minimum complexity density estimation. IEEE Trans. Inform. Theory, IT-37:1034-1054.
|
| |
4
|
|
 |
5
|
|
| |
6
|
Greblicki, W.(1981). Asymptotic efficiency of classifying procedures using the Itermite series estimate of multivariate probability densities. IEEE Trans. Inform. Theory, IT-27:364-366.
|
| |
7
|
Greblicki, W. and Pawlak, M.(1981). Classification using the Fourier series estimate of multivariate density functions. IEEE Trans. Syst., Man, Cybern., SMC- 11:726-730.
|
| |
8
|
|
| |
9
|
David Haussler , Michael Kearns , Robert Schapire, Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension, Proceedings of the fourth annual workshop on Computational learning theory, p.61-74, August 05-07, 1991, Santa Cruz, California, United States
|
| |
10
|
Kearns, M. and Schapire, R.(1990). Efficient distribution free learning of probabilistic concepts. Proc. FOCS- 90, (pp. 382-391).
|
| |
11
|
Rissanen, J.(1978). Modeling by shortest data description. Automatics, Li, 465-471.
|
| |
12
|
|
| |
13
|
Schwarz, S.C.(1967). Estimation of probability density by an orthogonal series. Ann. Stat., 5:1258-1264.
|
 |
14
|
|
| |
15
|
Van Ryzin, J.(1966). Bayes risk consistency of classification procedures using density estimates. Sankhya, A, 28:161-170.
|
| |
16
|
|
| |
17
|
Wahba, G.(1975). Optimal convergence properties of variable knot, kernel, and orthogonal series methods for density estimation. Ann. Stat., 3:15-29.
|
| |
18
|
Waiter, G.G.(1977). Properties of Hermite series estimation of probability density. Ann. Stat., 5:1258-1264.
|
| |
19
|
|
| |
20
|
|
|