|
ABSTRACT
In this article we propose a moment-based method for studying models and model selection measures. By focusing on the probabilistic space of classifiers induced by the classification algorithm rather than on that of datasets, we obtain efficient characterizations for computing the moments, which is followed by visualization of the resulting formulae that are too complicated for direct interpretation. By assuming the data to be drawn independently and identically distributed from the underlying probability distribution, and by going over the space of all possible datasets, we establish general relationships between the generalization error, hold-out-set error, cross-validation error, and leave-one-out error. We later exemplify the method and the results by studying the behavior of the errors for the naive Bayes classifier.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Bertsimas, D. and Popescu, I. 1998. Optimal inequalities in probability theory: A convex optimization approach. Tech. rep., Department of Mathematics, O.R., Cambridge, Massachusetts, 02139.
|
 |
3
|
Avrim Blum , Adam Kalai , John Langford, Beating the hold-out: bounds for K-fold and progressive cross-validation, Proceedings of the twelfth annual conference on Computational learning theory, p.203-208, July 07-09, 1999, Santa Cruz, California, United States
[doi> 10.1145/307400.307439]
|
| |
4
|
Boucheron, S., Bousquet, O., and Lugosi, G. 2005. Introduction to statistical learning theory. http://www.kyb.mpg.de/publications/pdfs/pdf2819.pdf.
|
| |
5
|
|
| |
6
|
Breiman, L. 1996. Heuristics of instability and stabilization in model selection. Ann. Statis. 24, 6, 2350--2383.
|
| |
7
|
Butler, R. and Sutton, R. 1998. Saddlepoint approximation for multivariate cumulative distribution functions and probability computations in sampling theory and outlier testing. J. Amer. Statis. Assoc. 93, 442, 596--604.
|
| |
8
|
Connor-Linton, J. 2003. Chi square tutorial. http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html.
|
| |
9
|
Devroye, L., Györfi, L., and Lugosi, G. 1996. A Probabilistic Theory of Pattern Recognition. Springer.
|
| |
10
|
Doob, J. 1994. Measure Theory. Springer.
|
| |
11
|
Elisseeff, A. and Pontil, M. 2003. Leave-one-out error and stability of learning algorithms with applications. In Learning Theory and Practice. IOS Press.
|
| |
12
|
|
| |
13
|
Hall, P. 1992. The Bootstrap and Edgeworth Expansion. Springer.
|
| |
14
|
Isii, K. 1960. The extrema of probability determined by generalized moments(i) bounded random variables. Ann. Inst. Stat. Math 12, 119--133.
|
| |
15
|
Isii, K. 1963. On the sharpeness of chebyshev-type inequalities. Ann. Inst. Stat. Math 14, 185--197.
|
| |
16
|
Karlin, S. and Shapely, L. 1953. Geometry of moment spaces. Memoirs Amer. Math. Soc. 12.
|
 |
17
|
|
| |
18
|
Kohavi, R. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence. Morgan Kaufmann, 1137--1143.
|
| |
19
|
Kutin, S. and Niyogi, P. 2002. Almost-Everywhere algorithmic stability and generalization error. In Proceedings of the 18th Annual Conference on Uncertainty in Artificial Intelligence (UAI-02). Morgan Kaufmann Publishers, 275--282.
|
| |
20
|
Langford, J. 2005. Filed under: Prediction theory, problems. http://hunch.net/index.php?p=29.
|
| |
21
|
|
| |
22
|
Levin, B. 1981. A representation for multinomial cumulative distribution functions. Ann. Statis. 9, 5, 1123--1126.
|
| |
23
|
Moore, A. and Lee, M. 1994. Efficient algorithms for minimizing cross validation error. In Proceedings of the International Conference on Machine Learning, 190--198.
|
| |
24
|
Plutowski, M. 1996. Survey: Cross-validation in theory and in practice. www.emotivate.com/CvSurvey.doc.
|
| |
25
|
Prekopa, A. 1989. The discrete moment problem and linear programming. RUTCOR Res. rep.
|
| |
26
|
Shao, J. 1993. Linear model selection by cross validation. J. Amer. Statis. Assoc. 88.
|
| |
27
|
Vapnik, V. 1998. Statistical Learning Theory. Wiley & Sons.
|
| |
28
|
Williamson, R. 2001. Srm and vc theory (statistical learning theory). http://axiom.anu.edu.au/williams/papers/P151.pdf.
|
| |
29
|
Wolfram-Research. 2009. Mathematica. http://www.wolfram.com/.
|
| |
30
|
Wu, S.-P. and Boyd, S. 1996. Sdpsol: A parser/solver for sdp and maxdet problems with matrix structure. http://www.stanford.edu/boyd/SDPSOL.html.
|
| |
31
|
|
|