ABSTRACT
We present tight surrogate regret bounds for the class of proper (i.e., Fisher-consistent) losses. The bounds generalise the margin-based bounds due to Bartlett et al. (2006). The proof uses Taylor's theorem and leads to new representations for loss and regret, as well as a simple proof of the integral representation of proper losses. We also present a different formulation of a duality result for Bregman divergences, which leads to a simple demonstration of the convexity of composite losses using canonical link functions.
REFERENCES
Bartlett, P., Jordan, M., & McAuliffe, J. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101, 138--156.
Beygelzimer, A., Langford, J., & Zadrozny, B. (2008). Machine learning techniques --- reductions between prediction quality metrics. Preprint.
Buja, A., Stuetzle, W., & Shen, Y. (2005). Loss functions for binary class probability estimation and classification: Structure and applications (Technical Report). University of Pennsylvania.
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359--378.
Helmbold, D., Kivinen, J., & Warmuth, M. (1999). Relative loss bounds for single neurons. IEEE Transactions on Neural Networks, 10, 1291--1304.
Hiriart-Urruty, J.-B., & Lemaréchal, C. (2001). Fundamentals of convex analysis. Berlin: Springer.
Langford, J., & Zadrozny, B. (2005). Estimating class membership probabilities using classifier learners. Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS'05).
Liese, F., & Vajda, I. (2006). On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52, 4394--4412.
McCullagh, P., & Nelder, J. (1989). Generalized linear models. Chapman & Hall/CRC.
Reid, M. D., & Williamson, R. C. (2009). Information, divergence and risk for binary experiments. arXiv preprint arXiv:0901.0356v1, 89 pages.
Savage, L. J. (1971). Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, 66, 783--801.
Schervish, M. (1989). A general method for comparing probability assessors. The Annals of Statistics, 17, 1856--1879.
Shuford, E., Albert, A., & Massengill, H. (1966). Admissible probability measurement procedures. Psychometrika, 31, 125--145.
Steinwart, I. (2007). How to compare different loss functions and their risks. Constructive Approximation, 26, 225--287.
Zhang, T. (2004b). Statistical behaviour and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32, 56--134.