| Leveraging the margin more carefully |
| Full text |
Pdf
(214 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 69
archive
Proceedings of the twenty-first international conference on Machine learning
table of contents
Banff, Alberta, Canada
Page: 63
Year of Publication: 2004
ISBN:1-58113-828-5
|
|
Authors
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 32, Citation Count: 4
|
|
|
ABSTRACT
Boosting is a popular approach for building accurate classifiers. Despite the initial popular belief, boosting algorithms do exhibit overfitting and are sensitive to label noise. Part of the sensitivity of boosting algorithms to outliers and noise can be attributed to the unboundedness of the margin-based loss functions that they employ. In this paper we describe two leveraging algorithms that build on boosting techniques and employ a bounded loss function of the margin. The first algorithm interleaves the expectation maximization (EM) algorithm with boosting steps. The second algorithm decomposes a non-convex loss into a difference of two convex losses. We prove that both algorithms converge to a stationary point. We also analyze the generalization properties of the algorithms using the Rademacher complexity. We describe experiments with both synthetic data and natural data (OCR and text) that demonstrate the merits of our framework, in particular robustness to outliers.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
O. Dekel, S. Shalev-Shwartz, and Y. Singer. Smooth epsilon-insensitive regression by loss symmetrization. COLT'03.
|
| |
4
|
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. of the Royal Statistical Society, Ser. B, 39:1--38, 1977.
|
| |
5
|
|
| |
6
|
N. Duffy and D. Helmbold. Potential boosters? NIPS'99.
|
| |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
K. Lang. Newsweeder: Learning to filter netnews. ICML'95.
|
| |
12
|
L. Mason, J. Baxter, P. Bartlett, and M. Frean. Functional gradient techniques for combining hypotheses. In Advances in Large Margin Classifiers. 1999.
|
| |
13
|
|
| |
14
|
G. Rätsch. Robust Boosting and Convex Optimization. Doctoral dissertation, University of Potsdam, 2001.
|
| |
15
|
|
| |
16
|
|
| |
17
|
H. Tuy. 1994. D. C. optimization: Theory, Methods & Algorithms. Horst & Pardalos (Ed) Handbook of Global Opt.
|
| |
18
|
S. Wang, R. Rosenfeld, Y. Zhao, and D. Schuurmans. The latent maximum entropy principle. In IEEE International Symposium on Information Theory, 2002.
|
| |
19
|
X. Zhang X. Shen, G. C. Tseng and W. H. Wong. On psi-learning. i. J. of American Statistical Association, 2003.
|
CITED BY 4
|
|
Ronan Collobert , Fabian Sinz , Jason Weston , Léon Bottou, Trading convexity for scalability, Proceedings of the 23rd international conference on Machine learning, p.201-208, June 25-29, 2006, Pittsburgh, Pennsylvania
|
|
|
|
|
|
|
|
|
|
|