Boosting with structural sparsity
Source
Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, Quebec, Canada
ACM International Conference Proceeding Series; Vol. 382
Pages 297-304
Year of Publication: 2009
ISBN: 978-1-60558-516-1
ABSTRACT
We derive generalizations of AdaBoost and related gradient-based coordinate descent methods that incorporate sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back-pruning through regularization and give an automatic stopping criterion for feature induction. We study penalties based on the l1, l2, and l∞ norms of the predictor and introduce mixed-norm penalties that build upon the initial penalties. The mixed-norm regularizers facilitate structural sparsity in parameter space, which is a useful property in multiclass prediction and other related tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models.
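To make the notion of structural sparsity concrete, the following is a minimal sketch of a mixed-norm penalty on a weight matrix, assuming one row per feature and one column per class. The abstract does not spell out the exact definitions, so the function names `mixed_norm` and `group_soft_threshold` are hypothetical, and the shrinkage step shown is the standard proximal operator for the l1/l2 norm, not necessarily the update derived in the paper.

```python
import math


def mixed_norm(W, inner="l2"):
    """Sum, over rows of W, of the chosen inner norm of each row.

    Assumption: each row holds the weights of one feature across all
    classes, so a zero row means the feature is unused by every class.
    """
    total = 0.0
    for row in W:
        if inner == "l1":
            total += sum(abs(w) for w in row)
        elif inner == "l2":
            total += math.sqrt(sum(w * w for w in row))
        elif inner == "linf":
            total += max(abs(w) for w in row)
        else:
            raise ValueError("inner must be 'l1', 'l2', or 'linf'")
    return total


def group_soft_threshold(row, lam):
    """Standard l1/l2 proximal step: shrink the row's l2 norm by lam,
    setting the entire row to zero when its norm is at most lam."""
    norm = math.sqrt(sum(w * w for w in row))
    if norm <= lam:
        return [0.0] * len(row)
    scale = 1.0 - lam / norm
    return [scale * w for w in row]


# Illustrative weights (made up): 3 features, 2 classes.
W = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.0]]
print(mixed_norm(W, "l1"), mixed_norm(W, "l2"), mixed_norm(W, "linf"))
print([group_soft_threshold(row, 1.0) for row in W])
```

In this sketch the weak second feature is removed for all classes at once, while the strong first feature is only shrunk. That row-level zeroing is the kind of structure a plain l1 penalty, which treats every entry independently, does not enforce.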