Boosting with structural sparsity
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 297-304  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
John Duchi, University of California, Berkeley, CA
Yoram Singer, Google, Mountain View, CA
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553412

ABSTRACT

We derive generalizations of AdaBoost and related gradient-based coordinate descent methods that incorporate sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back-pruning through regularization and give an automatic stopping criterion for feature induction. We study penalties based on the ℓ1, ℓ2, and ℓ∞ norms of the predictor and introduce mixed-norm penalties that build upon the initial penalties. The mixed-norm regularizers facilitate structural sparsity in parameter space, which is a useful property in multiclass prediction and other related tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models.
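
To make the ideas in the abstract concrete, the following is a minimal sketch of coordinate descent with a sparsity-promoting penalty. It is not the AdaBoost-derived update from the paper: it swaps in a generic proximal coordinate step on the logistic loss, using the standard curvature bound of 1/4 on the sigmoid derivative. All names here (soft_threshold, group_soft_threshold, l1_coordinate_descent, lam, tol) are hypothetical and chosen for illustration only.

```python
import math


def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def group_soft_threshold(row, t):
    """Mixed-norm (l1/l2) analogue: shrink a row's l2 norm by t, or zero the whole row."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm <= t:
        return [0.0] * len(row)
    scale = 1.0 - t / norm
    return [scale * v for v in row]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def l1_coordinate_descent(X, y, lam, max_passes=200, tol=1e-8):
    """Minimize mean logistic loss + lam * ||w||_1, one coordinate at a time.

    Assumption: labels y are in {-1, +1}. Returns (weights, passes used).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    # Per-coordinate upper bound on the curvature of the mean logistic loss.
    curv = [sum(X[i][j] ** 2 for i in range(n)) / (4.0 * n) for j in range(d)]
    for passes in range(1, max_passes + 1):
        biggest_change = 0.0
        for j in range(d):
            if curv[j] == 0.0:
                continue  # all-zero feature: its weight stays at 0
            margins = [y[i] * sum(w[k] * X[i][k] for k in range(d)) for i in range(n)]
            grad = -sum(y[i] * X[i][j] * sigmoid(-margins[i]) for i in range(n)) / n
            # A zero weight becomes non-zero only if |grad| exceeds lam (induction);
            # a non-zero weight can be shrunk back to exactly 0 (pruning).
            new_wj = soft_threshold(w[j] - grad / curv[j], lam / curv[j])
            biggest_change = max(biggest_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if biggest_change <= tol:
            break  # no coordinate moved: nothing left to add or prune
    return w, passes


# Toy data: feature 0 determines the label, feature 1 is uncorrelated with it.
X = [[1.0, 0.1], [1.0, -0.1], [-1.0, 0.1], [-1.0, -0.1]]
y = [1, 1, -1, -1]
w, passes = l1_coordinate_descent(X, y, lam=0.1)
w_heavy, passes_heavy = l1_coordinate_descent(X, y, lam=0.6)
```

On this toy data the informative feature receives a non-zero weight while the uninformative one stays at exactly zero, and with the larger penalty the loop stops after a single pass with every weight still at zero. The group_soft_threshold helper hints at how a mixed-norm penalty acts on a whole row of a multiclass weight matrix at once, which is the sense in which such penalties yield structural sparsity.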


REFERENCES


 
1. Bertsekas, D. (1999). Nonlinear programming. Athena Scientific.
5. Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28, 337-374.
6. Friedman, J., Hastie, T., & Tibshirani, R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics, 1, 302-332.
10. Negahban, S., & Wainwright, M. (2008). Phase transitions for high-dimensional joint support recovery. Advances in Neural Information Processing Systems 22.
11. Obozinski, G., Taskar, B., & Jordan, M. (2007). Joint covariate selection for grouped classification (Technical Report 743). Dept. of Statistics, University of California Berkeley.
12. Spiegelhalter, D., & Taylor, C. (1994). Machine learning, neural and statistical classification. Ellis Horwood.
14. Zhang, T. (2008). Adaptive forward-backward greedy algorithm for sparse learning with linear models. Advances in Neural Information Processing Systems 22.
15. Zhang, T., & Yu, B. (2005). Boosting with early stopping: Convergence and consistency. Annals of Statistics, 33, 1538-1579.
