ACM Home Page
Please provide us with feedback. Feedback
An efficient projection for l1, regularization
Full text PdfPdf (635 KB)
Source ACM International Conference Proceeding Series; Vol. 382 archive
Proceedings of the 26th Annual International Conference on Machine Learning table of contents
Montreal, Quebec, Canada
Pages 857-864  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Ariadna Quattoni  Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA and UC Berkeley EECS and ICSI, Berkeley, CA
Xavier Carreras  Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA
Michael Collins  Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA
Trevor Darrell  UC Berkeley EECS and ICSI, Berkeley, CA
Sponsors
: MITACS
: NSF
Microsoft Research : Microsoft Research
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 37,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1553374.1553484
What is a DOI?

ABSTRACT

In recent years the l1, norm has been proposed for joint regularization. In essence, this type of regularization aims at extending the l1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of l1, regularized problems. The main challenge in developing such a method resides on being able to compute efficient projections to the l1, ball. We present an algorithm that works in O(n log n) time and O(n) memory where n is the number of parameters. We test our algorithm in a multi-task image annotation problem. Our results show that l1, leads to better performance than both l2 and l1 regularization and that it is is effective in discovering jointly sparse solutions.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. Advances in Neural Information Processing Systems 19 (pp. 41--48).
 
3
Bertsekas, D. (1999). Nonlinear programming. Athena Scientific.
 
4
Donoho, D. (2004). For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution. (Technical Report). Statistics Dept., Stanford University.
5
 
6
 
7
Lee, S. I., Ganapathi, V., & Koller, D. (2007). Effcient structure learning of markov networks using l1-regularization. Advances in Neural Information Processing Systems 19 (pp. 817--824).
 
8
Meier, L., van de Geer, S., & Buhlmann, P. (2006). The group lasso for logistic regression (Technical Report). ETH Seminar fur Statistik.
9
 
10
 
11
Obozinski, G., Taskar, B., & Jordan, M. (2006). Multitask feature selection (Technical Report). Statistics Dept., University of California, Berkeley.
 
12
Park, M. Y., & Hastie, T. (2006). Regularization path algorithms for detecting gene interactions (Technical Report). Stanford University.
 
13
Quattoni, A., Collins, M., & Darrell, T. (2008). Transfer learning for image classification with sparse prototype representations. Proc. of Conf. on Computer Vision and Pattern Recognition.
 
14
Schmidt, M., Murphy, K., Fung, G., & Rosale, R. (2008). Structure learning in random fields for heart motion abnormality detection. Proc. of Conf. on Computer Vision and Pattern Recognition.
 
15
Schmidt, M., van den Berg, E., Friedlander, M., & Murphy, K. (2009). Optimizing costly functions with simple constraints: A limited-memory projected quasi-newton algorithm. Proc. of Conf. on Artificial Intelligence and Statistics (pp. 456--463).
16
 
17
Similä, T., & Tikka, J. (2007). Input selection and shrinkage in multiresponse linear regression. Computational Statistics and Data Analysis, 52, 406--422.
 
18
 
19
Turlach, B., Venables, W., & Wright, S. (2005). Simultaneous variable selection. Technometrics, 47, 349--363.
 
20
Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Royal Statistical Society Series B, 68, 49--67.
 
21
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. Proc. of Intl. Conf. on Machine Learning (pp. 928--936).

Collaborative Colleagues:
Ariadna Quattoni: colleagues
Xavier Carreras: colleagues
Michael Collins: colleagues
Trevor Darrell: colleagues