| An efficient projection for l1, ∞ regularization |
| Full text |
Pdf
(635 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 382
archive
Proceedings of the 26th Annual International Conference on Machine Learning
table of contents
Montreal, Quebec, Canada
Pages 857-864
Year of Publication: 2009
ISBN:978-1-60558-516-1
|
|
Authors
|
|
Ariadna Quattoni
|
Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA and UC Berkeley EECS and ICSI, Berkeley, CA
|
|
Xavier Carreras
|
Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA
|
|
Michael Collins
|
Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA
|
|
Trevor Darrell
|
UC Berkeley EECS and ICSI, Berkeley, CA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 10, Downloads (12 Months): 37, Citation Count: 0
|
|
|
ABSTRACT
In recent years the l1, ∞ norm has been proposed for joint regularization. In essence, this type of regularization aims at extending the l1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of l1, ∞ regularized problems. The main challenge in developing such a method resides on being able to compute efficient projections to the l1, ∞ ball. We present an algorithm that works in O(n log n) time and O(n) memory where n is the number of parameters. We test our algorithm in a multi-task image annotation problem. Our results show that l1, ∞ leads to better performance than both l2 and l1 regularization and that it is is effective in discovering jointly sparse solutions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. Advances in Neural Information Processing Systems 19 (pp. 41--48).
|
| |
3
|
Bertsekas, D. (1999). Nonlinear programming. Athena Scientific.
|
| |
4
|
Donoho, D. (2004). For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution. (Technical Report). Statistics Dept., Stanford University.
|
 |
5
|
John Duchi , Shai Shalev-Shwartz , Yoram Singer , Tushar Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th international conference on Machine learning, p.272-279, July 05-09, 2008, Helsinki, Finland
[doi> 10.1145/1390156.1390191]
|
| |
6
|
|
| |
7
|
Lee, S. I., Ganapathi, V., & Koller, D. (2007). Effcient structure learning of markov networks using l1-regularization. Advances in Neural Information Processing Systems 19 (pp. 817--824).
|
| |
8
|
Meier, L., van de Geer, S., & Buhlmann, P. (2006). The group lasso for logistic regression (Technical Report). ETH Seminar fur Statistik.
|
 |
9
|
Andrew Y. Ng, Feature selection, L1 vs. L2 regularization, and rotational invariance, Proceedings of the twenty-first international conference on Machine learning, p.78, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015435]
|
| |
10
|
|
| |
11
|
Obozinski, G., Taskar, B., & Jordan, M. (2006). Multitask feature selection (Technical Report). Statistics Dept., University of California, Berkeley.
|
| |
12
|
Park, M. Y., & Hastie, T. (2006). Regularization path algorithms for detecting gene interactions (Technical Report). Stanford University.
|
| |
13
|
Quattoni, A., Collins, M., & Darrell, T. (2008). Transfer learning for image classification with sparse prototype representations. Proc. of Conf. on Computer Vision and Pattern Recognition.
|
| |
14
|
Schmidt, M., Murphy, K., Fung, G., & Rosale, R. (2008). Structure learning in random fields for heart motion abnormality detection. Proc. of Conf. on Computer Vision and Pattern Recognition.
|
| |
15
|
Schmidt, M., van den Berg, E., Friedlander, M., & Murphy, K. (2009). Optimizing costly functions with simple constraints: A limited-memory projected quasi-newton algorithm. Proc. of Conf. on Artificial Intelligence and Statistics (pp. 456--463).
|
 |
16
|
|
| |
17
|
Similä, T., & Tikka, J. (2007). Input selection and shrinkage in multiresponse linear regression. Computational Statistics and Data Analysis, 52, 406--422.
|
| |
18
|
|
| |
19
|
Turlach, B., Venables, W., & Wright, S. (2005). Simultaneous variable selection. Technometrics, 47, 349--363.
|
| |
20
|
Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Royal Statistical Society Series B, 68, 49--67.
|
| |
21
|
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. Proc. of Intl. Conf. on Machine Learning (pp. 928--936).
|
|