|
ABSTRACT
We describe efficient algorithms for projecting a vector onto the l1-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the l1-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with l1 projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Beck, A., & Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31, 167--175.
|
| |
2
|
Bertsekas, D. (1999). Nonlinear programming. Athena Scientific.
|
| |
3
|
Candes, E. J. (2006). Compressive sampling. Proc. of the Int. Congress of Math., Madrid, Spain.
|
| |
4
|
|
| |
5
|
|
| |
6
|
Donoho, D. (2006a). Compressed sensing. Technical Report, Stanford University.
|
| |
7
|
Donoho, D. (2006b). For most large underdetermined systems of linear equations, the minimal l1-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59.
|
| |
8
|
Friedman, J., Hastie, T., & Tibshirani, R. (2007). Pathwise co-ordinate optimization. Annals of Applied Statistics, 1, 302--332.
|
| |
9
|
Gafni, E., & Bertsekas, D. P. (1984). Two-metric projection methods for constrained optimization. SIAM Journal on Control and Optimization, 22, 936--964.
|
| |
10
|
Hazan, E. (2006). Approximate convex optimization by online game playing. Unpublished manuscript.
|
| |
11
|
Kim, S.-J., Koh, K., Lustig, M., Boyd, S., & Gorinevsky, D. (2007). An interior-point method for large-scale l1-regularized least squares. IEEE Journal on Selected Topics in Signal Processing, 4, 606--617.
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
 |
15
|
Andrew Y. Ng, Feature selection, L1 vs. L2 regularization, and rotational invariance, Proceedings of the twenty-first international conference on Machine learning, p.78, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015435]
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., 58, 267--288.
|
| |
20
|
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. Proceedings of the Twentieth International Conference on Machine Learning.
|
CITED BY 8
|
|
Ariadna Quattoni , Xavier Carreras , Michael Collins , Trevor Darrell, An efficient projection for l1, ∞ regularization, Proceedings of the 26th Annual International Conference on Machine Learning, p.857-864, June 14-18, 2009, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|