|
ABSTRACT
We describe and analyze two stochastic methods for l1 regularized loss minimization problems, such as the Lasso. The first method updates the weight of a single feature at each iteration while the second method updates the entire weight vector but only uses a single training example at each iteration. In both methods, the choice of feature/example is uniformly at random. Our theoretical runtime analysis suggests that the stochastic methods should outperform state-of-the-art deterministic approaches, including their deterministic counterparts, when the size of the problem is large. We demonstrate the advantage of stochastic methods by experimenting with synthetic and natural data sets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Beck, A., & Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31, 167--175.
|
| |
2
|
Bottou, L. (Web Page). Stochastic gradient descent examples. http://leon.bottou.org/projects/sgd.
|
| |
3
|
Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. Advances in Neural Information Processing Systems 20 (pp. 161--168).
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
 |
7
|
John Duchi , Shai Shalev-Shwartz , Yoram Singer , Tushar Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th international conference on Machine learning, p.272-279, July 05-09, 2008, Helsinki, Finland
[doi> 10.1145/1390156.1390191]
|
| |
8
|
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32, 407--499.
|
| |
9
|
Frank, M., & Wolfe, P. (1956). An algorithm for quadratic programming. Naval Res. Logist. Quart., 3, 95--110.
|
| |
10
|
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Regularized paths for generalized linear models via coordinate descent (Technical Report). Department of Statistics, Stanford University.
|
| |
11
|
Genkin, A., Lewis, D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49, 291--304.
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
Langford, J., Li, L., & Zhang, T. (2009). Sparse online learning via truncated gradient. Advances in Neural Information Processing Systems 21 (pp. 905--912).
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., 58, 267--288.
|
| |
22
|
Tseng, P., & Yun, S. (2009). A block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl., 140, 513--535.
|
| |
23
|
Wu, T. T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2, 224--244.
|
| |
24
|
Zhang, T. (2003). Sequential greedy approximation for certain convex optimization problems. IEEE Transaction on Information Theory, 49, 682--691.
|
| |
25
|
|
|