ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Stochastic methods for l1 regularized loss minimization
Full text PdfPdf (723 KB)
Source ACM International Conference Proceeding Series; Vol. 382 archive
Proceedings of the 26th Annual International Conference on Machine Learning table of contents
Montreal, Quebec, Canada
Pages: 929-936  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Shai Shalev-Shwartz  Toyota Technological Institute at Chicago, Chicago, IL
Ambuj Tewari  Toyota Technological Institute at Chicago, Chicago, IL
Sponsors
: MITACS
: NSF
Microsoft Research : Microsoft Research
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 18,   Downloads (12 Months): 109,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1553374.1553493
What is a DOI?

ABSTRACT

We describe and analyze two stochastic methods for l1 regularized loss minimization problems, such as the Lasso. The first method updates the weight of a single feature at each iteration while the second method updates the entire weight vector but only uses a single training example at each iteration. In both methods, the choice of feature/example is uniformly at random. Our theoretical runtime analysis suggests that the stochastic methods should outperform state-of-the-art deterministic approaches, including their deterministic counterparts, when the size of the problem is large. We demonstrate the advantage of stochastic methods by experimenting with synthetic and natural data sets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Beck, A., & Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31, 167--175.
 
2
Bottou, L. (Web Page). Stochastic gradient descent examples. http://leon.bottou.org/projects/sgd.
 
3
Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. Advances in Neural Information Processing Systems 20 (pp. 161--168).
 
4
 
5
 
6
7
 
8
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32, 407--499.
 
9
Frank, M., & Wolfe, P. (1956). An algorithm for quadratic programming. Naval Res. Logist. Quart., 3, 95--110.
 
10
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Regularized paths for generalized linear models via coordinate descent (Technical Report). Department of Statistics, Stanford University.
 
11
Genkin, A., Lewis, D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49, 291--304.
 
12
 
13
 
14
 
15
 
16
Langford, J., Li, L., & Zhang, T. (2009). Sparse online learning via truncated gradient. Advances in Neural Information Processing Systems 21 (pp. 905--912).
 
17
 
18
19
20
 
21
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., 58, 267--288.
 
22
Tseng, P., & Yun, S. (2009). A block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl., 140, 513--535.
 
23
Wu, T. T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2, 224--244.
 
24
Zhang, T. (2003). Sequential greedy approximation for certain convex optimization problems. IEEE Transaction on Information Theory, 49, 682--691.
 
25

Collaborative Colleagues:
Shai Shalev-Shwartz: colleagues
Ambuj Tewari: colleagues