Feature selection, L1 vs. L2 regularization, and rotational invariance
Source
ACM International Conference Proceeding Series; Vol. 69
Proceedings of the twenty-first international conference on Machine learning
Banff, Alberta, Canada
Page: 78
Year of Publication: 2004
ISBN:1-58113-828-5
ABSTRACT
We consider supervised learning in the presence of very many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that with L1 regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn "well") grows only logarithmically in the number of irrelevant features. This logarithmic rate matches the best known bounds for feature selection, and indicates that L1 regularized logistic regression can be effective even if the number of irrelevant features is exponential in the number of training examples. We also give a lower bound showing that any rotationally invariant algorithm (including logistic regression with L2 regularization, SVMs, and neural networks trained by backpropagation) has a worst-case sample complexity that grows at least linearly in the number of irrelevant features.
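To make the contrast between the two penalties concrete, the following is a minimal sketch, not a reproduction of the paper's experiments or bounds. It fits logistic regression on synthetic data in which only feature 0 determines the label and every other feature is irrelevant. The data generator, the function names (`make_data`, `fit`, and the helpers), the step size, and the regularization strength are all assumptions made for illustration. The L1 penalty is handled with proximal gradient descent (soft-thresholding), a standard technique chosen here for brevity rather than taken from the paper.

```python
import math
import random


def make_data(m, d, seed=0):
    """m examples, d Gaussian features; only feature 0 determines the label."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    y = [1.0 if row[0] > 0 else -1.0 for row in X]
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_gradient(w, X, y):
    """Gradient of the average logistic loss log(1 + exp(-y * w.x))."""
    m, d = len(X), len(w)
    g = [0.0] * d
    for row, label in zip(X, y):
        margin = label * sum(wj * xj for wj, xj in zip(w, row))
        coef = -label * sigmoid(-margin) / m
        for j in range(d):
            g[j] += coef * row[j]
    return g


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink toward zero, clip at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit(X, y, lam, penalty, step=0.5, iters=300):
    """Fit weights with an 'l1' or 'l2' penalty of strength lam (illustrative)."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        g = loss_gradient(w, X, y)
        if penalty == "l1":
            # Gradient step on the loss, then soft-threshold for lam * ||w||_1.
            w = [soft_threshold(wj - step * gj, step * lam)
                 for wj, gj in zip(w, g)]
        else:
            # Plain gradient step on loss + (lam / 2) * ||w||_2^2.
            w = [wj - step * (gj + lam * wj) for wj, gj in zip(w, g)]
    return w


if __name__ == "__main__":
    X, y = make_data(m=40, d=30, seed=0)
    for penalty in ("l1", "l2"):
        w = fit(X, y, lam=0.1, penalty=penalty)
        zeros = sum(1 for wj in w if wj == 0.0)
        print(penalty, "weight on feature 0:", round(w[0], 3),
              "exact zeros:", zeros)
```

In a run of this sketch, the L1 fit is expected to set many of the irrelevant weights exactly to zero, while the L2 fit only shrinks them. This illustrates the sparsity behavior behind the abstract's claim; it does not measure sample complexity.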
CITED BY 21
Chris Ding, Ding Zhou, Xiaofeng He, Hongyuan Zha, R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization, Proceedings of the 23rd international conference on Machine learning, p.281-288, June 25-29, 2006, Pittsburgh, Pennsylvania
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng, Self-taught learning: transfer learning from unlabeled data, Proceedings of the 24th international conference on Machine learning, p.759-766, June 20-24, 2007, Corvallis, Oregon
John Duchi, Shai Shalev-Shwartz, Yoram Singer, Tushar Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th international conference on Machine learning, p.272-279, July 05-09, 2008, Helsinki, Finland
Zenglin Xu, Rong Jin, Jieping Ye, Michael R. Lyu, Irwin King, Non-monotonic feature selection, Proceedings of the 26th Annual International Conference on Machine Learning, p.1145-1152, June 14-18, 2009, Montreal, Quebec, Canada
Kwangmoo Koh, Seung-Jean Kim, Stephen Boyd, A method for large-scale l1-regularized logistic regression, Proceedings of the 22nd national conference on Artificial intelligence, p.565-571, July 22-26, 2007, Vancouver, British Columbia, Canada
Ariadna Quattoni, Xavier Carreras, Michael Collins, Trevor Darrell, An efficient projection for l1,∞ regularization, Proceedings of the 26th Annual International Conference on Machine Learning, p.857-864, June 14-18, 2009, Montreal, Quebec, Canada
Rajat Raina, Anand Madhavan, Andrew Y. Ng, Large-scale deep unsupervised learning using graphics processors, Proceedings of the 26th Annual International Conference on Machine Learning, p.873-880, June 14-18, 2009, Montreal, Quebec, Canada
Su-In Lee, Honglak Lee, Pieter Abbeel, Andrew Y. Ng, Efficient L1 regularized logistic regression, Proceedings of the 21st national conference on Artificial intelligence, p.401-408, July 16-20, 2006, Boston, Massachusetts