|
ABSTRACT
We consider feature selection for text classification both theoretically and empirically. Our main result is an unsupervised feature selection strategy for which we give worst-case theoretical guarantees on the generalization power of the resultant classification function f with respect to the classification function f obtained when keeping all the features. To the best of our knowledge, this is the first feature selection method with such guarantees. In addition, the analysis leads to insights as to when and why this feature selection strategy will perform well in practice. We then use the TechTC-100, 20-Newsgroups, and Reuters-RCV2 data sets to evaluate empirically the performance of this and two simpler but related feature selection strategies against two commonly-used strategies. Our empirical evaluation shows that the strategy with provable performance guarantees performs well in comparison with other commonly-used feature selection strategies. In addition, it performs better on certain datasets under very aggressive feature selection.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
20 Newsgroups Dataset. J. Rennie. http://people.csail.mit.edu/jrennie/20Newsgroups/.
|
| |
2
|
20 Newsgroups Dataset. UCI KDD Archive. http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html.
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
A. Das and D. Kempe. Algorithms for subset selection in linear regression. Manuscript.
|
| |
8
|
Anirban Dasgupta , Petros Drineas , Boulos Harb , Ravi Kumar , Michael W. Mahoney, Sampling algorithms and coresets for ℓp regression, Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, p.932-941, January 20-22, 2008, San Francisco, California
|
 |
9
|
|
| |
10
|
|
 |
11
|
|
| |
12
|
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1--50, 1999.
|
| |
13
|
|
 |
14
|
|
 |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
K. Lang. Newsweeder: Learning to filter netnews. In Proceedings of the 12th International Conference on Machine Learning, pages 331--339, 1995.
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
T. Poggio and S. Smale. The mathematics of learning: Dealing with data. Notices of the AMS, 50(5):537--544, May 2003.
|
| |
26
|
|
| |
27
|
R. Rifkin, G. Yeo, and T. Poggio. Regularized least-squares classification. In J. A. K. Suykens, G. Horvath, S. Basu, C. Micchelli, and J. Vandewalle, editors, Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, pages 131--154. VIOS Press, 2003.
|
| |
28
|
M. Rudelson and R. Vershynin. Sampling from large matrices: an approach through geometric functional analysis. Manuscript. Salton. Proceedings of the 14th Annual International ACM SIGIR Conference, pages 356--358, 1991.
|
| |
29
|
|
 |
30
|
|
| |
31
|
G. W. Stewart and J. G. Sun. Matrix Perturbation Theory. Academic Press, New York, 1990.
|
| |
32
|
|
| |
33
|
A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-Posed Problems. W. H. Winston, Washington, D. C., 1977.
|
| |
34
|
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
|
 |
35
|
|
| |
36
|
Y. Yang. Sampling strategies and learning efficiency in text categorization. In AAAI Spring Symposium on Machine Learning in Information Access, pages 88--95, 1996.
|
 |
37
|
|
| |
38
|
|
 |
39
|
|
| |
40
|
|
| |
41
|
|
CITED BY 4
|
|
|
|
|
|
|
|
Ou Wu , Haiqiang Zuo , Mingliang Zhu , Weiming Hu , Jun Gao , Hanzi Wang, Rank Aggregation Based Text Feature Selection, Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, p.165-172, September 15-18, 2009
|
|
|
|
|