Margin based feature selection - theory and algorithms
Source: ACM International Conference Proceeding Series; Vol. 69
Proceedings of the Twenty-First International Conference on Machine Learning
Banff, Alberta, Canada
Page: 43  
Year of Publication: 2004
ISBN:1-58113-828-5
Authors
Ran Gilad-Bachrach  The Hebrew University, Jerusalem, Israel
Amir Navot  The Hebrew University, Jerusalem, Israel
Naftali Tishby  The Hebrew University, Jerusalem, Israel
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 98,   Citation Count: 21
DOI: http://doi.acm.org/10.1145/1015330.1015352

ABSTRACT

Feature selection is the task of choosing a small subset of a given set of features that captures the relevant properties of the data. In supervised classification problems, relevance is determined by the labels given on the training data. A good choice of features is key to building compact and accurate classifiers. In this paper we introduce a margin-based feature selection criterion and apply it to measure the quality of sets of features. Using margins, we devise novel selection algorithms for multi-class classification problems and provide a theoretical generalization bound. We also study the well-known Relief algorithm and show that it resembles a gradient ascent over our margin criterion. We apply our new algorithm, Simba, to various datasets and show that, by directly optimizing the margin, it outperforms Relief.
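
The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the margin of a point is half the difference between its distance to the nearest differently-labeled point (the "miss") and its distance to the nearest same-labeled point (the "hit"), measured under per-feature weights, and it assumes the commonly described form of the Relief update. It does not implement Simba, and all function names and the toy data are hypothetical.

```python
import math
import random


def weighted_distance(a, b, w):
    """Euclidean distance after scaling each feature difference by its weight."""
    return math.sqrt(sum((wi * (ai - bi)) ** 2 for ai, bi, wi in zip(a, b, w)))


def nearest(i, X, y, w, same_label):
    """Return (point, distance) of the nearest neighbour of X[i], restricted to
    points with the same label (hit) or a different label (miss)."""
    best, best_d = None, math.inf
    for j, (xj, yj) in enumerate(zip(X, y)):
        if j == i or (yj == y[i]) != same_label:
            continue
        d = weighted_distance(X[i], xj, w)
        if d < best_d:
            best, best_d = xj, d
    return best, best_d


def hypothesis_margin(i, X, y, w):
    """ASSUMED margin definition: 0.5 * (dist to nearest miss - dist to nearest hit)."""
    _, d_hit = nearest(i, X, y, w, same_label=True)
    _, d_miss = nearest(i, X, y, w, same_label=False)
    return 0.5 * (d_miss - d_hit)


def relief_weights(X, y, n_iter=100, seed=0):
    """ASSUMED Relief-style update: for a randomly drawn point, reward features
    that separate it from its nearest miss and penalise features that separate
    it from its nearest hit. Neighbours are found with unweighted distance."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ones = [1.0] * n_features
    w = [0.0] * n_features
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hit, _ = nearest(i, X, y, ones, same_label=True)
        miss, _ = nearest(i, X, y, ones, same_label=False)
        for k in range(n_features):
            w[k] += (X[i][k] - miss[k]) ** 2 - (X[i][k] - hit[k]) ** 2
    return w


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X_demo = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [1.1, 0.8], [0.9, 0.1]]
y_demo = [0, 0, 0, 1, 1, 1]
demo_weights = relief_weights(X_demo, y_demo)
```

On this toy data the informative feature should end up with the larger weight, and setting the noise feature's weight to zero should enlarge the margin of the first point. A method that optimizes the margin directly, as the abstract says Simba does, would instead recompute the nearest hit and miss under the current weights at every step.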


