|
ABSTRACT
When constructing a classifier from labeled data, it is important not to assign too much weight to any single input feature, in order to increase the robustness of the classifier. This is particularly important in domains with nonstationary feature distributions or with input sensor failures. A common approach to achieving such robustness is to introduce regularization which spreads the weight more evenly between the features. However, this strategy is very generic, and cannot induce robustness specifically tailored to the classification task at hand. In this work, we introduce a new algorithm for avoiding single feature over-weighting by analyzing robustness using a game theoretic formalization. We develop classifiers which are optimally resilient to deletion of features in a minimax sense, and show how to construct such classifiers using quadratic programming. We illustrate the applicability of our methods on spam filtering and handwritten digit recognition tasks, where feature deletion is indeed a realistic noise model.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
Cohen, S., Ruppin, E., & Dror, G. (2005). Feature selection based on the shapley value. IJCAI (pp. 665--670). Professional Book Center.
|
| |
4
|
El Ghaoui, L., Lanckriet, G., & Natsoulis, G. (2003). Robust classification with interval data (Technical Report UCB/CSD-03-1279). EECS Department, University of California, Berkeley.
|
 |
5
|
Ran Gilad-Bachrach , Amir Navot , Naftali Tishby, Margin based feature selection - theory and algorithms, Proceedings of the twenty-first international conference on Machine learning, p.43, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015352]
|
| |
6
|
Kim, S., Magnani, A., & Boyd, S. (2006). Robust fisher discriminant analysis. NIPS 18 (pp. 659--666). MIT Press.
|
| |
7
|
Krupka, E., & Tishby, N. (2006). Generalization in clustering with unobserved features. NIPS 18 (pp. 683--690). MIT Press.
|
| |
8
|
|
| |
9
|
LeCun, Y., Jackel, L., Bottou, L., Brunot, A., Cortes, C., Denker, J., Drucker, H., Guyon, I., Müller, U., Sackinger, E., Simard, P., & Vapnik, V. (1995). Comparison of learning algorithms for handwritten digit recognition. ICANN (pp. 53--60).
|
| |
10
|
Schölkopf, B., & Smola, A. J. (Eds.). (2002). Learning with kernels. Cambridge, MA: MIT Press.
|
| |
11
|
Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., & Vapnik, V. (2000). Feature selection for SVMs. NIPS 13 (pp. 668--674). MIT Press.
|
| |
12
|
|
|