ABSTRACT
In data mining applications characterized by highly skewed misclassification costs, certain types of errors become virtually unacceptable. This limits the utility of a classifier to the range in which such constraints can be met. Naive Bayes, which has proven very useful in text mining applications because of its high scalability, can be particularly affected. Although its 0/1 loss tends to be small, its misclassifications are often made with apparently high confidence. Aside from efforts to better calibrate Naive Bayes scores, it has been shown that its accuracy depends on document sparsity and that feature selection can lead to marked improvement in classification performance. Traditionally, sparsity is controlled globally, and the resulting sparsity of any particular document may vary. In this work we examine the merits of local sparsity control for Naive Bayes in the context of highly asymmetric misclassification costs. In experiments with three benchmark document collections, we demonstrate clear advantages of document-level feature selection. In the extreme cost setting, multinomial Naive Bayes with local sparsity control is able to outperform even some of the recently proposed effective improvements to the Naive Bayes classifier. There are also indications that local feature selection may be preferable in different cost settings.
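The contrast between global and local sparsity control can be illustrated with a minimal sketch. It assumes a feature ranking is already available; the ranking, the function names, and the parameters m and k are hypothetical and are not taken from the paper, whose exact selection procedure is not stated in the abstract.

```python
from typing import Dict, List

Doc = Dict[str, int]  # term -> count (bag-of-words document)


def global_select(doc: Doc, ranked_terms: List[str], m: int) -> Doc:
    """Global control: keep only terms among the top-m of the ranking.

    The number of terms surviving in a given document is not controlled.
    """
    keep = set(ranked_terms[:m])
    return {t: c for t, c in doc.items() if t in keep}


def local_select(doc: Doc, ranked_terms: List[str], k: int) -> Doc:
    """Local control: keep the k best-ranked terms present in this document.

    Every document retains at most k terms, regardless of vocabulary cut-off.
    """
    rank_of = {t: i for i, t in enumerate(ranked_terms)}
    present = sorted((t for t in doc if t in rank_of), key=rank_of.get)
    keep = set(present[:k])
    return {t: c for t, c in doc.items() if t in keep}


# Hypothetical ranking (best first) and two toy documents.
RANKED = ["viagra", "free", "meeting", "offer", "report", "the"]
DOC_A = {"viagra": 1, "free": 2, "offer": 1, "the": 3}
DOC_B = {"meeting": 1, "report": 2, "the": 5}

if __name__ == "__main__":
    for name, doc in (("A", DOC_A), ("B", DOC_B)):
        print(name, "global:", global_select(doc, RANKED, 2),
              "local:", local_select(doc, RANKED, 2))
```

In this toy setup the global cut-off leaves one document with two terms and the other with none, whereas the local variant leaves each document with two terms. This is the kind of per-document variation that global sparsity control leaves unaddressed.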