ABSTRACT
In many data mining domains, misclassification costs are different for different examples, in the same way that class membership probabilities are example-dependent. In these domains, both costs and probabilities are unknown for test examples, so both cost estimators and probability estimators must be learned. After discussing how to make optimal decisions given cost and probability estimates, we present decision tree and naive Bayesian learning methods for obtaining well-calibrated probability estimates. We then explain how to obtain unbiased estimators for example-dependent costs, taking into account the difficulty that in general, probabilities and costs are not independent random variables, and the training examples for which costs are known are not representative of all examples. The latter problem is called sample selection bias in econometrics. Our solution to it is based on Nobel prize-winning work due to the economist James Heckman. We show that the methods we propose perform better than MetaCost and all other known methods, in a comprehensive experimental comparison that uses the well-known, large, and challenging dataset from the KDD'98 data mining contest.
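The decision step mentioned in the abstract can be illustrated with a minimal sketch: given a probability estimate and an example-dependent cost estimate, choose the action with the lowest expected cost. The function names, the two-action mailing setup, and the numeric values below are hypothetical illustrations, not taken from the paper; costs are written so that a benefit is a negative cost.

```python
from typing import List, Sequence


def expected_costs(probs: Sequence[float],
                   costs: Sequence[Sequence[float]]) -> List[float]:
    """Expected cost of each action.

    costs[i][j] is the (estimated) cost of taking action i when the
    true class is j; probs[j] is the estimated probability of class j.
    """
    return [sum(p * c for p, c in zip(probs, row)) for row in costs]


def best_action(probs: Sequence[float],
                costs: Sequence[Sequence[float]]) -> int:
    """Index of the action with minimal expected cost (ties go to the lowest index)."""
    exp = expected_costs(probs, costs)
    return min(range(len(exp)), key=exp.__getitem__)


def should_mail(p_donate: float, est_donation: float, mailing_cost: float) -> bool:
    """Hypothetical direct-marketing decision with an example-dependent benefit.

    Assumption: not mailing costs nothing; mailing costs `mailing_cost`
    and returns `est_donation` only if the person donates.
    """
    costs = [
        [0.0, 0.0],                                   # action 0: do not mail
        [mailing_cost, mailing_cost - est_donation],  # action 1: mail
    ]
    probs = [1.0 - p_donate, p_donate]                # [no donation, donation]
    return best_action(probs, costs) == 1


if __name__ == "__main__":
    # Same probability, different estimated donation amounts (illustrative values).
    print(should_mail(0.05, 20.0, 0.68))
    print(should_mail(0.05, 10.0, 0.68))
```

Because both the probability and the cost estimate enter the expected-cost computation, two examples with the same estimated probability can receive different decisions, which is why the quality of both estimators matters.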