ACM Home Page
Please provide us with feedback. Feedback
Mismatched models, wrong results, and dreadful decisions: on choosing appropriate data mining tools
Full text MovMov (49:30),  PdfPdf (357 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Paris, France
SESSION: Keynote talks table of contents
Pages 1-2  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Author
David J. Hand  Imperial College, London, United Kingdom
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 93,   Downloads (12 Months): 270,   Citation Count: 0
Additional Information:

abstract   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1557019.1557021
What is a DOI?

ABSTRACT

Data mining techniques use 'score functions' to quantify how well a model fits a given data set. Parameters are estimated by optimising the fit, as measured by the chosen score function, and model choice is guided by the size of the scores for the different models. Since different score functions summarise the fit in different ways, it is important to choose a function which matches the objectives of the data mining exercise. For predictive classification problems, a wide variety of score functions exist, including measures such as precision and recall, the F measure, misclassification rate, the area under the ROC curve (the AUC), and others. The first four of these require a 'classification threshold' to be chosen, a choice which may not be easy, or may even be impossible, especially when the classification rule is to be applied in the future. In contrast, the AUC does not require the specification of a classification threshold, but summarises performance over the range of possible threshold choices. However, unfortunately, and despite the widespread use of the AUC, it has a previously unrecognised fundamental incoherence lying at the core of its definition. This means that using the AUC can lead to poor model choice and unecessary misclassifications. The AUC is set in context, its deficiency explained and the implications illustrated - with the bottom line being that the AUC should not be used. A family of coherent alternative scores is described. The ideas are illustrated with examples from bank loans, fraud, face recognition, and health screening.