Data mining in metric space: an empirical analysis of supervised learning performance criteria
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Seattle, WA, USA
Session: Research track papers
Pages: 69-78
Year of Publication: 2004
ISBN: 1-58113-888-1
Authors
Rich Caruana, Cornell University
Alexandru Niculescu-Mizil, Cornell University
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1014052.1014063

ABSTRACT

Many criteria can be used to evaluate the performance of supervised learning. Different criteria are appropriate in different settings, and it is not always clear which criteria to use. A further complication is that learning methods that perform well on one criterion may not perform well on others. For example, SVMs and boosting are designed to optimize accuracy, whereas neural nets typically optimize squared error or cross entropy. We conducted an empirical study using a variety of learning methods (SVMs, neural nets, k-nearest neighbor, bagged and boosted trees, and boosted stumps) to compare nine boolean classification performance metrics: Accuracy, Lift, F-Score, Area under the ROC Curve, Average Precision, Precision/Recall Break-Even Point, Squared Error, Cross Entropy, and Probability Calibration. Multidimensional scaling (MDS) shows that these metrics span a low-dimensional manifold. The three metrics that are appropriate when predictions are interpreted as probabilities (squared error, cross entropy, and calibration) lie in one part of metric space, far from the metrics that depend on the relative order of the predicted values (ROC area, average precision, break-even point, and lift). Between them fall two metrics that depend on comparing predictions to a threshold: accuracy and F-score. As expected, maximum margin methods such as SVMs and boosted trees have excellent performance on metrics like accuracy, but perform poorly on probability metrics such as squared error. What was not expected was that the margin methods have excellent performance on ordering metrics such as ROC area and average precision. We introduce a new metric, SAR, that combines squared error, accuracy, and ROC area into one metric. MDS and correlation analysis show that SAR is centrally located and correlates well with other metrics, suggesting that it is a good general-purpose metric to use when more specific criteria are not known.
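
The abstract names the three components of SAR but does not give the formula. The following is a minimal sketch, assuming the equal-weight form SAR = (ACC + AUC + (1 - RMS)) / 3, where RMS is the root mean squared error of the predicted probabilities. That form, the 0.5 accuracy threshold, and all function names here are assumptions for illustration, not taken from the text above.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Threshold metric: fraction of cases where the thresholded
    prediction matches the 0/1 label (threshold is an assumption)."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def roc_area(y_true, y_prob):
    """Ordering metric: probability that a random positive is ranked
    above a random negative, counting ties as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC area needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rms_error(y_true, y_prob):
    """Probability metric: root mean squared error against 0/1 labels."""
    total = sum((p - y) ** 2 for y, p in zip(y_true, y_prob))
    return math.sqrt(total / len(y_true))

def sar(y_true, y_prob, threshold=0.5):
    """Assumed equal-weight combination; RMS is flipped to (1 - RMS)
    so that larger is better for all three terms."""
    return (accuracy(y_true, y_prob, threshold)
            + roc_area(y_true, y_prob)
            + (1.0 - rms_error(y_true, y_prob))) / 3.0

if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1]
    probs = [0.9, 0.2, 0.6, 0.4, 0.3]
    print("ACC:", accuracy(labels, probs))
    print("AUC:", roc_area(labels, probs))
    print("RMS:", rms_error(labels, probs))
    print("SAR:", sar(labels, probs))
```

Each term comes from one of the three metric families the abstract describes (threshold, ordering, and probability), which is one plausible reading of why a combined score would sit centrally among them.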


