ABSTRACT
Text classifiers that give probability estimates are more readily applicable in a variety of scenarios. For example, rather than committing to a single fixed decision threshold, they can be used in a Bayesian risk model to issue a run-time decision that minimizes a user-specified cost function chosen dynamically at prediction time. However, the quality of the probability estimates is crucial. We review a variety of standard approaches to converting scores (and poor probability estimates) from text classifiers into high-quality estimates, and we introduce new models motivated by the intuition that the empirical score distributions for the "extremely irrelevant", "hard to discriminate", and "obviously relevant" items are often significantly different. Finally, we analyze the experimental performance of these models over the outputs of two text classifiers. The analysis demonstrates that one of these models is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
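The run-time decision described in the abstract can be illustrated with a minimal sketch. It assumes a binary relevant/irrelevant task and a calibrated probability already in hand; the function name `min_risk_decision`, the action labels, and the cost values are hypothetical and chosen only for illustration, not taken from the paper.

```python
def min_risk_decision(p_relevant, cost):
    """Pick the action with the lowest expected cost.

    p_relevant: calibrated estimate of P(relevant | document).
    cost: hypothetical user-supplied table, cost[action][true_label],
          with true_label in {"relevant", "irrelevant"}.
    """
    expected = {
        action: p_relevant * c["relevant"] + (1.0 - p_relevant) * c["irrelevant"]
        for action, c in cost.items()
    }
    # The Bayes-optimal action minimizes expected cost under the estimate.
    best = min(expected, key=expected.get)
    return best, expected


# Two cost functions supplied at prediction time (illustrative values only).
symmetric = {
    "accept": {"relevant": 0.0, "irrelevant": 1.0},
    "reject": {"relevant": 1.0, "irrelevant": 0.0},
}
miss_is_costly = {
    "accept": {"relevant": 0.0, "irrelevant": 1.0},
    "reject": {"relevant": 5.0, "irrelevant": 0.0},
}

p = 0.3  # same probability estimate in both cases
decision_symmetric, _ = min_risk_decision(p, symmetric)
decision_costly, _ = min_risk_decision(p, miss_is_costly)
```

With the same estimate of 0.3, the symmetric costs lead to rejecting the item, while the table that penalizes missed relevant items leads to accepting it. No threshold has to be fixed in advance, but the decision is only as good as the probability estimate fed into it.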