Using asymmetric distributions to improve text classifier probability estimates
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
Session: Text categorization
Pages: 111-118
Year of Publication: 2003
ISBN: 1-58113-646-3
Author: Paul N. Bennett, Carnegie Mellon University, Pittsburgh, PA
Sponsor: ACM (Association for Computing Machinery)
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/860435.860457

ABSTRACT

Text classifiers that give probability estimates are more readily applicable in a variety of scenarios. For example, rather than committing to a single fixed decision threshold, they can be used in a Bayesian risk model to issue a run-time decision that minimizes a user-specified cost function chosen dynamically at prediction time. However, the quality of the probability estimates is crucial. We review a variety of standard approaches to converting scores (and poor probability estimates) from text classifiers into high-quality estimates, and we introduce new models motivated by the intuition that the empirical score distributions for the "extremely irrelevant", "hard to discriminate", and "obviously relevant" items are often significantly different. Finally, we analyze the experimental performance of these models over the outputs of two text classifiers. The analysis demonstrates that one of these models is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
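
The two ideas in the abstract can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes that classifier scores for each class follow an asymmetric Laplace density (the abstract does not name a distribution family, so this choice is purely illustrative), turns a raw score into a posterior via Bayes' rule, and then picks the action with the lowest expected cost under a cost table supplied at prediction time. All function names, parameter values, and cost tables below are hypothetical.

```python
import math

def asym_laplace_pdf(s, mode, left_rate, right_rate):
    """Asymmetric Laplace density: exponential decay with a different
    rate on each side of the mode (illustrative choice of family)."""
    norm = left_rate * right_rate / (left_rate + right_rate)
    rate = left_rate if s <= mode else right_rate
    return norm * math.exp(-rate * abs(s - mode))

def posterior_relevant(s, prior_pos, pos_params, neg_params):
    """Bayes' rule: P(relevant | score) from class-conditional densities."""
    num = prior_pos * asym_laplace_pdf(s, *pos_params)
    den = num + (1.0 - prior_pos) * asym_laplace_pdf(s, *neg_params)
    return num / den

def min_risk_decision(p_relevant, cost):
    """Return the action with the lowest expected cost.
    cost[action][truth] is the cost of taking `action` when `truth` holds."""
    expected = {
        action: p_relevant * c["relevant"] + (1.0 - p_relevant) * c["irrelevant"]
        for action, c in cost.items()
    }
    return min(expected, key=expected.get)

# Hypothetical fitted parameters: (mode, left_rate, right_rate) per class.
POS_PARAMS = (2.0, 1.0, 3.0)   # relevant items: long tail toward low scores
NEG_PARAMS = (-2.0, 3.0, 1.0)  # irrelevant items: long tail toward high scores
PRIOR_POS = 0.1                # assumed fraction of relevant items

# Two hypothetical cost tables a user might supply at prediction time.
MILD_MISS = {"accept": {"relevant": 0.0, "irrelevant": 1.0},
             "reject": {"relevant": 5.0, "irrelevant": 0.0}}
SEVERE_MISS = {"accept": {"relevant": 0.0, "irrelevant": 1.0},
               "reject": {"relevant": 20.0, "irrelevant": 0.0}}

p = posterior_relevant(0.0, PRIOR_POS, POS_PARAMS, NEG_PARAMS)
print(p, min_risk_decision(p, MILD_MISS), min_risk_decision(p, SEVERE_MISS))
```

The same probability estimate can lead to different decisions under the two cost tables, which is why the quality of the estimate matters more than the position of any single threshold. In practice the density parameters would be fitted to held-out classifier scores rather than set by hand, and a production version would guard against both densities underflowing to zero for extreme scores.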


