Probabilistic score estimation with piecewise logistic regression
Source
ACM International Conference Proceeding Series; Vol. 69
Proceedings of the twenty-first international conference on Machine learning
Banff, Alberta, Canada
Page: 115
Year of Publication: 2004
ISBN: 1-58113-828-5
Authors
Jian Zhang, Carnegie Mellon University, Pittsburgh, PA
Yiming Yang, Carnegie Mellon University, Pittsburgh, PA
ABSTRACT
Well-calibrated probabilities are necessary in many applications, such as probabilistic frameworks and cost-sensitive tasks. Building on the earlier success of the asymmetric Laplace method in calibrating text classifiers' scores, we propose piecewise logistic regression, a simple extension of standard logistic regression, as an alternative method in the discriminative family. We show that both methods have the flexibility to be piecewise linear functions in log-odds, but that they rest on quite different assumptions. We evaluated the asymmetric Laplace method, piecewise logistic regression, and standard logistic regression on standard text categorization collections (Reuters-21578 and TRECAP) with three classifiers (SVM, Naive Bayes, and Logistic Regression Classifier), and observed that piecewise logistic regression performs significantly better than the other two methods on the log-loss metric.
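To make "piecewise linear in log-odds" concrete, the following is a minimal sketch rather than the authors' procedure. It assumes a continuous hinge basis with hand-picked knots and a plain gradient-descent fit; the abstract does not specify how the pieces are chosen or fitted. All function names (`hinge_features`, `fit_piecewise_logistic`, `log_loss`) and the toy scores are hypothetical.

```python
import math

def hinge_features(score, knots):
    # Assumed basis: intercept, raw score, and one hinge term per knot.
    # The slope of the log-odds can change at each knot.
    return [1.0, score] + [max(0.0, score - k) for k in knots]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_odds(score, knots, w):
    # Piecewise-linear log-odds of the positive class for one raw score.
    return sum(wi * xi for wi, xi in zip(w, hinge_features(score, knots)))

def fit_piecewise_logistic(scores, labels, knots, lr=0.1, n_iter=2000):
    # Batch gradient descent on the mean log-loss (an illustrative optimizer).
    feats = [hinge_features(s, knots) for s in scores]
    w = [0.0] * (2 + len(knots))
    n = len(scores)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, y in zip(feats, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def log_loss(scores, labels, knots, w, eps=1e-12):
    # Mean negative log-likelihood of the calibrated probabilities.
    total = 0.0
    for s, y in zip(scores, labels):
        p = min(max(sigmoid(log_odds(s, knots, w)), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)

# Toy classifier scores and binary labels (made up for illustration).
scores = [-2.0, -1.5, -1.0, -0.5, -0.2, 0.2, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
knots = [0.0]  # assumed single knot at the decision threshold

w = fit_piecewise_logistic(scores, labels, knots)
print("weights:", w)
print("log-loss:", log_loss(scores, labels, knots, w))
```

Passing an empty `knots` list reduces the sketch to standard logistic regression on the raw score, which is the sense in which the piecewise model is an extension of it.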