Boosting to correct inductive bias in text classification
Source: Conference on Information and Knowledge Management
Proceedings of the eleventh international conference on Information and knowledge management
McLean, Virginia, USA
Session: Classification
Pages: 348-355
Year of Publication: 2002
ISBN: 1-58113-492-4
ABSTRACT
This paper studies the effects of boosting on different classification methods for text categorization, including Decision Trees, Naive Bayes, Support Vector Machines (SVMs), and a Rocchio-style classifier. We identify the inductive biases of each classifier and explore how boosting, as an error-driven resampling mechanism, reacts to those biases. Our experiments on the Reuters-21578 benchmark show that boosting does not improve the performance of the base classifiers on common categories. On rare categories, however, its effect varies across classifiers: for SVMs and Decision Trees we achieved a 13-17% improvement in macro-averaged F1, but we obtained no substantial improvement for the other two classifiers. This effect of boosting on rare categories has not been reported before.
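To illustrate what "error-driven resampling" means, here is a minimal sketch of one reweighting round in the style of standard AdaBoost. The abstract does not say which boosting variant the paper uses, so the update rule, the function name `reweight`, and the toy data are assumptions chosen for illustration only.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style round (illustrative, not the paper's exact method).

    weights: current example weights, summing to 1.
    correct: booleans, True where the base classifier was right.
    Returns (alpha, new_weights).
    """
    # Weighted error of the base classifier under the current distribution.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    # Vote strength assigned to this round's classifier.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink the weights of correct examples and grow those of mistakes.
    raw = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    # Renormalize so the weights form a distribution again.
    total = sum(raw)
    return alpha, [r / total for r in raw]

# Hypothetical toy data: 10 equally weighted documents, 2 misclassified.
weights = [0.1] * 10
correct = [True] * 8 + [False] * 2
alpha, new_weights = reweight(weights, correct)
```

In this toy run the two misclassified documents end up carrying half of the total weight, so the next base classifier is trained on a distribution that emphasizes exactly the examples the previous one got wrong. How a base classifier responds to that shifted distribution depends on its inductive bias, which is the interaction the abstract describes.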