Boosting to correct inductive bias in text classification
Source: Conference on Information and Knowledge Management
Proceedings of the eleventh international conference on Information and knowledge management
McLean, Virginia, USA
SESSION: Classification
Pages: 348-355
Year of Publication: 2002
ISBN:1-58113-492-4
Authors
Yan Liu, Carnegie Mellon University, Pittsburgh, PA
Yiming Yang, Carnegie Mellon University, Pittsburgh, PA
Jaime Carbonell, Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/584792.584850

ABSTRACT

This paper studies the effects of boosting in the context of different classification methods for text categorization, including Decision Trees, Naive Bayes, Support Vector Machines (SVMs) and a Rocchio-style classifier. We identify the inductive biases of each classifier and explore how boosting, as an error-driven resampling mechanism, reacts to those biases. Our experiments on the Reuters-21578 benchmark show that boosting is not effective in improving the performance of the base classifiers on common categories. However, the effect of boosting for rare categories varies across classifiers: for SVMs and Decision Trees, we achieved a 13-17% performance improvement in macro-averaged F1 measure, but did not obtain substantial improvement for the other two classifiers. This interesting finding of boosting on rare categories has not been reported before.
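
To make the phrase "error-driven resampling mechanism" concrete, the following is a minimal sketch of one reweighting round in the style of AdaBoost.M1. It is an illustration only: the abstract does not say which boosting variant the authors used, and the function name `adaboost_reweight` and its arguments are assumptions introduced here. After each round, examples the base classifier got wrong carry a larger share of the sampling weight, so the next base classifier concentrates on them.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost.M1-style reweighting round (illustrative sketch).

    weights       -- current example weights, assumed to sum to 1
    misclassified -- booleans, True where the base classifier erred
    Returns (new_weights, alpha), where alpha is the classifier's vote.
    """
    # Weighted training error of the current base classifier.
    err = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")

    # beta < 1, so correctly classified examples are scaled down
    # while misclassified examples keep their weight.
    beta = err / (1.0 - err)
    scaled = [w if m else w * beta for w, m in zip(weights, misclassified)]

    # Renormalise so the weights form a distribution again.
    total = sum(scaled)
    new_weights = [w / total for w in scaled]

    # Vote of this classifier in the final weighted combination.
    alpha = math.log(1.0 / beta)
    return new_weights, alpha


# Four equally weighted examples; the classifier errs on the first only.
new_weights, alpha = adaboost_reweight([0.25] * 4, [True, False, False, False])
```

In this usage example the single misclassified example ends up holding half of the total weight after renormalisation, which is how the resampling step pushes the next round toward the errors of the previous one.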


