Text filtering by boosting naive Bayes classifiers
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Athens, Greece
Pages: 168-175
Year of Publication: 2000
ISBN: 1-58113-226-3
Authors
Yu-Hwan Kim  Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Shang-Yoon Hahn  Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Byoung-Tak Zhang  Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Sponsors
Athens University of Economics and Business
Greek Computer Society
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 64,   Citation Count: 14
DOI: http://doi.acm.org/10.1145/345508.345572

ABSTRACT

Several machine learning algorithms have recently been used for text categorization and filtering. In particular, boosting methods such as AdaBoost have shown good performance when applied to real text data. However, most existing boosting algorithms are based on classifiers that use binary-valued features. Thus, they do not make full use of the weight information provided by standard term weighting methods. In this paper, we present a boosting-based learning method for text filtering that uses naive Bayes classifiers as weak learners. The use of naive Bayes allows the boosting algorithm to utilize term frequency information while maintaining a probabilistically accurate confidence ratio. Applied to TREC-7 and TREC-8 filtering track documents, the proposed method obtained a significant improvement in the LF1, LF2, F1 and F3 measures compared to the best results submitted by other TREC entries.
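
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: standard discrete AdaBoost wrapped around a weighted multinomial naive Bayes classifier that reads term frequencies rather than binary features. Everything in it is an assumption made for illustration, including the function names (`train_nb`, `nb_predict`, `adaboost_nb`, `classify`), the Laplace smoothing, the plain +1/-1 votes (the paper's confidence ratio is not modelled), and the toy data. It should not be read as the authors' exact method.

```python
import math

def train_nb(docs, labels, weights, smoothing=1.0):
    """Weighted multinomial naive Bayes over term-frequency vectors.

    Boosting weights sum to 1; they are rescaled by the number of
    documents so that uniform weights reproduce ordinary naive Bayes.
    """
    n, vocab = len(docs), len(docs[0])
    prior = {1: 0.0, -1: 0.0}
    counts = {1: [smoothing] * vocab, -1: [smoothing] * vocab}
    for tf, y, w in zip(docs, labels, weights):
        prior[y] += w
        for t, f in enumerate(tf):
            counts[y][t] += n * w * f  # term frequency scaled by boosting weight
    model = {}
    for y in (1, -1):
        total = sum(counts[y])
        model[y] = (math.log(prior[y] + 1e-12),
                    [math.log(c / total) for c in counts[y]])
    return model

def nb_predict(model, tf):
    """Return +1 (relevant) or -1 (not relevant) for one document."""
    score = {y: lp + sum(f * ll for f, ll in zip(tf, log_like))
             for y, (lp, log_like) in model.items()}
    return 1 if score[1] >= score[-1] else -1

def adaboost_nb(docs, labels, rounds=10):
    """Discrete AdaBoost with naive Bayes as the weak learner."""
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_nb(docs, labels, weights)
        preds = [nb_predict(model, tf) for tf in docs]
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        if err >= 0.5:          # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)   # avoid division by zero on a perfect round
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        if err <= 1e-10:        # training set already separated
            break
        # Raise the weight of misclassified documents, then renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, preds, labels)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def classify(ensemble, tf):
    """Weighted vote of all weak learners."""
    total = sum(alpha * nb_predict(model, tf) for alpha, model in ensemble)
    return 1 if total >= 0 else -1

# Toy term-frequency vectors over a three-term vocabulary (hypothetical data).
train_docs = [[3, 0, 1], [2, 1, 0], [0, 3, 1], [1, 2, 0]]
train_labels = [1, 1, -1, -1]
toy_ensemble = adaboost_nb(train_docs, train_labels)
```

Calling `classify(toy_ensemble, [4, 0, 0])` on the toy data asks the ensemble whether a document dominated by the first term is relevant. Because the weak learner consumes raw counts, a term that occurs four times contributes four times the log-likelihood of a single occurrence, which is the kind of frequency information a binary-feature weak learner would discard.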

