Text filtering by boosting naive Bayes classifiers
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 168 - 175
Year of Publication: 2000
ISBN:1-58113-226-3
Authors
Yu-Hwan Kim
Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Shang-Yoon Hahn
Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Byoung-Tak Zhang
Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Bibliometrics
Downloads (6 Weeks): 15, Downloads (12 Months): 64, Citation Count: 14
ABSTRACT
Several machine learning algorithms have recently been used for text categorization and filtering. In particular, boosting methods such as AdaBoost have shown good performance when applied to real text data. However, most existing boosting algorithms are based on classifiers that use binary-valued features. As a result, they do not fully exploit the weight information provided by standard term weighting methods. In this paper, we present a boosting-based learning method for text filtering that uses naive Bayes classifiers as the weak learner. Using naive Bayes allows the boosting algorithm to exploit term frequency information while maintaining a probabilistically accurate confidence ratio. Applied to the TREC-7 and TREC-8 filtering track documents, the proposed method obtained a significant improvement in the LF1, LF2, F1, and F3 measures compared to the best results submitted by other TREC entries.
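The combination described in the abstract (boosting whose weak learner is a naive Bayes model fed term frequencies rather than binary features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not give the exact boosting variant, so the sketch assumes a Real AdaBoost-style scheme in which each round trains a multinomial naive Bayes model on term counts scaled by the current document weights, and uses half of its posterior log-odds as a confidence-rated prediction. The function names (`train_weighted_nb`, `weak_score`, `boost_naive_bayes`, `predict`), the clipping constant `CLIP`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

CLIP = 5.0  # hypothetical cap on each weak model's log-odds, keeps exp() well-behaved


def train_weighted_nb(docs, labels, weights, vocab, alpha=1.0):
    """Multinomial naive Bayes in which each document's term counts are
    scaled by its current boosting weight (Laplace smoothing via alpha)."""
    prior = {+1: 0.0, -1: 0.0}
    counts = {+1: Counter(), -1: Counter()}
    for doc, y, w in zip(docs, labels, weights):
        prior[y] += w
        for term, tf in doc.items():
            counts[y][term] += w * tf  # term frequency, not a binary feature
    total = prior[+1] + prior[-1]
    model = {}
    for y in (+1, -1):
        denom = sum(counts[y].values()) + alpha * len(vocab)
        log_cond = {t: math.log((counts[y][t] + alpha) / denom) for t in vocab}
        model[y] = (math.log((prior[y] + 1e-12) / total), log_cond)
    return model


def weak_score(model, doc):
    """Half the clipped log-odds log P(+|d) - log P(-|d); unseen terms are ignored."""
    s = {}
    for y in (+1, -1):
        log_prior, log_cond = model[y]
        s[y] = log_prior + sum(tf * log_cond[t] for t, tf in doc.items() if t in log_cond)
    return 0.5 * max(-CLIP, min(CLIP, s[+1] - s[-1]))


def boost_naive_bayes(docs, labels, rounds=5):
    """Train `rounds` weighted naive Bayes models, reweighting documents each round."""
    vocab = {t for d in docs for t in d}
    weights = [1.0 / len(docs)] * len(docs)
    models = []
    for _ in range(rounds):
        model = train_weighted_nb(docs, labels, weights, vocab)
        models.append(model)
        # Real AdaBoost-style update: documents the current model gets wrong,
        # or classifies with low confidence, receive more weight next round.
        weights = [w * math.exp(-y * weak_score(model, d))
                   for w, y, d in zip(weights, labels, docs)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return models


def predict(models, doc):
    """Sign of the summed confidence-rated weak hypotheses (+1 = relevant)."""
    return 1 if sum(weak_score(m, doc) for m in models) > 0 else -1


# Toy usage: term-frequency vectors for two relevant and two non-relevant documents.
docs = [Counter({"boost": 3, "bayes": 2}), Counter({"boost": 1, "filter": 2}),
        Counter({"sports": 3, "game": 1}), Counter({"game": 2, "score": 2})]
labels = [1, 1, -1, -1]
models = boost_naive_bayes(docs, labels)
print([predict(models, d) for d in docs])
```

Using the clipped log-odds directly as the weak hypothesis is one way to keep the "confidence ratio" of each naive Bayes model in the vote instead of collapsing it to a hard +1/-1 decision; a discrete AdaBoost variant with a per-round coefficient would be an equally plausible reading of the abstract.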
CITED BY 14
Haoran Wu, Tong Heng Phang, Bing Liu, Xiaoli Li. A refinement approach to handling model misfit in text categorization. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
Peer to Peer - Readers of this Article have also read:
- M4: a metamodel for data preprocessing. Proceedings of the 4th ACM international workshop on Data warehousing and OLAP. Anca Vaduva, Jörg-Uwe Kietz, Regina Zücker
- Data structures for quadtree approximation and compression. Communications of the ACM 28, 9. Hanan Samet
- A hierarchical single-key-lock access control using the Chinese remainder theorem. Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing. Kim S. Lee, Huizhu Lu, D. D. Fisher
- The GemStone object database management system. Communications of the ACM 34, 10. Paul Butterworth, Allen Otis, Jacob Stein
- Putting innovation to work: adoption strategies for multimedia communication systems. Communications of the ACM 34, 12. Ellen Francik, Susan Ehrlich Rudman, Donna Cooper, Stephen Levine