Text categorization by boosting automatically extracted concepts
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Text categorization
Pages: 182 - 189
Year of Publication: 2003
ISBN:1-58113-646-3
ABSTRACT
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard lexical semantics and, as a consequence, are not sufficiently robust with respect to variations in word usage. In this paper we investigate the use of concept-based document representations to supplement word- or phrase-based features. The concepts are automatically extracted from documents via probabilistic latent semantic analysis. We propose to use AdaBoost to optimally combine weak hypotheses based on both types of features. Experimental results on standard benchmarks confirm the validity of our approach, showing that AdaBoost achieves consistent improvements by including additional semantic features in the learned ensemble.
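
The idea of boosting weak hypotheses drawn from both feature types can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete AdaBoost with single-feature threshold stumps, and the toy "concept" column is a hypothetical stand-in for a PLSA factor score that would be estimated separately (no PLSA is fitted here). All function names and data values are invented for illustration.

```python
import math

def stump(row, j, thr, pol):
    """Weak hypothesis: predict pol if feature j reaches thr, else -pol."""
    return pol if row[j] >= thr else -pol

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost over threshold stumps on every feature column."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # entries are (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for thr in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    err = sum(w for w, row, label in zip(weights, X, y)
                              if stump(row, j, thr, pol) != label)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Up-weight misclassified documents, then renormalize.
        weights = [w * math.exp(-alpha * label * stump(row, j, thr, pol))
                   for w, row, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    score = sum(a * stump(row, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, X, y):
    hits = sum(predict(ensemble, row) == label for row, label in zip(X, y))
    return hits / len(X)

# Toy data (hypothetical): two term indicators per document.
X_terms = [[1, 0], [0, 1], [1, 0], [0, 0]]
# Assumed concept score per document, e.g. a PLSA-style P(z|d).
concept = [0.9, 0.8, 0.1, 0.2]
X_both = [terms + [c] for terms, c in zip(X_terms, concept)]
y = [1, 1, -1, -1]

acc_terms = accuracy(train_adaboost(X_terms, y, rounds=5), X_terms, y)
acc_both = accuracy(train_adaboost(X_both, y, rounds=5), X_both, y)
print("terms only:", acc_terms, "terms + concept:", acc_both)
```

In this toy set the first and third documents share identical term features but carry opposite labels, so no ensemble over terms alone can fit both; the added concept column separates them. This mirrors, on a very small scale, the kind of word-usage variation the abstract says term-based representations handle poorly.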