Topic difference factor extraction between two document sets and its application to text categorization
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Tampere, Finland
SESSION: Text Categorization
Pages: 137-144
Year of Publication: 2002
ISBN: 1-58113-561-0
ABSTRACT
To improve performance in text categorization, it is important to extract distinctive features for each class. This paper proposes topic difference factor analysis (TDFA) as a method for extracting projection axes that reflect topic differences between two document sets. Suppose all sentence vectors that compose each document are projected onto a projection axis. TDFA obtains the axes that maximize the ratio, between the two document sets, of the sum of squared projections, by solving a generalized eigenvalue problem. These axes are called topic difference factors (TDFs). By applying TDFA to the set of documents that belong to a given class and the set of documents that an existing classifier misclassifies as belonging to that class, we obtain features that take large values in the given class but small values in other classes, as well as features that take large values in other classes but small values in the given class. A classifier was constructed that applies these features to complement the kNN classifier. As a result, the micro-averaged F1 measure on Reuters-21578 improved from 83.69% to 87.27%.
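The optimization described in the abstract can be written out as a short derivation. This is a sketch only: the symbols $A$, $B$, $\mathbf{x}_s$, $S_A$, $S_B$, $\mathbf{w}$, $J$, and $\lambda$ are notation assumed here for illustration, not taken from the paper. The abstract does not state how sentence vectors are weighted or normalized, or how a singular denominator matrix is handled, so $S_B$ is simply assumed to be positive definite.

```latex
% Assumed notation: A and B are the two document sets, x_s is the vector of
% sentence s, and w is a candidate projection axis.
\begin{align*}
S_A &= \sum_{d \in A} \sum_{s \in d} \mathbf{x}_s \mathbf{x}_s^{\top},
&
S_B &= \sum_{d \in B} \sum_{s \in d} \mathbf{x}_s \mathbf{x}_s^{\top}
\\[4pt]
% Sum of squared projections of all sentence vectors of A onto w
\sum_{d \in A} \sum_{s \in d} \bigl(\mathbf{w}^{\top} \mathbf{x}_s\bigr)^2
&= \mathbf{w}^{\top} S_A \mathbf{w}
\\[4pt]
% Ratio between the two document sets, to be maximized over w
J(\mathbf{w}) &= \frac{\mathbf{w}^{\top} S_A \mathbf{w}}
                      {\mathbf{w}^{\top} S_B \mathbf{w}}
\\[4pt]
% Setting the gradient of J to zero gives a generalized eigenvalue problem
\frac{\partial J}{\partial \mathbf{w}} = \mathbf{0}
\;&\Longrightarrow\;
S_A \mathbf{w} = \lambda\, S_B \mathbf{w},
\qquad \lambda = J(\mathbf{w})
\end{align*}
```

Under these assumptions, the eigenvectors with the largest $\lambda$ are axes on which set $A$ projects strongly relative to set $B$. Swapping the roles of $S_A$ and $S_B$ gives axes with the opposite behavior, which is consistent with the two kinds of features the abstract describes.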