Variance based classifier comparison in text categorization (poster session)
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 316 - 317  
Year of Publication: 2000
ISBN:1-58113-226-3
Authors
Atsuhiro Takasu  National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
Kenro Aihara  National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
Sponsors
Athens U of Econ & Business : Athens University of Economics and Business
Greek Com Soc : Greek Computer Society
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/345508.345618

ABSTRACT

Text categorization is one of the key functions for making use of vast amounts of documents. It can be seen as a classification problem, which has long been studied in the pattern recognition and machine learning fields, and several classification methods have been developed, such as statistical classification, decision trees, and support vector machines. Many researchers have applied these classification methods to text categorization and reported their performance (e.g., decision tree [3], Bayes classifier [2], support vector machine [1]). Yang conducted a comprehensive comparative study of text categorization methods and reported that k-nearest neighbor and support vector machines work well for text categorization [4].

In previous studies, classification methods were usually compared using a single pair of training and test data sets. However, a classification method with a more complex family of classifiers requires more training data, and a small training set may yield an unreliable classifier; that is, the performance of the derived classifier varies greatly depending on the training data. Therefore, we need to take the size of the training data into account when comparing and selecting a classification method. In this paper, we discuss how to select a classifier from those derived by various classification methods and how the size of the training data affects the performance of the derived classifier.
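
The dependence of classifier reliability on training set size can be illustrated with a small simulation. The sketch below is not the authors' method or data: it assumes synthetic one-dimensional Gaussian "documents", a nearest-centroid classifier, and hypothetical helper names (`sample`, `train_centroids`, `accuracy_spread`). It retrains on many independently drawn training sets of a given size and reports the mean and variance of accuracy on a fixed test set.

```python
import random
import statistics


def sample(n_per_class, rng):
    """Draw n points per class from two 1-D Gaussians (synthetic stand-in for documents)."""
    data = [(rng.gauss(-1.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(1.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def train_centroids(train):
    """Fit a nearest-centroid classifier: one mean per class label."""
    return {
        label: statistics.fmean([x for x, y in train if y == label])
        for label in (0, 1)
    }


def accuracy(centroids, test):
    """Fraction of test points whose nearest centroid carries the true label."""
    correct = sum(
        1
        for x, y in test
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return correct / len(test)


def accuracy_spread(n_per_class, trials=50, seed=0):
    """Mean and population variance of accuracy over repeated training samples."""
    rng = random.Random(seed)
    test = sample(500, rng)  # fixed test set shared by all trials
    accs = [
        accuracy(train_centroids(sample(n_per_class, rng)), test)
        for _ in range(trials)
    ]
    return statistics.fmean(accs), statistics.pvariance(accs)


if __name__ == "__main__":
    for n in (2, 20, 200):
        mean_acc, var_acc = accuracy_spread(n)
        print(f"n per class = {n:3d}  mean accuracy = {mean_acc:.3f}  variance = {var_acc:.6f}")
```

In this toy setting the variance of accuracy shrinks as the training set grows, which is the effect a single training/test split cannot reveal.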

In order to evaluate the reliability of a classification method, we consider the variance of the accuracy of the derived classifier. We first construct a statistical model. In text categorization, each document is usually represented by a feature vector that consists of weighted term frequencies. In the vector space model, a document is a point in a high-dimensional feature space, and a classifier separates the feature space into subspaces, each of which is labeled with a category.
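
As a minimal sketch of the vector space view, the following maps each document to a sparse vector of weighted term frequencies and assigns a category by cosine similarity to per-category centroids, so each category owns a region of the feature space. The log-scaled weighting, the nearest-centroid rule, the function names, and the toy training texts are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map a document to a sparse vector: term -> log-scaled frequency (assumed weighting)."""
    counts = Counter(text.lower().split())
    return {term: 1.0 + math.log(c) for term, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average a list of sparse vectors term by term."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds weights for shared terms
    return {t: w / len(vectors) for t, w in total.items()}


def classify(text, centroids):
    """Label a document with the category whose centroid is most similar."""
    vec = tf_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy training data, two categories.
training = {
    "sports": ["the team won the match", "a late goal decided the match"],
    "finance": ["the bank raised interest rates", "stock markets fell as rates rose"],
}
centroids = {
    label: centroid([tf_vector(doc) for doc in docs])
    for label, docs in training.items()
}

if __name__ == "__main__":
    print(classify("the goal won the match", centroids))
    print(classify("interest rates and stock markets", centroids))
```

The decision rule here implicitly partitions the feature space: every point belongs to the region of the centroid it is most similar to, and each region carries one category label.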



