Using ambiguity measure feature selection algorithm for support vector machine classifier
Source
Symposium on Applied Computing
Proceedings of the 2008 ACM symposium on Applied computing
Fortaleza, Ceara, Brazil
SESSION: Data mining
Pages 916-920
Year of Publication: 2008
ISBN: 978-1-59593-753-7
ABSTRACT
With the ever-increasing number of documents on the web, in digital libraries, news sources, etc., the need for a text classifier that can handle massive amounts of data is becoming more critical, and meeting it more difficult. The major problem in text classification is the high dimensionality of the feature space. The Support Vector Machine (SVM) classifier has been shown to perform consistently better than other text classification algorithms. However, training an SVM model takes longer than training other algorithms. We explore the use of the Ambiguity Measure (AM) feature selection method, which uses only the most unambiguous keywords to predict the category of a document. Our analysis shows that AM reduces training time by more than 50% compared with using no feature selection, while keeping the accuracy of the text classifier equivalent to or better than that obtained with the whole feature set. We empirically show the effectiveness of our approach, which outperforms seven different feature selection methods on two standard benchmark datasets.
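The abstract does not state how AM is computed, so the following is only an illustrative sketch of the general idea of keeping "unambiguous" keywords. It assumes a term's score is the share of its occurrences that fall in its single most frequent category, and that features are kept when this score meets a threshold. The function names, the scoring rule, the threshold value, and the toy data are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter, defaultdict


def ambiguity_scores(docs):
    """Score each term by how concentrated it is in one category.

    docs: iterable of (tokens, category) pairs.
    Returns {term: score}, where score is in (0, 1] and 1.0 means the
    term occurs in exactly one category (assumed scoring rule).
    """
    per_term = defaultdict(Counter)  # term -> Counter(category -> count)
    for tokens, category in docs:
        for tok in tokens:
            per_term[tok][category] += 1
    return {
        term: max(counts.values()) / sum(counts.values())
        for term, counts in per_term.items()
    }


def select_features(docs, threshold=0.8):
    """Keep only terms whose score meets the (hypothetical) threshold."""
    scores = ambiguity_scores(docs)
    return {term for term, score in scores.items() if score >= threshold}


# Toy labelled corpus, invented for illustration.
docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "election", "vote"], "politics"),
    (["goal", "score", "match"], "sports"),
    (["vote", "party", "team"], "politics"),
]

selected = select_features(docs, threshold=0.8)
```

In this toy corpus "team" appears in both categories, so it scores below the threshold and is dropped, while terms such as "goal" and "vote" appear in one category only and are kept. The reduced vocabulary would then be used to build the document vectors passed to the SVM trainer, which is where a shorter training time could come from.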