Using ambiguity measure feature selection algorithm for support vector machine classifier
Source: Proceedings of the 2008 ACM Symposium on Applied Computing
Fortaleza, Ceara, Brazil
Session: Data mining
Pages: 916-920
Year of Publication: 2008
ISBN: 978-1-59593-753-7
Authors
Saket S. R. Mengle, Illinois Institute of Technology, Chicago, Illinois
Nazli Goharian, Illinois Institute of Technology, Chicago, Illinois
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM, New York, NY, USA
DOI: 10.1145/1363686.1363896 (http://doi.acm.org/10.1145/1363686.1363896)

ABSTRACT

With the ever-increasing number of documents on the web, in digital libraries, in news sources, and elsewhere, the need for a text classifier that can handle massive amounts of data is becoming both more critical and more difficult to meet. The major problem in text classification is the high dimensionality of the feature space. The Support Vector Machine (SVM) classifier has been shown to perform consistently better than other text classification algorithms; however, training an SVM model takes longer than training the other classifiers. We explore the use of the Ambiguity Measure (AM) feature selection method, which uses only the most unambiguous keywords to predict the category of a document. Our analysis shows that AM reduces training time by more than 50% compared with using no feature selection, while keeping the accuracy of the text classifier equal to or better than that obtained with the whole feature set. We show empirically that our approach outperforms seven other feature selection methods on two standard benchmark datasets.
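The abstract does not give the formula for AM. The sketch below is a minimal illustration under a stated assumption: a term's ambiguity score is taken to be the largest fraction of its training-set occurrences that fall in any single category, so a score near 1 marks an unambiguous keyword. Terms scoring below a threshold are discarded before SVM training. The function names and the 0.7 threshold are hypothetical, not values from the paper.

```python
from collections import Counter, defaultdict


def ambiguity_scores(docs):
    """Score each term by how concentrated it is in one category.

    docs: list of (tokens, category) pairs.
    ASSUMPTION: score(term) = max over categories c of
    count(term in c) / count(term overall).
    """
    term_total = Counter()
    term_by_cat = defaultdict(Counter)
    for tokens, category in docs:
        for tok in tokens:
            term_total[tok] += 1
            term_by_cat[tok][category] += 1
    return {
        term: max(term_by_cat[term].values()) / total
        for term, total in term_total.items()
    }


def select_features(docs, threshold=0.7):
    # Keep only terms whose occurrences are concentrated in one category.
    # The threshold value is a hypothetical choice for illustration.
    scores = ambiguity_scores(docs)
    return {t for t, s in scores.items() if s >= threshold}


def project(tokens, vocab):
    # Drop every token that was not selected, shrinking the feature space
    # that the SVM would be trained on.
    return [t for t in tokens if t in vocab]


if __name__ == "__main__":
    train = [
        ("goal match striker goal".split(), "sports"),
        ("match election vote".split(), "politics"),
        ("vote senate election".split(), "politics"),
    ]
    vocab = select_features(train, threshold=0.7)
    print(sorted(vocab))  # "match" is split across categories, so it is dropped
```

The reduced token lists would then be vectorized and passed to an SVM trainer; that step is left out here so the sketch stays dependency-free.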

