BNS Feature Scaling: An Improved Representation over TF-IDF for SVM Text Classification
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
SESSION: IR/KM: machine learning
Pages 263-270  
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Author
George Forman  Hewlett-Packard Labs, Palo Alto, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458119

ABSTRACT

In machine learning for text classification, TF-IDF is the most widely used representation for real-valued feature vectors. However, IDF is oblivious to the training class labels and naturally scales some features inappropriately. We replace IDF with Bi-Normal Separation (BNS), which has been previously found to be excellent at ranking words for feature selection filtering. Empirical evaluation on a benchmark of 237 binary text classification tasks shows substantially better accuracy and F-measure for a Support Vector Machine (SVM) by using BNS scaling. A wide variety of other feature representations, including binary features with no scaling, were also tested and found inferior. Moreover, BNS scaling yielded better performance without feature selection, obviating the need for feature selection.
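
To illustrate why a label-aware weight can behave differently from IDF, here is a minimal sketch in Python. It assumes the commonly cited definition of Bi-Normal Separation, |F⁻¹(tpr) − F⁻¹(fpr)|, where F⁻¹ is the inverse standard normal CDF and tpr/fpr are the fractions of positive/negative training documents containing the word. The abstract itself does not spell out the formula, so the function names, the clipping constant `eps`, and the toy counts below are all assumptions made for illustration, not the paper's implementation.

```python
import math
from statistics import NormalDist


def bns_weight(tp, fp, pos, neg, eps=0.0005):
    """Assumed BNS score |F^-1(tpr) - F^-1(fpr)| for one word.

    tp / fp: positive / negative training documents containing the word.
    pos / neg: total positive / negative training documents.
    eps: hypothetical clipping bound that keeps the inverse CDF finite
         when a rate is exactly 0 or 1.
    """
    inv = NormalDist().inv_cdf  # inverse standard normal CDF
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))


def idf_weight(df, n_docs):
    """Plain IDF: depends only on document frequency, never on labels."""
    return math.log(n_docs / df)


# Toy corpus (made-up counts): 100 positive and 900 negative documents.
pos, neg = 100, 900

# Word A occurs in 80 positives and 80 negatives (predictive of the class).
# Word B occurs in 16 positives and 144 negatives (16% of each class).
# Both have document frequency 160, so IDF cannot tell them apart.
idf_a = idf_weight(80 + 80, pos + neg)
idf_b = idf_weight(16 + 144, pos + neg)
bns_a = bns_weight(80, 80, pos, neg)
bns_b = bns_weight(16, 144, pos, neg)

print(f"IDF: A={idf_a:.3f}  B={idf_b:.3f}")
print(f"BNS: A={bns_a:.3f}  B={bns_b:.3f}")
```

In this toy setting both words receive the same IDF, while the BNS-style weight is large for the class-predictive word and zero for the word that is equally common in both classes. In a full pipeline the per-word weight would multiply the term-frequency feature before the vectors are passed to the SVM.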

