Scaling up text classification for large file systems
Source
International Conference on Knowledge Discovery and Data Mining
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Las Vegas, Nevada, USA
SESSION: Research papers
Pages 239-246  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
George Forman  Hewlett-Packard Labs, Palo Alto, CA, USA
Shyamsundar Rajaram  Hewlett-Packard Labs, Palo Alto, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 21,   Downloads (12 Months): 282,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1401890.1401923

ABSTRACT

We combine the speed and scalability of information retrieval with the generally superior classification accuracy of machine learning, yielding a two-phase text classifier that scales to very large document corpora. We investigate how different methods of formulating the query from the training set, and different query sizes, affect the results. In empirical tests on the Reuters RCV1 corpus of 806,000 documents, runtime was easily reduced by a factor of 27, with a somewhat surprising gain in F-measure compared with traditional text classification.
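
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the query is formulated or which classifier is used, so everything specific here is an assumption made for illustration: the term-scoring rule (positive document rate minus negative document rate), the function names, the toy data, and the stand-in second-phase classifier. It is not the authors' implementation.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed; real systems use more)."""
    return text.lower().split()


def build_inverted_index(corpus):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(corpus):
        for term in set(tokenize(text)):
            index[term].add(doc_id)
    return index


def formulate_query(train_docs, train_labels, query_size):
    """Pick the terms most skewed toward the positive class.

    Assumed scoring rule: fraction of positive training documents that
    contain the term minus the fraction of negative ones. Requires at
    least one positive and one negative training document.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(train_labels)
    n_neg = len(train_labels) - n_pos
    for text, label in zip(train_docs, train_labels):
        (pos_df if label else neg_df).update(set(tokenize(text)))
    scores = {t: pos_df[t] / n_pos - neg_df[t] / n_neg for t in pos_df}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:query_size]


def retrieve(index, query_terms):
    """Phase 1: union of the postings of all query terms."""
    hits = set()
    for term in query_terms:
        hits |= index.get(term, set())
    return hits


def two_phase_classify(corpus, index, query_terms, classifier):
    """Phase 2: run the (expensive) classifier only on retrieved documents."""
    candidates = retrieve(index, query_terms)
    positives = sorted(d for d in candidates if classifier(corpus[d]))
    return positives, candidates


# Toy data, invented for this sketch.
train_docs = [
    "oil prices rise",
    "crude oil exports fall",
    "football match tonight",
    "election results tonight",
]
train_labels = [1, 1, 0, 0]
corpus = [
    "oil tanker delayed",
    "crude joke at the match",
    "weather is mild",
    "oil and crude futures rally",
]


def toy_classifier(text):
    """Stand-in for a trained machine learning model."""
    return "oil" in tokenize(text)


index = build_inverted_index(corpus)
query = formulate_query(train_docs, train_labels, query_size=2)
positives, candidates = two_phase_classify(corpus, index, query, toy_classifier)
print(query, sorted(candidates), positives)
```

In this toy run the classifier is applied to three of the four documents; the document that matches no query term is never scored. Skipping unretrieved documents is where a runtime saving would come from on a large corpus, at the risk of missing positives that contain none of the query terms.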



