Semi-supervised text categorization by active search
Source
Conference on Information and Knowledge Management
Proceeding of the 17th ACM conference on Information and knowledge management
Napa Valley, California, USA
POSTER SESSION: Poster session 3/knowledge management
Pages 1517-1518  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Zenglin Xu  The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Rong Jin  Michigan State University, East Lansing, MI, USA
Kaizhu Huang  The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Michael R. Lyu  The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Irwin King  The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458364

ABSTRACT

In automated text categorization, it is very challenging, if not impossible, to build a reliable classifier that achieves high classification accuracy from only a small number of labeled documents. To address this problem, this paper proposes a novel web-assisted text categorization framework. Important keywords are first identified automatically from the available labeled documents to form queries. Search engines are then used to retrieve a multitude of relevant documents from the Web, which are in turn exploited by a semi-supervised framework. To the best of our knowledge, this work is the first study of its kind. An extensive experimental study shows encouraging results: using Google as the web search engine, the proposed framework reduces the classification error by 30% compared with the state-of-the-art supervised text categorization method.
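
The first step of the framework, turning a few labeled documents into search queries, can be illustrated with a minimal Python sketch. The abstract does not say how keywords are scored, so the smoothed log-odds ranking below is an assumption chosen for illustration, not the authors' method. The names `tokenize`, `top_keywords`, `build_query`, the stopword list, and the sample documents are all hypothetical. The sketch stops at building the query string: the web search and the semi-supervised learner are not shown.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

# Hypothetical labeled documents: (text, label) pairs.
EXAMPLE_DOCS = [
    ("The striker scored a goal in the football match", "sports"),
    ("Football fans cheered the winning goal", "sports"),
    ("The central bank raised interest rates", "finance"),
    ("Stock market rates fell after the bank report", "finance"),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(labeled_docs, target_label, k=3):
    """Rank terms seen in the target class by a smoothed log-odds score.

    The score compares the fraction of target-class documents containing
    a term with the fraction of other documents containing it.
    """
    in_class, out_class = Counter(), Counter()
    for text, label in labeled_docs:
        counts = in_class if label == target_label else out_class
        counts.update(set(tokenize(text)))  # document frequency per term

    n_in = sum(1 for _, label in labeled_docs if label == target_label)
    n_out = len(labeled_docs) - n_in

    def score(term):
        # Add-one smoothing keeps the ratio finite for unseen terms.
        p_in = (in_class[term] + 1) / (n_in + 2)
        p_out = (out_class[term] + 1) / (n_out + 2)
        return math.log(p_in / p_out)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(in_class, key=lambda t: (-score(t), t))
    return ranked[:k]


def build_query(keywords):
    """Join keywords into a plain query string for a search engine."""
    return " ".join(keywords)


if __name__ == "__main__":
    keywords = top_keywords(EXAMPLE_DOCS, "sports", k=2)
    print(build_query(keywords))
```

In a full pipeline along the lines the abstract describes, each query string would be sent to a web search engine and the retrieved pages would serve as unlabeled data for the semi-supervised learner.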

