Reducing the human overhead in text categorization
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Philadelphia, PA, USA
POSTER SESSION: Research track posters
Pages: 598-603
Year of Publication: 2006
ISBN: 1-59593-339-5
Authors
Arnd Christian König  Microsoft Research, Redmond, WA
Eric Brill  Microsoft Research, Redmond, WA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1150402.1150474

ABSTRACT

Many applications in text processing require significant human effort, either for labeling large document collections (when learning statistical models) or for extrapolating rules from them (when using knowledge engineering). In this work, we describe a way to reduce this effort, while retaining the methods' accuracy, by constructing a hybrid classifier that utilizes human reasoning over automatically discovered text patterns to complement machine learning. Using a standard sentiment-classification dataset and real customer feedback data, we demonstrate that the resulting technique significantly reduces the human effort required to obtain a given classification accuracy. Moreover, the hybrid text classifier also yields a significant boost in accuracy over machine-learning-based classifiers when a comparable amount of labeled data is used.
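
To make the idea of a hybrid classifier concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how human-labeled patterns and the learned model are combined, so the precedence rule here (a majority vote over matched patterns, with a fallback to the learner) is an assumption. The names `NaiveBayes`, `hybrid_predict`, `patterns`, and the toy training data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (stand-in learner)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n_docs = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, doc):
        def score(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            s = math.log(self.prior[c] / self.n_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # ignore words never seen in training
                    s += math.log((self.counts[c][w] + 1) / total)
            return s

        return max(self.classes, key=score)


def hybrid_predict(doc, labeled_patterns, model):
    """Assumed combination rule: human-labeled patterns found in the
    document vote on the label; on no match or a tie, defer to the model."""
    text = " " + " ".join(tokenize(doc)) + " "
    votes = Counter(
        label
        for pattern, label in labeled_patterns.items()
        if " " + pattern + " " in text  # match whole words only
    )
    ranked = votes.most_common(2)
    if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    return model.predict(doc)


# Toy labeled documents (hypothetical; not from the paper's datasets).
train_docs = ["great acting and a great story", "boring plot and terrible acting"]
train_labels = ["pos", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)

# Patterns a human might judge as class indicators (hypothetical).
patterns = {"waste of time": "neg", "highly recommend": "pos"}

review = "a great waste of time"
learned_only = model.predict(review)
hybrid = hybrid_predict(review, patterns, model)
```

In this toy setting the learner alone is misled by the word "great", while the human-labeled pattern "waste of time" corrects the prediction. Documents that match no pattern are handled by the learner unchanged.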

