Reducing the human overhead in text categorization
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Philadelphia, PA, USA
POSTER SESSION: Research track posters
Pages: 598-603
Year of Publication: 2006
ISBN: 1-59593-339-5
Bibliometrics
Downloads (6 weeks): 13; Downloads (12 months): 118; Citation count: 3
ABSTRACT
Many applications in text processing require significant human effort, either to label large document collections (when learning statistical models) or to extrapolate rules from them (when using knowledge engineering). In this work, we describe a way to reduce this effort, while retaining the methods' accuracy, by constructing a hybrid classifier that uses human reasoning over automatically discovered text patterns to complement machine learning. Using a standard sentiment-classification dataset and real customer feedback data, we demonstrate that the resulting technique significantly reduces the human effort required to obtain a given classification accuracy. Moreover, when a comparable amount of labeled data is used, the hybrid text classifier is significantly more accurate than machine-learning-based classifiers.
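The abstract describes the hybrid approach only at a high level, so the following is a minimal sketch under our own assumptions, not the authors' method. It assumes that candidate patterns are word n-grams recurring across unlabeled documents (a simple stand-in for whatever pattern discovery the paper actually uses), that a human keeps a few of them as class-labeled rules (simulated here by a hypothetical `rules` dict), and that a small multinomial naive Bayes model handles any document on which no rule fires decisively. All names (`discover_patterns`, `NaiveBayes`, `hybrid_predict`) are hypothetical.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def doc_grams(doc, n_max):
    """All distinct word n-grams (n = 1..n_max) of a lowercased document."""
    tokens = doc.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        grams.update(ngrams(tokens, n))
    return grams


def discover_patterns(docs, n_max=2, min_docs=2):
    """Hypothetical pattern discovery: n-grams that occur in at least
    min_docs distinct documents become candidates for human review."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc_grams(doc, n_max))
    return sorted(g for g, c in doc_freq.items() if c >= min_docs)


class NaiveBayes:
    """Tiny multinomial naive Bayes with Laplace smoothing (the ML fallback)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.prior[c] / self.n_docs)
            for t in doc.lower().split():
                # Laplace-smoothed word likelihood P(t | c)
                score += math.log((self.word_counts[c][t] + 1) / (total + len(self.vocab)))
            if score > best_score:  # ties keep the earlier class in sorted order
                best, best_score = c, score
        return best


def hybrid_predict(doc, rules, model, n_max=2):
    """Human-approved pattern rules vote first; fall back to the learned
    model when no rule matches or the matching rules are tied."""
    votes = Counter(rules[g] for g in doc_grams(doc, n_max) if g in rules)
    ranked = votes.most_common()
    if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    return model.predict(doc)


if __name__ == "__main__":
    unlabeled = ["a must see film", "must see again", "dull film"]
    print(discover_patterns(unlabeled))  # candidates shown to the human reviewer
    rules = {"must see": "pos", "not worth": "neg"}  # hypothetical human choices
    model = NaiveBayes().fit(["great fun film", "boring dull film"], ["pos", "neg"])
    print(hybrid_predict("great cast but not worth it", rules, model))  # a rule fires
    print(hybrid_predict("great film", rules, model))  # no rule: the model decides
```

Giving rules precedence reflects the idea that a human-vetted pattern is cheap to obtain yet often more reliable than a model trained on few labels; the model then only has to cover documents the rules miss.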
CITED BY 3
Xin Jin, Ying Li, Teresa Mah, Jie Tong. Sensitive webpage classification for content advertising. In Proceedings of the 1st International Workshop on Data Mining and Audience Intelligence for Advertising, pages 28-33, August 12, 2007, San Jose, California.