ABSTRACT
Standard machine learning techniques typically require ample training data in the form of labeled instances. In many situations it may be too tedious or costly to obtain enough labeled data for adequate classifier performance. In text classification, however, humans can easily judge the relevance of features, that is, words that are indicative of a topic. This lets the classifier focus its feature weights more appropriately when labeled data are scarce. We describe an algorithm for tandem learning that begins with a couple of labeled instances and then, at each iteration, recommends both features and instances for a human to label. Tandem learning with an "oracle" performs much better than learning from only features or only instances. We find that humans can emulate the oracle closely enough to achieve performance (accuracy) comparable to the oracle's. Our unique experimental design helps separate system error from human error, leading to a better understanding of when and why interactive feature selection works.
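The loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under assumptions that do not come from the abstract. It uses a hypothetical eight-document toy corpus. The oracle is simulated by the hidden corpus labels (for instance questions) and by a fixed, hypothetical `RELEVANT_WORDS` set (for feature questions). The classifier is a naive-Bayes-style log-odds linear model. Feature feedback is folded in by multiplying approved features' weights by a hypothetical constant `SCALE`. Features are recommended by largest current weight magnitude, and instances are recommended by uncertainty sampling. It shows the shape of the interaction, not the authors' classifier or selection criteria.

```python
import math

# Hypothetical toy corpus: (set of words, label), +1 = on-topic ("sports"), -1 = off-topic.
# The learner sees a label only after the simulated oracle reveals it.
CORPUS = [
    ({"goal", "match", "team", "win"}, 1),
    ({"election", "vote", "party", "win"}, -1),
    ({"team", "coach", "season"}, 1),
    ({"vote", "senate", "bill"}, -1),
    ({"match", "score", "goal"}, 1),
    ({"party", "campaign", "poll"}, -1),
    ({"season", "score", "coach", "team"}, 1),
    ({"poll", "senate", "election"}, -1),
]
# Hypothetical oracle knowledge: words a human would judge indicative of either class.
RELEVANT_WORDS = {"goal", "team", "coach", "score", "election", "vote", "senate", "poll"}
SCALE = 3.0  # hypothetical boost for the weights of features the oracle marks relevant


def train(labels, relevant):
    """Naive-Bayes-style smoothed log-odds weight per word, from oracle-labeled docs."""
    pos = [CORPUS[i][0] for i, y in labels.items() if y == 1]
    neg = [CORPUS[i][0] for i, y in labels.items() if y == -1]
    weights = {}
    for w in set().union(*pos, *neg):
        cp = sum(w in d for d in pos)  # labeled on-topic docs containing w
        cn = sum(w in d for d in neg)  # labeled off-topic docs containing w
        weight = math.log(cp + 1) - math.log(cn + 1)
        weights[w] = weight * SCALE if w in relevant else weight
    bias = math.log(len(pos) + 1) - math.log(len(neg) + 1)
    return weights, bias


def score(doc, weights, bias):
    """Linear score; words never seen in a labeled doc contribute 0."""
    return bias + sum(weights.get(w, 0.0) for w in doc)


def tandem_learning(seed, iterations=3, features_per_round=2):
    labels = {i: CORPUS[i][1] for i in seed}  # oracle labels the seed instances
    asked, relevant = set(), set()
    for _ in range(iterations):
        # 1) Feature feedback: query the strongest features not yet asked about.
        weights, _ = train(labels, relevant)
        candidates = sorted((w for w in weights if w not in asked),
                            key=lambda w: (-abs(weights[w]), w))
        for w in candidates[:features_per_round]:
            asked.add(w)
            if w in RELEVANT_WORDS:  # simulated oracle judgment
                relevant.add(w)
        # 2) Instance feedback: uncertainty sampling (unlabeled doc scored closest to 0).
        weights, bias = train(labels, relevant)
        pool = [i for i in range(len(CORPUS)) if i not in labels]
        if not pool:
            break
        pick = min(pool, key=lambda i: (abs(score(CORPUS[i][0], weights, bias)), i))
        labels[pick] = CORPUS[pick][1]  # simulated oracle label
    return labels, relevant, train(labels, relevant)


def accuracy(weights, bias):
    hits = sum((score(d, weights, bias) > 0) == (y == 1) for d, y in CORPUS)
    return hits / len(CORPUS)


# Start from a couple of labeled instances (one per class), as in the abstract.
labels, relevant, (weights, bias) = tandem_learning(seed=[0, 1])
print(sorted(labels), sorted(relevant), accuracy(weights, bias))
```

Boosting oracle-approved features by a constant factor is only one simple way to fold feature labels into a classifier. It is used here for brevity. Setting `features_per_round=0` turns the same loop into instance-only active learning, which is a convenient baseline for comparison.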
CITED BY 3
Vikas Sindhwani , Prem Melville , Richard D. Lawrence, Uncertainty sampling and transductive experimental design for active dual supervision, Proceedings of the 26th Annual International Conference on Machine Learning, p.953-960, June 14-18, 2009, Montreal, Quebec, Canada