An interactive algorithm for asking and incorporating feature feedback into support vector machines
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Amsterdam, The Netherlands
SESSION: Classification and clustering
Pages: 79-86
Year of Publication: 2007
ISBN: 978-1-59593-597-7
Authors
Hema Raghavan, Yahoo! Inc.
James Allan, University of Massachusetts
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
Bibliometrics
Citation Count: 3
DOI: http://doi.acm.org/10.1145/1277741.1277758

ABSTRACT

Standard machine learning techniques typically require ample training data in the form of labeled instances. In many situations, obtaining enough labeled data for adequate classifier performance is too tedious or costly. In text classification, however, humans can easily guess the relevance of features, that is, words that are indicative of a topic, which lets the classifier focus its feature weights more appropriately when labeled data is scarce. We describe an algorithm for tandem learning that begins with a couple of labeled instances and then, at each iteration, recommends features and instances for a human to label. Tandem learning using an "oracle" results in much better performance than learning on only features or only instances. We find that humans can emulate the oracle to an extent that results in performance (accuracy) comparable to that of the oracle. Our unique experimental design helps factor out system error from human error, leading to a better understanding of when and why interactive feature selection works.
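
The abstract describes the tandem learning loop only at a high level, so the following is a rough illustrative sketch rather than the authors' algorithm. Everything specific in it is an assumption: a simple perceptron stands in for the support vector machine, features are recommended by largest absolute weight, instances are chosen by uncertainty (smallest absolute score), relevant features are up-weighted by a hypothetical `boost` factor, and the "oracle" is modeled as two plain functions over a made-up toy pool.

```python
# Illustrative sketch only: the perceptron, query strategies, names, and toy
# data are assumptions made for this example, not details from the paper.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def rescale(x, scale):
    """Multiply each feature value by its scale (boosted if marked relevant)."""
    return [s * v for s, v in zip(scale, x)]

def train_perceptron(X, y, epochs=20):
    """Linear classifier with the mistake-driven perceptron update (SVM stand-in)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * dot(w, x) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w

def fit(pool, labeled, scale):
    """Train on the currently labeled instances, using rescaled features."""
    ids = sorted(labeled)
    return train_perceptron([rescale(pool[i], scale) for i in ids],
                            [labeled[i] for i in ids])

def predict(w, scale, x):
    return 1 if dot(w, rescale(x, scale)) > 0 else -1

def tandem_learning(pool, instance_oracle, feature_oracle, seed_ids,
                    rounds=3, k_features=2, boost=5.0):
    scale = [1.0] * len(pool[0])
    labeled = {i: instance_oracle(i) for i in seed_ids}  # a couple of seed labels
    asked = set()
    for _ in range(rounds):
        w = fit(pool, labeled, scale)
        # Feature step: ask about the not-yet-asked features with largest |weight|.
        candidates = sorted((f for f in range(len(scale)) if f not in asked),
                            key=lambda f: -abs(w[f]))[:k_features]
        for f in candidates:
            asked.add(f)
            if feature_oracle(f):
                scale[f] = boost
        # Instance step: ask for the label of the most uncertain unlabeled instance.
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        query = min(unlabeled, key=lambda i: abs(dot(w, rescale(pool[i], scale))))
        labeled[query] = instance_oracle(query)
    return fit(pool, labeled, scale), scale, labeled

# Toy pool: bag-of-words vectors over four hypothetical terms; terms 0 and 1
# are topical, terms 2 and 3 are noise.
POOL = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1],
        [0, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]

def instance_oracle(i):
    return 1 if POOL[i][0] else -1

def feature_oracle(f):
    return f in (0, 1)

w, scale, labeled = tandem_learning(POOL, instance_oracle, feature_oracle, seed_ids=[0, 1])
```

On this toy pool the loop asks about every term and labels one extra document per round. Replacing the two oracle functions with prompts to a person would give the interactive setting the abstract describes, under the same assumptions.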


REFERENCES


 
1. J. Allan. Topic Detection and Tracking. Kluwer, 2002.
4. J. Brank, M. Grobelnik, N. Milic-Frayling, and D. Mladenic. Feature selection using linear support vector machines. Technical report, Microsoft Research, 2002.
5. J. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 1960.
12. J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical data. Biometrics, 33:159-174, 1977.
13. K. Lang. Newsweeder: Learning to filter netnews. In ICML 95, pages 331-339, 1995.
14. D. D. Lewis and J. Catlett. Heterogeneous uncertainty sampling for supervised learning. In ICML 94, pages 148-156, 1994.
19. T. G. Rose, M. Stevenson, and M. Whitehead. The Reuters corpus vol. 1 - from yesterday's news to tomorrow's language resources. In Proceedings of International Conference on Language Resources and Evaluation, 2002.
22. B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
24. T. Strohman, D. Metzler, H. Turtle, and W. B. Croft. Indri: A language model-based search engine for complex queries. In International Conference on Intelligence Analysis, 2005.

