Non-greedy active learning for text categorization using convex transductive experimental design
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Text classification
Pages 635-642  
Year of Publication: 2008
ISBN:978-1-60558-164-4
Authors
Kai Yu  NEC Laboratories America, Cupertino, USA
Shenghuo Zhu  NEC Laboratories America, Cupertino, USA
Wei Xu  NEC Laboratories America, Cupertino, USA
Yihong Gong  NEC Laboratories America, Cupertino, USA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 23,   Downloads (12 Months): 196,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1390334.1390442

ABSTRACT

In this paper we propose a non-greedy active learning method for text categorization using least-squares support vector machines (LSSVM). Our work is based on transductive experimental design (TED), an active learning formulation that makes effective use of the information in unlabeled data. Despite its appealing properties, the TED optimization problem is NP-hard, and thus, as with most other active learning methods, a greedy sequential strategy that selects one data example after another was suggested to find a suboptimal solution. We reformulate the problem as a continuous optimization problem and prove its convexity, meaning that a set of data examples can be selected with a guarantee of global optimality. We also develop an iterative algorithm that solves the optimization problem efficiently and turns out to be very easy to implement. Our experiments on two text corpora demonstrate empirically that the new active learning algorithm outperforms the sequential greedy algorithm and is promising for active text categorization applications.
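
To make the greedy sequential baseline mentioned in the abstract concrete, the following is a minimal sketch of one commonly described form of sequential experimental design on a kernel matrix. It is not taken from this page and is not the paper's convex method. The function name `greedy_sequential_select`, the linear kernel, the scoring rule, and the regularizer `mu` are all assumptions made for illustration.

```python
import numpy as np

def greedy_sequential_select(X, m, mu=0.1):
    """Pick m row indices of X one at a time (illustrative greedy baseline).

    Assumptions (not from the abstract): a linear kernel K = X X^T, a score
    of ||K[:, i]||^2 / (K[i, i] + mu), and a rank-one deflation of K after
    each pick so later picks favour directions not yet covered.
    """
    K = (X @ X.T).astype(float)          # kernel over the candidate pool
    available = np.ones(K.shape[0], dtype=bool)
    selected = []
    for _ in range(m):
        # Score every candidate against the current (deflated) kernel.
        scores = np.sum(K ** 2, axis=0) / (np.diag(K) + mu)
        scores[~available] = -np.inf     # never pick the same example twice
        i = int(np.argmax(scores))
        selected.append(i)
        available[i] = False
        # Remove the part of K already explained by the chosen example.
        col = K[:, i].copy()
        K = K - np.outer(col, col) / (K[i, i] + mu)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(50, 5))      # hypothetical feature vectors
    print(greedy_sequential_select(pool, m=5))
```

Each pick here is fixed before the next one is made, so an early choice cannot be revised later. That is the limitation the abstract's non-greedy formulation is meant to address by choosing the whole set jointly.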


