ABSTRACT
In many real-world applications, labeled data are in short supply. Obtaining labeled data in a new domain is often expensive and time-consuming, while plenty of labeled data may be available from a related but different domain. Traditional machine learning does not cope well with learning across different domains. In this paper, we address this problem for a text-mining task in which the labeled data follow one distribution in one domain (the in-domain data), while the unlabeled data come from a related but different domain (the out-of-domain data). Our general goal is to learn from the in-domain data and apply the learned knowledge to the out-of-domain data. We propose a co-clustering-based classification (CoCC) algorithm to tackle this problem. Co-clustering is used as a bridge to propagate the class structure and knowledge from the in-domain data to the out-of-domain data. We present theoretical and empirical analysis showing that our algorithm produces high-quality classification results even when the distributions of the two data sets differ. The experimental results show that our algorithm greatly improves classification performance over traditional learning algorithms.