Cross domain distribution adaptation via kernel mapping
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Research track papers
Pages 1027-1036
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Erheng Zhong  Sun Yat-Sen University, Guangzhou, China
Wei Fan  IBM T.J. Watson Research Center, New York, NY, USA
Jing Peng  Montclair State University, Montclair, NJ, USA
Kun Zhang  Xavier University of Louisiana, New Orleans, LA, USA
Jiangtao Ren  Sun Yat-Sen University, Guangzhou, China
Deepak Turaga  IBM T.J. Watson Research Center, New York, NY, USA
Olivier Verscheure  IBM T.J. Watson Research Center, New York, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557130
ABSTRACT

When labeled examples are limited and difficult to obtain, transfer learning uses knowledge from a source domain to improve learning accuracy in the target domain. However, existing approaches assume that the marginal and conditional probabilities of the source and target domains are directly related, an assumption with limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distributions of target-domain and source-domain data into a common kernel space and uses a sample selection strategy to draw the conditional probabilities of the two domains closer. We formally show that, in the kernel-mapping space, the difference in distributions between the two domains is bounded, and that the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it achieves around 10% higher accuracy than other approaches on the text categorization problem. The source code and datasets are available from the authors.
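
The two steps named in the abstract, mapping both domains into one kernel space and then selecting samples, can be pictured with a small sketch. This is not the authors' algorithm. It assumes a Gaussian (RBF) kernel, uses kernel PCA as the shared mapping, and substitutes a simple nearest-to-target rule for the paper's sample selection strategy. The names kernel_map, select_source, gamma, and keep are hypothetical, and the data is synthetic.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_map(source, target, gamma=0.5, n_components=2):
    """Project source and target rows into one shared kernel-PCA space.

    Assumption: kernel PCA stands in for the paper's kernel mapping.
    """
    X = np.vstack([source, target])
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ K @ H                               # kernel centered in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]  # indices of the largest ones
    Z = vecs[:, top] * np.sqrt(np.maximum(vals[top], 1e-12))
    return Z[: len(source)], Z[len(source):]


def select_source(Zs, Zt, keep=0.5):
    """Return indices of the source points closest to any target point.

    Assumption: a hypothetical stand-in for the paper's sample selection.
    """
    d = ((Zs[:, None, :] - Zt[None, :, :]) ** 2).sum(-1).min(axis=1)
    k = max(1, int(round(keep * len(Zs))))
    return np.sort(np.argsort(d)[:k])


# Synthetic source and target samples with shifted means.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(40, 3))
target = rng.normal(1.0, 1.0, size=(30, 3))

Zs, Zt = kernel_map(source, target)
kept = select_source(Zs, Zt, keep=0.5)
print(Zs.shape, Zt.shape, len(kept))
```

Fitting the mapping on the stacked source and target data is what places both domains in a common space. A classifier would then be trained only on the source rows indexed by kept.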

