ABSTRACT
Traditional machine learning makes a basic assumption: the training and test data are drawn from the same distribution. In many cases, however, this identical-distribution assumption does not hold. It may be violated when a task arrives from a new domain while labeled data are available only from a similar old domain. Labeling the new data can be costly, and it would be wasteful to throw away all the old data. In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). TrAdaBoost uses a small amount of newly labeled data to leverage the old data in constructing a high-quality classification model for the new data. We show that this method can learn an accurate model from only a tiny amount of new data and a large amount of old data, even when the new data alone are not sufficient to train a model, and that it allows knowledge to be effectively transferred from the old data to the new. The effectiveness of our algorithm is analyzed both theoretically and empirically to show that the iterative algorithm converges well to an accurate model.
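The abstract does not spell out the weight updates, so the following is only a minimal sketch of how a boosting loop of this kind is commonly described for TrAdaBoost, not the authors' implementation. It assumes binary labels in {0, 1}; the names `train_stump` and `tradaboost`, the decision-stump base learner, and the clipping of the error rate are all assumptions made for illustration. Old-domain examples that the current learner misclassifies are down-weighted by a fixed factor, new-domain examples that are misclassified are up-weighted as in AdaBoost, and the final vote uses only the later half of the rounds.

```python
import math

def train_stump(X, y, w):
    """Hypothetical base learner: best weighted single-feature threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (1 if sign * (x[j] - thr) >= 0 else 0) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda x: 1 if sign * (x[j] - thr) >= 0 else 0

def tradaboost(X_old, y_old, X_new, y_new, n_rounds=10):
    """Sketch of a TrAdaBoost-style loop; returns (predict, final_weights)."""
    n, m = len(X_old), len(X_new)
    X, y = X_old + X_new, y_old + y_new
    w = [1.0] * (n + m)
    # Fixed down-weighting factor for misclassified old-domain examples.
    beta_old = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        total = sum(w)
        p = [wi / total for wi in w]
        h = train_stump(X, y, p)
        miss = [abs(h(x) - yi) for x, yi in zip(X, y)]
        # The error rate is measured on the new-domain examples only.
        eps = sum(w[i] * miss[i] for i in range(n, n + m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # assumption: clip so beta_t is defined
        beta_t = eps / (1.0 - eps)
        for i in range(n):             # old data: shrink weight when wrong
            w[i] *= beta_old ** miss[i]
        for i in range(n, n + m):      # new data: grow weight when wrong
            w[i] *= beta_t ** (-miss[i])
        learners.append(h)
        betas.append(beta_t)
    start = math.ceil(n_rounds / 2) - 1  # vote with the later half of the rounds

    def predict(x):
        vote = sum(math.log(1.0 / b) * (h(x) - 0.5)
                   for h, b in zip(learners[start:], betas[start:]))
        return 1 if vote >= 0 else 0

    return predict, w

# Toy data: the last two old-domain points conflict with the new-domain rule.
X_old = [[0], [1], [2], [3], [4], [6], [7], [8], [9], [10], [2], [8]]
y_old = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
X_new = [[4.5], [5.5]]
y_new = [0, 1]
predict, weights = tradaboost(X_old, y_old, X_new, y_new)
print([predict([x]) for x in (0, 4.5, 5.5, 10)])
```

In this toy run, the two conflicting old-domain points end with much smaller weights than the consistent ones, which is the intended effect: old data that disagree with the newly labeled data gradually stop influencing the model.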
REFERENCES
1
Ben-David, S., & Schuller, R. (2003). Exploiting task relatedness for multiple task learning. Proceedings of the Sixteenth Annual Conference on Learning Theory.
2
Bickel, S., & Scheffer, T. (2007). Dirichlet-enhanced spam filtering based on biased samples. In Advances in neural information processing systems 19.
3
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 144--152. Pittsburgh, Pennsylvania, United States. doi:10.1145/130385.130401
4
5
Daumé III, H., & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26, 101--126.
6
Dudík, M., Schapire, R., & Phillips, S. (2006). Correcting sample selection bias in maximum entropy density estimation. In Advances in neural information processing systems 18.
7
8
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153--161.
9
Huang, J., Smola, A., Gretton, A., Borgwardt, K. M., & Schölkopf, B. (2007). Correcting sample selection bias by unlabeled data. In Advances in neural information processing systems 19.
10
11
Joachims, T. (2002). Learning to classify text using support vector machines. Dissertation, Kluwer.
12
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22(1), 79--86.
13
14
Rosenstein, M. T., Marx, Z., Kaelbling, L. P., & Dietterich, T. G. (2005). To transfer or not to transfer. Proceedings of NIPS 2005 Workshop on Inductive Transfer: 10 Years Later.
15
16
17
Schmidhuber, J. (1994). On learning how to learn learning strategies (Technical Report FKI-198-94). Fakultät für Informatik.
18
Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90, 227--244.
19
Thrun, S., & Mitchell, T. M. (1995). Learning one more thing. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence.
20
21
CITED BY 9
Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, Yong Yu, Spectral domain-transfer learning, Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
Ping Luo, Fuzhen Zhuang, Hui Xiong, Yuhong Xiong, Qing He, Transfer learning from multiple source domains via consensus regularization, Proceedings of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Jing Gao, Wei Fan, Jing Jiang, Jiawei Han, Knowledge transfer via multiple model local structure mapping, Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
Sihong Xie, Wei Fan, Jing Peng, Olivier Verscheure, Jiangtao Ren, Latent space domain transfer between high dimensional overlapping distributions, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
Erheng Zhong, Wei Fan, Jing Peng, Kun Zhang, Jiangtao Ren, Deepak Turaga, Olivier Verscheure, Cross domain distribution adaptation via kernel mapping, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France