ABSTRACT
The standard model of supervised learning assumes that training and test data are drawn from the same underlying distribution. This paper explores an application in which a second, auxiliary source of data is available, drawn from a different distribution. This auxiliary data is more plentiful than the training and test data, but of significantly lower quality. In the SVM framework, a training example has two roles: (a) as a data point that constrains the learning process and (b) as a candidate support vector that can form part of the definition of the classifier. The paper considers using the auxiliary data in either or both of these roles. This auxiliary data framework is applied to the problem of classifying images of maple and oak leaves using a kernel derived from the shapes of the leaves. Experiments show that when the training data set is very small, training with auxiliary data can produce large improvements in accuracy, even when the auxiliary data is significantly different from the training (and test) data. The paper also introduces techniques for adjusting the kernel scores of the auxiliary data points to make them more comparable to those of the training data points.
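The two roles described above can be pictured as the rows and columns of the kernel matrix handed to an SVM-style solver: points used in role (a) contribute rows (constraints), and points used in role (b) contribute columns (candidate support vectors). The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names `build_problem` and `decision_value` are hypothetical, the Gaussian kernel is only a stand-in for the shape-based leaf kernel, the toy data is invented for illustration, and the solver that would fit the coefficients is omitted.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian kernel; an assumed stand-in for the paper's shape-based kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def build_problem(train, aux, aux_as_constraints, aux_as_support_vectors,
                  kernel=rbf_kernel):
    """Return (K, y, candidates) with K[i][j] = kernel(constraint_i, candidate_j).

    train and aux are lists of (features, label) pairs, labels in {-1, +1}.
    Rows of K are the points that constrain learning (role a);
    columns of K are the candidate support vectors (role b).
    Training points always play both roles in this sketch.
    """
    constraints = train + (aux if aux_as_constraints else [])
    candidates = train + (aux if aux_as_support_vectors else [])
    K = [[kernel(x, s) for s, _ in candidates] for x, _ in constraints]
    y = [label for _, label in constraints]
    return K, y, candidates


def decision_value(x, candidates, alphas, bias, kernel=rbf_kernel):
    """Evaluate f(x) = sum_j alpha_j * kernel(x, s_j) + bias over the candidates."""
    return sum(a * kernel(x, s) for a, (s, _) in zip(alphas, candidates)) + bias


# Toy data (invented): 2 high-quality training points, 3 auxiliary points.
train = [((0.0, 0.0), -1), ((1.0, 1.0), +1)]
aux = [((0.1, 0.2), -1), ((0.9, 1.2), +1), ((0.5, 0.4), -1)]

# Auxiliary data as constraints only, as support vectors only, or as both.
for flags in [(True, False), (False, True), (True, True)]:
    K, y, cands = build_problem(train, aux, *flags)
    print(flags, len(K), "constraints x", len(K[0]), "candidate support vectors")
```

With auxiliary data in role (a) only, the matrix grows in rows; in role (b) only, it grows in columns; with both, it grows in each direction. Any coefficients `alphas` and `bias` produced by a solver can then be passed to `decision_value` to classify a new point.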