Improving SVM accuracy by training on auxiliary data sources
Source: ACM International Conference Proceeding Series; Vol. 69
Proceedings of the twenty-first international conference on Machine learning
Banff, Alberta, Canada
Page: 110  
Year of Publication: 2004
ISBN: 1-58113-828-5
Authors
Pengcheng Wu, Oregon State University, Corvallis, OR
Thomas G. Dietterich, Oregon State University, Corvallis, OR
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1015330.1015436

ABSTRACT

The standard model of supervised learning assumes that training and test data are drawn from the same underlying distribution. This paper explores an application in which a second, auxiliary, source of data is available drawn from a different distribution. This auxiliary data is more plentiful, but of significantly lower quality, than the training and test data. In the SVM framework, a training example has two roles: (a) as a data point to constrain the learning process and (b) as a candidate support vector that can form part of the definition of the classifier. The paper considers using the auxiliary data in either (or both) of these roles. This auxiliary data framework is applied to a problem of classifying images of leaves of maple and oak trees using a kernel derived from the shapes of the leaves. Experiments show that when the training data set is very small, training with auxiliary data can produce large improvements in accuracy, even when the auxiliary data is significantly different from the training (and test) data. The paper also introduces techniques for adjusting the kernel scores of the auxiliary data points to make them more comparable to the training data points.
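The two roles described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Gaussian kernel in place of the leaf-shape kernel, a simple subgradient-descent hinge-loss solver in place of a real SVM optimizer, made-up 2-D points, and a hypothetical per-example weight (`aux_w`) for down-weighting the lower-quality auxiliary data. The names `fit`, `predict`, `train`, and `aux` are illustrative only. The point is that the set of examples that constrain learning and the set of candidate support vectors are passed separately, so auxiliary data can be placed in either set or in both.

```python
import math

def rbf(a, b, gamma=0.5):
    """Gaussian kernel; an assumed stand-in for the paper's leaf-shape kernel."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def fit(constraints, candidates, weights=None, lam=0.01, lr=0.05, epochs=200):
    """Fit f(x) = sum_j alpha[j] * K(candidates[j], x) + b by subgradient
    descent on a weighted hinge loss plus an L2 penalty on alpha.

    constraints: (x, y) pairs whose margins are penalized (role a).
    candidates:  points allowed to appear in the classifier (role b).
    weights:     per-constraint loss weights (hypothetical knob).
    """
    if weights is None:
        weights = [1.0] * len(constraints)
    # gram[i][j] = K(candidates[j], x_i) for every constraint point x_i
    gram = [[rbf(c, x) for c in candidates] for x, _ in constraints]
    alpha, b = [0.0] * len(candidates), 0.0
    for _ in range(epochs):
        grad_a = [lam * a for a in alpha]
        grad_b = 0.0
        for (_, y), row, w in zip(constraints, gram, weights):
            score = sum(a * k for a, k in zip(alpha, row)) + b
            if y * score < 1.0:  # margin violated: hinge subgradient is active
                for j, k in enumerate(row):
                    grad_a[j] -= w * y * k
                grad_b -= w * y
        alpha = [a - lr * g for a, g in zip(alpha, grad_a)]
        b -= lr * grad_b
    return alpha, b

def predict(x, candidates, alpha, b):
    """Return +1 or -1 from the sign of the kernel expansion."""
    score = sum(a * rbf(c, x) for a, c in zip(alpha, candidates)) + b
    return 1 if score >= 0 else -1

# Made-up data: a tiny primary training set and a larger auxiliary set.
train = [((0.0, 0.0), -1), ((3.0, 3.0), 1)]
aux = [((0.5, 0.0), -1), ((0.0, 0.6), -1), ((3.4, 3.0), 1), ((3.0, 2.5), 1)]
train_x = [x for x, _ in train]
aux_x = [x for x, _ in aux]
aux_w = [1.0] * len(train) + [0.5] * len(aux)  # auxiliary points count half

# (constraint set, candidate support vectors, loss weights) per variant
variants = {
    "training data only": (train, train_x, None),
    "auxiliary as constraints": (train + aux, train_x, aux_w),
    "auxiliary as support vectors": (train, train_x + aux_x, None),
    "auxiliary in both roles": (train + aux, train_x + aux_x, aux_w),
}
models = {name: (cands, *fit(cons, cands, w))
          for name, (cons, cands, w) in variants.items()}

for name, (cands, alpha, b) in models.items():
    print(name, predict((0.2, 0.2), cands, alpha, b),
          predict((2.8, 3.1), cands, alpha, b))
```

On data this cleanly separated all four variants should agree; the sketch only shows how the two roles are wired, not the accuracy gains reported in the paper.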

