Constructing informative priors using transfer learning
Source: ACM International Conference Proceeding Series, Vol. 148
Proceedings of the 23rd International Conference on Machine Learning
Pittsburgh, Pennsylvania
Pages: 713-720
Year of Publication: 2006
ISBN: 1-59593-383-2
Authors
Rajat Raina, Stanford University, CA
Andrew Y. Ng, Stanford University, CA
Daphne Koller, Stanford University, CA
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 19; Downloads (12 Months): 121; Citation Count: 10
DOI: http://doi.acm.org/10.1145/1143844.1143934

ABSTRACT

Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior over the parameters, one that encodes useful domain knowledge. Focusing on logistic regression, we present an algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task. This prior relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent. The algorithm uses other "similar" learning problems to estimate the covariance of pairs of individual parameters. We then use a semidefinite program to combine these estimates and learn a good prior for the current learning task. We apply our methods to binary text classification, and demonstrate a 20% to 40% reduction in test error relative to a commonly used prior.
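The central object in the abstract, MAP logistic regression under a Gaussian prior with a full covariance matrix, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes labels in {0, 1}, no intercept term, and a covariance estimate `S_hat` that is simply given rather than estimated from related tasks. The names `nearest_psd`, `neg_log_posterior` and `map_logistic_regression` are hypothetical. In place of the paper's semidefinite program, `nearest_psd` uses a simpler eigenvalue-clipping projection that only turns an estimate into a valid covariance matrix.

```python
import numpy as np


def nearest_psd(S, eps=1e-6):
    """Frobenius-norm projection of a symmetric matrix onto {A : A - eps*I is PSD}.

    Simplified stand-in for the paper's semidefinite program (an assumption):
    it only repairs a possibly inconsistent covariance estimate.
    """
    S = 0.5 * (S + S.T)                     # enforce symmetry
    vals, vecs = np.linalg.eigh(S)          # S = V diag(vals) V^T
    vals = np.maximum(vals, eps)            # clip negative or tiny eigenvalues
    return (vecs * vals) @ vecs.T


def neg_log_posterior(w, X, y, P):
    """Logistic negative log-likelihood (y in {0, 1}) plus 0.5 * w^T P w."""
    z = X @ w
    nll = np.sum(np.logaddexp(0.0, z) - y * z)
    return nll + 0.5 * w @ P @ w


def map_logistic_regression(X, y, Sigma, tol=1e-10, max_iter=100):
    """MAP weights under the prior w ~ N(0, Sigma), via Newton's method."""
    P = np.linalg.inv(Sigma)                # prior precision matrix
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # model's P(y = 1 | x)
        grad = X.T @ (p - y) + P @ w
        if np.linalg.norm(grad) < tol:
            break
        H = X.T @ (X * (p * (1.0 - p))[:, None]) + P
        step = np.linalg.solve(H, grad)
        # Backtracking (Armijo) line search so each update decreases the objective.
        t, f0 = 1.0, neg_log_posterior(w, X, y, P)
        while (t > 1e-8 and
               neg_log_posterior(w - t * step, X, y, P) > f0 - 1e-4 * t * (grad @ step)):
            t *= 0.5
        w = w - t * step
    return w


# Hypothetical toy task: feature 1 never occurs in the training data.
X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
              [-1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 1.0])

# Hypothetical covariance estimate, standing in for one learned from related tasks.
S_hat = np.array([[1.0, 0.9], [0.9, 1.0]])
Sigma_full = nearest_psd(S_hat)             # already valid, so essentially unchanged

w_full = map_logistic_regression(X, y, Sigma_full)
w_diag = map_logistic_regression(X, y, np.eye(2))   # independence assumption
print("full covariance prior:", w_full)
print("diagonal prior:       ", w_diag)
```

Because feature 1 never appears in the toy training set, the diagonal prior leaves its weight at 0, while the correlated prior sets w[1] = (Sigma[1,0] / Sigma[0,0]) * w[0], approximately 0.9 * w[0] here. This is one concrete sense in which a full covariance matrix lets parameters be dependent: evidence about one weight propagates to weights the data never touches.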


REFERENCES


[1] Ando, R. K., & Zhang, T. (2005). A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6, 1817-1853.
[3] Ben-David, S., & Schuller, R. (2003). Exploiting task relatedness for multiple task learning. COLT.
[5] Chung, F. (1997). Spectral graph theory. Regional Conference Series in Mathematics, American Mathematical Society, 92, 1-212.
[6] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7, 1-26.
[7] Lang, K. (1995). NewsWeeder: Learning to filter netnews. ICML.
[10] Ng, A. Y., Jordan, M., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. NIPS.
[11] Nigam, K., Lafferty, J., & McCallum, A. (1999). Using maximum entropy for text classification. IJCAI Workshop on Machine Learning for Information Filtering.
[12] Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? NIPS.
