ABSTRACT
Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior over the parameters, one that encodes useful domain knowledge. Focusing on logistic regression, we present an algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task. This prior relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent. The algorithm uses other "similar" learning problems to estimate the covariance of pairs of individual parameters. We then use a semidefinite program to combine these estimates and learn a good prior for the current learning task. We apply our methods to binary text classification, and demonstrate a 20 to 40% test error reduction over a commonly used prior.
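The abstract describes the core objective only in words. What follows is a minimal sketch under stated assumptions: labels in {0, 1}, plain gradient ascent, and a hand-picked covariance matrix. The hypothetical function `fit_map_logreg` computes MAP logistic-regression weights under a zero-mean Gaussian prior N(0, Σ) with a full covariance matrix. The hypothetical `nearest_psd` stands in for the step that turns pairwise covariance estimates into a valid covariance matrix, but it only floors eigenvalues. It is not the semidefinite program described above, and the sketch does not reproduce how the pairwise covariances are estimated from similar tasks. The toy data are illustrative only.

```python
import numpy as np


def nearest_psd(S, eps=1e-6):
    """Turn a symmetric matrix of (possibly inconsistent) pairwise covariance
    estimates into a valid covariance by flooring its eigenvalues at eps.
    Hypothetical stand-in: this is NOT the paper's semidefinite program."""
    S = (S + S.T) / 2.0                       # enforce symmetry
    vals, vecs = np.linalg.eigh(S)            # S = V diag(vals) V^T
    vals = np.clip(vals, eps, None)           # remove negative eigenvalues
    return (vecs * vals) @ vecs.T             # rebuild V diag(vals) V^T


def fit_map_logreg(X, y, Sigma, lr=0.1, n_iter=2000):
    """MAP logistic regression under a zero-mean Gaussian prior N(0, Sigma).

    Maximizes sum_i log p(y_i | x_i, w) - 0.5 * w^T Sigma^{-1} w by plain
    gradient ascent; labels y are in {0, 1}.
    """
    Sigma_inv = np.linalg.inv(Sigma)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))    # P(y = 1 | x, w)
        grad = X.T @ (y - p) - Sigma_inv @ w  # likelihood term + prior term
        w = w + lr * grad
    return w


# Toy task: feature 0 is informative, feature 1 never appears in training
# (think of a rare word absent from the few labeled documents).
X = np.array([[1.0, 0.0]] * 4 + [[-1.0, 0.0]] * 4)
y = np.array([1, 1, 1, 0, 0, 0, 0, 1], dtype=float)

# Independent (diagonal) prior vs. a prior that couples the two weights.
w_diag = fit_map_logreg(X, y, np.eye(2))
Sigma_full = nearest_psd(np.array([[1.0, 0.9], [0.9, 1.0]]))
w_full = fit_map_logreg(X, y, Sigma_full)

print("independent prior:", w_diag)
print("full-covariance prior:", w_full)
```

In this toy task the weight of feature 0 is the same under both priors, because its marginal prior variance is 1 in each case. Only the full-covariance prior moves the weight of feature 1, which never occurs in training. At the optimum the prior term forces w[1] = 0.9 · w[0], so evidence about one parameter informs a dependent one, while the independent prior leaves w[1] at zero.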