ACM Home Page
Please provide us with feedback. Feedback
Optimal reverse prediction: a unified perspective on supervised, unsupervised and semi-supervised learning
Full text PdfPdf (625 KB)
Source ACM International Conference Proceeding Series; Vol. 382 archive
Proceedings of the 26th Annual International Conference on Machine Learning table of contents
Montreal, Quebec, Canada
Pages 1137-1144  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Linli Xu  University of Alberta, Edmonton, AB, Canada
Martha White  University of Alberta, Edmonton, AB, Canada
Dale Schuurmans  University of Alberta, Edmonton, AB, Canada
Sponsors
: MITACS
: NSF
Microsoft Research : Microsoft Research
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 47,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1553374.1553519
What is a DOI?

ABSTRACT

Training principles for unsupervised learning are often derived from motivations that appear to be independent of supervised learning. In this paper we present a simple unification of several supervised and unsupervised training principles through the concept of optimal reverse prediction: predict the inputs from the target labels, optimizing both over model parameters and any missing labels. In particular, we show how supervised least squares, principal components analysis, k-means clustering and normalized graph-cut can all be expressed as instances of the same training principle. Natural forms of semi-supervised regression and classification are then automatically derived, yielding semi-supervised learning algorithms for regression and classification that, surprisingly, are novel and refine the state of the art. These algorithms can all be combined with standard regularizers and made non-linear via kernels.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Balcan, M.-F., & Blum, A. (2005). A PAC-style model for learning from labeled and unlabeled data. Conf. Comput. Learn. Theory (COLT) (pp. 111--126).
 
2
 
3
Ben-David, S., Lu, T., & Páál, D. (2008). Does unlabeled data provably help? Conf. Comput. Learn. Theory (COLT) (pp. 33--44).
 
4
5
 
6
Chapelle, O., Vapnik, V., & Weston, J. (1999). Transductive inference for estimating values of functions. Adv. Neural Info. Proc. 12 (NIPS) (pp. 421--427).
 
7
Chen, H.-R., & Peng, J. (2008). 0-1 semidefinite programming for graph-cut clustering. In CRM Proceedings and Lecture Notes of the Amer. Math. Soc.
 
8
Corduneanu, A., & Jaakkola., T. (2006). Data dependent regularization. In O. Chapelle, B. Scholkööpf and A. Zien (Eds.), Semi-supervised learning, 163--182. MIT Press.
 
9
Cortes, C., & Mohri, M. (2006). On transductive regression. Adv. Neural Info. Proc. Sys. 19 (NIPS) (pp. 305--312).
 
10
De Bie, T., & Cristianini, N. (2003). Convex methods for transduction. Adv. Neural Info. Proc. Sys. 16 (NIPS) (pp. 73--80).
11
12
 
13
 
14
Joachims, T. (2003). Transductive learning via spectral graph partitioning. Inter. Conf. Mach. Learn. (ICML) (pp. 290--297).
 
15
Jong, J.-C., & Kotz, S. (1999). On a relation between principal components and regression analysis. The American Statistician, 53, 349--351.
 
16
Katayama, T. (2005). Subspace methods for system identification. Springer.
 
17
 
18
 
19
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Berkeley Symp. on Math. Stats. and Prob (pp. 281--297).
 
20
 
21
 
22
Schölkopf, B., & Smola, A. (2002). Learning with kernels. MIT Press.
 
23
24
 
25
 
26
Xing, E., & Jordan, M. (2003). On semidefinite relaxation for normalized k-cut and connections to spectral clustering. TR CSD-03-1265, Berkeley.
 
27
Xu, L., Neufeld, J., Larson, B., & Schuurmans, D. (2004). Maximum margin clustering. Adv. Neural Info. Proc. Sys. 17 (NIPS) (pp. 1537--1544).
 
28
Zhou, D., & Schöölkopf, B. (2006). Discrete regularization. In O. Chapelle, B. Scholköpf and A. Zien (Eds.), Semi-supervised learning, 221--232. MIT Press.
 
29
Zhu, X. (2005). Semi-supervised learning literature survey. TR 1530, U. Wisconsin, CS Dept.

Collaborative Colleagues:
Linli Xu: colleagues
Martha White: colleagues
Dale Schuurmans: colleagues