| Optimal reverse prediction: a unified perspective on supervised, unsupervised and semi-supervised learning |
| Full text |
Pdf
(625 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 382
archive
Proceedings of the 26th Annual International Conference on Machine Learning
table of contents
Montreal, Quebec, Canada
Pages 1137-1144
Year of Publication: 2009
ISBN:978-1-60558-516-1
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 10, Downloads (12 Months): 47, Citation Count: 0
|
|
|
ABSTRACT
Training principles for unsupervised learning are often derived from motivations that appear to be independent of supervised learning. In this paper we present a simple unification of several supervised and unsupervised training principles through the concept of optimal reverse prediction: predict the inputs from the target labels, optimizing both over model parameters and any missing labels. In particular, we show how supervised least squares, principal components analysis, k-means clustering and normalized graph-cut can all be expressed as instances of the same training principle. Natural forms of semi-supervised regression and classification are then automatically derived, yielding semi-supervised learning algorithms for regression and classification that, surprisingly, are novel and refine the state of the art. These algorithms can all be combined with standard regularizers and made non-linear via kernels.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Balcan, M.-F., & Blum, A. (2005). A PAC-style model for learning from labeled and unlabeled data. Conf. Comput. Learn. Theory (COLT) (pp. 111--126).
|
| |
2
|
|
| |
3
|
Ben-David, S., Lu, T., & Páál, D. (2008). Does unlabeled data provably help? Conf. Comput. Learn. Theory (COLT) (pp. 33--44).
|
| |
4
|
|
 |
5
|
|
| |
6
|
Chapelle, O., Vapnik, V., & Weston, J. (1999). Transductive inference for estimating values of functions. Adv. Neural Info. Proc. 12 (NIPS) (pp. 421--427).
|
| |
7
|
Chen, H.-R., & Peng, J. (2008). 0-1 semidefinite programming for graph-cut clustering. In CRM Proceedings and Lecture Notes of the Amer. Math. Soc.
|
| |
8
|
Corduneanu, A., & Jaakkola., T. (2006). Data dependent regularization. In O. Chapelle, B. Scholkööpf and A. Zien (Eds.), Semi-supervised learning, 163--182. MIT Press.
|
| |
9
|
Cortes, C., & Mohri, M. (2006). On transductive regression. Adv. Neural Info. Proc. Sys. 19 (NIPS) (pp. 305--312).
|
| |
10
|
De Bie, T., & Cristianini, N. (2003). Convex methods for transduction. Adv. Neural Info. Proc. Sys. 16 (NIPS) (pp. 73--80).
|
 |
11
|
|
 |
12
|
|
| |
13
|
|
| |
14
|
Joachims, T. (2003). Transductive learning via spectral graph partitioning. Inter. Conf. Mach. Learn. (ICML) (pp. 290--297).
|
| |
15
|
Jong, J.-C., & Kotz, S. (1999). On a relation between principal components and regression analysis. The American Statistician, 53, 349--351.
|
| |
16
|
Katayama, T. (2005). Subspace methods for system identification. Springer.
|
| |
17
|
|
| |
18
|
|
| |
19
|
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Berkeley Symp. on Math. Stats. and Prob (pp. 281--297).
|
| |
20
|
|
| |
21
|
|
| |
22
|
Schölkopf, B., & Smola, A. (2002). Learning with kernels. MIT Press.
|
| |
23
|
|
 |
24
|
|
| |
25
|
|
| |
26
|
Xing, E., & Jordan, M. (2003). On semidefinite relaxation for normalized k-cut and connections to spectral clustering. TR CSD-03-1265, Berkeley.
|
| |
27
|
Xu, L., Neufeld, J., Larson, B., & Schuurmans, D. (2004). Maximum margin clustering. Adv. Neural Info. Proc. Sys. 17 (NIPS) (pp. 1537--1544).
|
| |
28
|
Zhou, D., & Schöölkopf, B. (2006). Discrete regularization. In O. Chapelle, B. Scholköpf and A. Zien (Eds.), Semi-supervised learning, 221--232. MIT Press.
|
| |
29
|
Zhu, X. (2005). Semi-supervised learning literature survey. TR 1530, U. Wisconsin, CS Dept.
|
|