ABSTRACT
We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data shares the class labels of the labeled data or follows the same generative distribution. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than the data required in typical semi-supervised or transfer learning settings, making self-taught learning applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features from the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.
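The feature-construction step described above can be illustrated with a minimal sketch. It assumes a set of basis vectors is already available (here hand-picked and purely hypothetical; in self-taught learning they would be learned from the unlabeled data) and computes a sparse activation vector for one input by approximately minimizing a squared reconstruction error plus an L1 penalty. The solver used, iterative soft-thresholding (ISTA), the function names, and the parameter `beta` are assumptions of this sketch and are not taken from the abstract.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, bases, beta, n_iter=200):
    """Approximately minimize ||x - sum_j a_j * bases[j]||^2 + beta * ||a||_1 over a.

    Uses ISTA (iterative soft-thresholding). The solver is an assumption of
    this sketch, not something the abstract specifies.
    """
    n_bases, dim = len(bases), len(x)
    # Conservative step size: the Lipschitz constant of the gradient is at
    # most 2 * (sum of squared basis norms).
    step = 1.0 / (2.0 * sum(b_d * b_d for b in bases for b_d in b))
    a = [0.0] * n_bases
    for _ in range(n_iter):
        # Reconstruction of x from the current activations.
        recon = [sum(a[j] * bases[j][d] for j in range(n_bases)) for d in range(dim)]
        resid = [recon[d] - x[d] for d in range(dim)]
        # Gradient of the squared reconstruction error with respect to a.
        grad = [2.0 * sum(resid[d] * bases[j][d] for d in range(dim))
                for j in range(n_bases)]
        # Gradient step followed by soft-thresholding, which yields sparsity.
        a = [soft_threshold(a[j] - step * grad[j], step * beta)
             for j in range(n_bases)]
    return a


# Hypothetical hand-picked bases; in self-taught learning they would be
# learned from the unlabeled data instead.
bases = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
features = sparse_code([3.0, 4.0], bases, beta=1.0)
print(features)
```

The resulting activation vector would then replace (or augment) the raw input as the feature representation passed to a supervised classifier such as an SVM.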