ABSTRACT
Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously. In this paper, we consider the problem of learning shared structures from multiple related tasks. We present an improved formulation (iASO) for multi-task learning based on the non-convex alternating structure optimization (ASO) algorithm, in which all tasks are related by a shared feature representation. We convert iASO, a non-convex formulation, into a relaxed convex one, which is, however, not scalable to large data sets due to its complex constraints. We propose an alternating optimization (cASO) algorithm which solves the convex relaxation efficiently, and further show that cASO converges to a global optimum. In addition, we present a theoretical condition, under which cASO can find a globally optimal solution to iASO. Experiments on several benchmark data sets confirm our theoretical analysis.