|
ABSTRACT
We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such formulation finds applications in many machine learning tasks including multi-task learning, matrix classification, and matrix completion. The standard semidefinite programming formulation for this problem is computationally expensive. In addition, due to the non-smooth nature of the trace norm, the optimal first-order black-box method for solving such class of problems converges as O(1/√k), where k is the iteration counter. In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). We further propose an accelerated gradient algorithm, which achieves the optimal convergence rate of O(1/k2) for smooth problems. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Abernethy, J., Bach, F., Evgeniou, T., & Vert, J.-P. (2006). Low-rank matrix factorization with attributes (Technical Report N24/06/MM). Ecole des Mines de Paris.
|
| |
2
|
Abernethy, J., Bach, F., Evgeniou, T., & Vert, J.-P. (2009). A new approach to collaborative filtering: Operator estimation with spectral regularization. J. Mach. Learn. Res., 10, 803--826.
|
 |
3
|
Yonatan Amit , Michael Fink , Nathan Srebro , Shimon Ullman, Uncovering shared structures in multiclass classification, Proceedings of the 24th international conference on Machine learning, p.17-24, June 20-24, 2007, Corvalis, Oregon
[doi> 10.1145/1273496.1273499]
|
| |
4
|
|
| |
5
|
|
| |
6
|
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2, 183--202.
|
| |
7
|
Bertsekas, D. P. (1999). Nonlinear programming. Athena Scientific. 2nd edition.
|
| |
8
|
Cai, J.-F., Candés, E. J., & Shen, Z. (2008). A singular value thresholding algorithm for matrix completion (Technical Report 08-77). UCLA Computational and Applied Math.
|
| |
9
|
Candés, E. J., & Recht, B. (2008). Exact matrix completion via convex optimization (Technical Report 08-76). UCLA Computational and Applied Math.
|
| |
10
|
Fazel, M., Hindi, H., & Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, 4734--4739.
|
| |
11
|
Lu, Z., Monteiro, R. D. C., & Yuan, M. (2008). Convex optimization methods for dimension reduction and coefficient estimation in multivariate linear regression. Submitted to Mathematical Programming.
|
| |
12
|
Ma, S., Goldfarb, D., & Chen, L. (2008). Fixed point and Bregman iterative methods for matrix rank minimization (Technical Report 08-78). UCLA Computational and Applied Math.
|
| |
13
|
Nemirovsky, A. S., & Yudin, D. B. (1983). Problem complexity and method efficiency in optimization. John Wiley & Sons Ltd.
|
| |
14
|
Nesterov, Y. (1983). A method for solving a convex programming problem with convergence rate O(1/k2). Soviet Math. Dokl., 27, 372--376.
|
| |
15
|
Nesterov, Y. (2003). Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers.
|
| |
16
|
|
| |
17
|
Nesterov, Y. (2007). Gradient methods for minimizing composite objective function (Technical Report 2007/76). CORE, Université catholique de Louvain.
|
| |
18
|
Obozinski, G., Taskar, B., & Jordan, M. I. (2009). Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing. In press.
|
| |
19
|
Recht, B., Fazel, M., & Parrilo, P. (2008a). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. Submitted to SIAM Review.
|
| |
20
|
Recht, B., Xu, W., & Hassibi, B. (2008b). Necessary and sufficient condtions for success of the nuclear norm heuristic for rank minimization. In Proceedings of the 47th IEEE Conference on Decision and Control, 3065--3070.
|
 |
21
|
|
| |
22
|
Srebro, N., Rennie, J. D. M., & Jaakkola, T. S. (2005). Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems, 1329--1336.
|
| |
23
|
Toh, K.-C., & Yun, S. (2009). An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Preprint, Department of Mathematics, National University of Singapore, March 2009.
|
 |
24
|
|
| |
25
|
Tseng, P. (2008). On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization.
|
| |
26
|
Weimer, M., Karatzoglou, A., Le, Q., & Smola, A. rank (2008a). COFIrank - maximum margin matrix factorization for collaborative ranking. In Advances in Neural Information Processing Systems, 1593--1600.
|
| |
27
|
|
| |
28
|
Yuan, M., Ekici, A., Lu, Z., & Monteiro, R. (2007). Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society: Series B, 69, 329--346.
|
|