|
ABSTRACT
Many machine learning algorithms can be formulated as a generalized eigenvalue problem. One major limitation of such formulation is that the generalized eigenvalue problem is computationally expensive to solve especially for large-scale problems. In this paper, we show that under a mild condition, a class of generalized eigenvalue problems in machine learning can be formulated as a least squares problem. This class of problems include classical techniques such as Canonical Correlation Analysis (CCA), Partial Least Squares (PLS), and Linear Discriminant Analysis (LDA), as well as Hypergraph Spectral Learning (HSL). As a result, various regularization techniques can be readily incorporated into the formulation to improve model sparsity and generalization ability. In addition, the least squares formulation leads to efficient and scalable implementations based on the iterative conjugate gradient type algorithms. We report experimental results that confirm the established equivalence relationship. Results also demonstrate the efficiency and effectiveness of the equivalent least squares formulations on large-scale problems.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
d'Aspremont, A., Ghaoui, L., Jordan, M., & Lanckriet, G. (2004). A direct formulation for sparse PCA using semidefinite programming. Neural Information Processing Systems (pp. 41--48).
|
| |
5
|
Donoho, D. (2006). For most large underdetermined systems of linear equations, the minimal 11-norm near-solution approximates the sparsest near-solution. Communications on Pure and Applied Mathematics, 59, 907--934.
|
| |
6
|
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407.
|
| |
7
|
Elisseeff, A., & Weston, J. (2001). A kernel method for multi-labelled classification. Neural Information Processing Systems (pp. 681--687).
|
| |
8
|
Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics, 302--332.
|
| |
9
|
Golub, G. H., & Van Loan, C. F. (1996). Matrix computations. Baltimore, MD: Johns Hopkins Press.
|
| |
10
|
Hale, E., Yin, W., & Zhang, Y. (2008). Fixed-point continuation for l1-minimization: Methodology and convergence. SIAM Journal on Optimization, 19, 1107--1130.
|
| |
11
|
Hastie, T., Tibshirani, R., & Friedman, J. H. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York, NY: Springer.
|
| |
12
|
Hotelling, H. (1936). Relations between two sets of variables. Biometrika, 28, 312--377.
|
| |
13
|
|
| |
14
|
Kazawa, H., Izumitani, T., Taira, H., & Maeda, E. (2005). Maximal margin labeling for multi-topic text categorization. Neural Information Processing Systems (pp. 649--656).
|
 |
15
|
|
| |
16
|
Rosipal, R., & Kräämer, N. (2006). Overview and recent advances in partial least squares. Subspace, Latent Structure and Feature Selection Techniques, Lecture Notes in Computer Science (pp. 34--51).
|
| |
17
|
Saad, Y. (1992). Numerical methods for large eigenvalue problems. New York, NY: Halsted Press.
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
|
| |
22
|
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B, 58, 267--288.
|
| |
23
|
|
 |
24
|
|
|