|
ABSTRACT
We consider the problem of selecting the "best" subset of exactly k columns from an m x n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn2, m2n}) time and returns as output an m x k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously-chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m x k matrix containing those k columns, let PC denote the projection matrix onto the span of those columns, and let Ak denote the "best" rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that [EQUATION] with probability at least 0.7. This spectral norm bound improves upon the best previously-existing result (of Gu and Eisenstat [21]) for the spectral norm version of this Column Subset Selection Problem. We also prove that [EQUATION] with the same probability. This Frobenius norm bound is only a factor of √k log k worse than the best previously existing existential result and is roughly O(√k!) better than the best previous algorithmic result (both of Deshpande et al. [11]) for the Frobenius norm version of this Column Subset Selection Problem.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. Ben-Hur and I. Guyon. Detecting stable clusters using principal component analysis. Methods Mol Biol, 224:159--182, 2003.
|
 |
2
|
|
 |
3
|
|
| |
4
|
C. Boutsidis, M. W. Mahoney, and P. Drineas. An Improved Approximation Algorithm for the Column Subset Selection Problem. Manuscript, 2008.
|
| |
5
|
T. F. Chan. Rank revealing QR factorizations. Linear Algebra Appl, 88/89:67--82, 1987.
|
| |
6
|
|
| |
7
|
T. F. Chan and P. C. Hansen. Low-rank revealing QR factorizations. Linear Algebra Appl, 1:33--44, 1994.
|
| |
8
|
|
| |
9
|
|
| |
10
|
A. Civril and M. Magdon-Ismail. Finding Maximum Volume Sub-matrices of a Matrix. RPI Comp Sci Dept TR 07-08, 2007.
|
 |
11
|
Amit Deshpande , Luis Rademacher , Santosh Vempala , Grant Wang, Matrix approximation and projective clustering via volume sampling, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, p.1117-1126, January 22-26, 2006, Miami, Florida
[doi> 10.1145/1109557.1109681]
|
| |
12
|
A. Deshpande and S. Vempala. Adaptive sampling and fast low-rank matrix approximation. In APPROX-RANDOM, 2006.
|
 |
13
|
|
| |
14
|
P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Subspace sampling and relative-error matrix approximation: Column-based methods. In APPROX-RANDOM, 2006.
|
| |
15
|
|
| |
16
|
L. V. Foster. Rank and null space calculations using matrix decomposition without column interchanges. Linear Algebra Appl, 74:47--71, 1986.
|
| |
17
|
L. V. Foster and Xinrong Liu. Comparison of rank revealing algorithms applied to matrices with well defined numerical ranks. Manuscript, 2006.
|
| |
18
|
|
| |
19
|
G. H. Golub. Numerical methods for solving linear least squares problems. Numer Math, 7:206--216, 1965.
|
| |
20
|
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
|
| |
21
|
|
| |
22
|
Y. P. Hong and C. T. Pan. Rank-revealing QR factorizations and the singular value decomposition. Math Comp, 58:213--232, 1992.
|
| |
23
|
W. J. Krzanowski. Selection of variables to preserve multivariate data structure, using principal components. Applied Statistics, 36(1):22--33, 1987.
|
| |
24
|
|
 |
25
|
|
| |
26
|
K. Z. Mao. Identifying critical variables of principal components for unsupervised feature selection. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(2):339--344, 2005.
|
| |
27
|
P. G. Martinsson, V. Rokhlin, and M. Tygert. A randomized algorithm for the approximation of matrices. Yale Comp Sci Dept TR 1361, 2006.
|
| |
28
|
C. T. Pan. On the existence and computation of rank-revealing LU factorizations. Linear Algebra Appl, 316:199--222, 2000.
|
| |
29
|
C. T. Pan and P. T. P. Tang. Bounds on singular values revealed by QR factorizations. BIT Numerical Mathematics, 39:740--756, 1999.
|
| |
30
|
P. Paschou, E. Ziv, E. G. Burchard, S. Choudhry, W. R. Cintron, M. W. Mahoney, and P. Drineas. PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations. PLoS Genetics, 9(3), 2007.
|
 |
31
|
|
| |
32
|
G. W. Stewart. Four algorithms for the efficient computation of truncated QR approximations to a sparse matrix. Num Math, 83:313--323, 1999.
|
| |
33
|
|
| |
34
|
J. Sun, Y. Xie, H. Zhang, and C. Faloutsos. Less is more: Compact matrix decomposition for large sparse graphs. In SDM, 2007.
|
| |
35
|
F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert. A fast randomized algorithm for the approximation of matrices. Yale Comp Sci Dept TR 1380, 2007.
|
| |
36
|
|
 |
37
|
|
|