ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
An improved approximation algorithm for the column subset selection problem
Full text PdfPdf (424 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages: 968-977  
Year of Publication: 2009
Authors
Christos Boutsidis  Rensselaer Polytechnic Institute, Troy, NY
Michael W. Mahoney  Stanford University, Stanford, CA
Petros Drineas  Rensselaer Polytechnic Institute, Troy, NY
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 118,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We consider the problem of selecting the "best" subset of exactly k columns from an m x n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn2, m2n}) time and returns as output an m x k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously-chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m x k matrix containing those k columns, let PC denote the projection matrix onto the span of those columns, and let Ak denote the "best" rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that

[EQUATION]

with probability at least 0.7. This spectral norm bound improves upon the best previously-existing result (of Gu and Eisenstat [21]) for the spectral norm version of this Column Subset Selection Problem. We also prove that

[EQUATION]

with the same probability. This Frobenius norm bound is only a factor of √k log k worse than the best previously existing existential result and is roughly O(√k!) better than the best previous algorithmic result (both of Deshpande et al. [11]) for the Frobenius norm version of this Column Subset Selection Problem.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. Ben-Hur and I. Guyon. Detecting stable clusters using principal component analysis. Methods Mol Biol, 224:159--182, 2003.
2
3
 
4
C. Boutsidis, M. W. Mahoney, and P. Drineas. An Improved Approximation Algorithm for the Column Subset Selection Problem. Manuscript, 2008.
 
5
T. F. Chan. Rank revealing QR factorizations. Linear Algebra Appl, 88/89:67--82, 1987.
 
6
 
7
T. F. Chan and P. C. Hansen. Low-rank revealing QR factorizations. Linear Algebra Appl, 1:33--44, 1994.
 
8
 
9
 
10
A. Civril and M. Magdon-Ismail. Finding Maximum Volume Sub-matrices of a Matrix. RPI Comp Sci Dept TR 07-08, 2007.
11
 
12
A. Deshpande and S. Vempala. Adaptive sampling and fast low-rank matrix approximation. In APPROX-RANDOM, 2006.
13
 
14
P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Subspace sampling and relative-error matrix approximation: Column-based methods. In APPROX-RANDOM, 2006.
 
15
 
16
L. V. Foster. Rank and null space calculations using matrix decomposition without column interchanges. Linear Algebra Appl, 74:47--71, 1986.
 
17
L. V. Foster and Xinrong Liu. Comparison of rank revealing algorithms applied to matrices with well defined numerical ranks. Manuscript, 2006.
 
18
 
19
G. H. Golub. Numerical methods for solving linear least squares problems. Numer Math, 7:206--216, 1965.
 
20
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
 
21
 
22
Y. P. Hong and C. T. Pan. Rank-revealing QR factorizations and the singular value decomposition. Math Comp, 58:213--232, 1992.
 
23
W. J. Krzanowski. Selection of variables to preserve multivariate data structure, using principal components. Applied Statistics, 36(1):22--33, 1987.
 
24
25
 
26
K. Z. Mao. Identifying critical variables of principal components for unsupervised feature selection. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(2):339--344, 2005.
 
27
P. G. Martinsson, V. Rokhlin, and M. Tygert. A randomized algorithm for the approximation of matrices. Yale Comp Sci Dept TR 1361, 2006.
 
28
C. T. Pan. On the existence and computation of rank-revealing LU factorizations. Linear Algebra Appl, 316:199--222, 2000.
 
29
C. T. Pan and P. T. P. Tang. Bounds on singular values revealed by QR factorizations. BIT Numerical Mathematics, 39:740--756, 1999.
 
30
P. Paschou, E. Ziv, E. G. Burchard, S. Choudhry, W. R. Cintron, M. W. Mahoney, and P. Drineas. PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations. PLoS Genetics, 9(3), 2007.
31
 
32
G. W. Stewart. Four algorithms for the efficient computation of truncated QR approximations to a sparse matrix. Num Math, 83:313--323, 1999.
 
33
 
34
J. Sun, Y. Xie, H. Zhang, and C. Faloutsos. Less is more: Compact matrix decomposition for large sparse graphs. In SDM, 2007.
 
35
F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert. A fast randomized algorithm for the approximation of matrices. Yale Comp Sci Dept TR 1380, 2007.
 
36
37

Collaborative Colleagues:
Christos Boutsidis: colleagues
Michael W. Mahoney: colleagues
Petros Drineas: colleagues