|
ABSTRACT
Frieze et al. [17] proved that a small sample of rows of a given matrix A contains a low-rank approximation D that minimizes ||A - D||F to within small additive error, and the sampling can be done efficiently using just two passes over the matrix [12]. In this paper, we generalize this result in two ways. First, we prove that the additive error drops exponentially by iterating the sampling in an adaptive manner. Using this result, we give a pass-efficient algorithm for computing low-rank approximation with reduced additive error. Our second result is that using a natural distribution on subsets of rows (called volume sampling), there exists a subset of k rows whose span contains a factor (k + 1) relative approximation and a subset of k + k(k + 1)/ε rows whose span contains a 1+ε relative approximation. The existence of such a small certificate for multiplicative low-rank approximation leads to a PTAS for the following projective clustering problem: Given a set of points P in Rd, and integers k, j, find a set of j subspaces F1, . . ., Fj, each of dimension at most k, that minimize Σp∈Pmini d(p, Fi)2.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
P. Agarwal, S. Har-Peled, K. Varadajan, Geometric approximations via coresets. Manuscript, 2004. http://valis.cs.uiuc.edu/~sariel/papers/04/survey/.
|
 |
4
|
|
 |
5
|
Rakesh Agrawal , Johannes Gehrke , Dimitrios Gunopulos , Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.94-105, June 01-04, 1998, Seattle, Washington, United States
|
 |
6
|
Charu C. Aggarwal , Joel L. Wolf , Philip S. Yu , Cecilia Procopiuc , Jong Soo Park, Fast algorithms for projected clustering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.61-72, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
7
|
|
| |
8
|
M. Artin. Algebra, Prentice-Hall, 1991.
|
 |
9
|
|
 |
10
|
|
 |
11
|
W. Fernandez de la Vega , Marek Karpinski , Claire Kenyon , Yuval Rabani, Approximation schemes for clustering problems, Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, June 09-11, 2003, San Diego, CA, USA
[doi> 10.1145/780542.780550]
|
| |
12
|
P. Drineas , Alan Frieze , Ravi Kannan , Santosh Vempala , V. Vinay, Clustering in large graphs and matrices, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.291-299, January 17-19, 1999, Baltimore, Maryland, United States
|
| |
13
|
|
| |
14
|
P. Drineas, R. Kannan, M. Maloney, Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. Yale University Technical Report, YALEU/DCS/TR-1270, 2004.
|
| |
15
|
M. Effros, L. J. Schulman, Deterministic clustering with data nets, ECCC TR04--050, 2004.
|
| |
16
|
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, J. Zhang, On graph problems in a semi-streaming model. Proceedings of the 31st ICALP, 2004.
|
 |
17
|
|
| |
18
|
G. Golub, C. Van Loan, Matrix Computations. John Hopkins University Press, third edition, 1996.
|
| |
19
|
S. A. Goreinov, E. E. Tyrtyshnikov, The maximal-volume concept in approximation by low-rank matrices Contemporary Mathematics, Vol. 280 (2001), 47--51.
|
 |
20
|
|
 |
21
|
|
 |
22
|
|
| |
23
|
M. Henzinger, P. Raghavan, S. Rajagopalan, Computing on data streams. Technical Note 1998--011, Digital Systems Research Center, Palo Alto, CA, May 1998.
|
 |
24
|
|
| |
25
|
|
| |
26
|
J. Matoušek, On approximate geometric k-clustering. Discrete and Computational Geometry, 61--84, 2000.
|
| |
27
|
N. Megiddo, A. Tamir, On the complexity of locating linear facilities in the plane. Operations Research Letters, 1 (1982), 194--197.
|
 |
28
|
|
 |
29
|
|
| |
30
|
L. Rademacher, S. Vempala, G. Wang, Matrix approximation and projective clustering via iterative sampling. MIT-LCS-TR-983, 2005.
|
CITED BY 9
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sanjiv Kumar , Mehryar Mohri , Ameet Talwalkar, On sampling-based approximate spectral decomposition, Proceedings of the 26th Annual International Conference on Machine Learning, p.553-560, June 14-18, 2009, Montreal, Quebec, Canada
|
|