ACM Home Page
Please provide us with feedback. Feedback
Fast approximate spectral clustering
Full text MovMov (16:12),  PdfPdf (467 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Paris, France
SESSION: Research track papers table of contents
Pages 907-916  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Authors
Donghui Yan  University of California, Berkeley, Berkeley, CA, USA
Ling Huang  Intel, Berkeley, CA, USA
Michael I. Jordan  University of California, Berkeley, Berkeley, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 85,   Downloads (12 Months): 208,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1557019.1557118
What is a DOI?

ABSTRACT

Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n3) in general, with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nystrom method, with comparable accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectral cluster data sets with a million observations within several minutes.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
A. Asuncion and D. Newman. UCI Machine Learning Repository, Department of Information and Computer Science. http://www.ics.uci.edu/ mlearn/MLRepository.html, 2007.
 
4
5
 
6
 
7
8
 
9
 
10
P. Drineas and M. W. Mahoney. On the Nystrom method for approximating a Gram matrix for improved kernel-based learning. In Proceedings of COLT, pages 323--337, 2005.
 
11
 
12
 
13
R. M. Gray and D. L. Neuhoff. Quantization. IEEE Transactions of Information Theory, 44(6):2325--2383, 1998.
 
14
 
15
 
16
J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Applied Statistics, 28(1):100--108, 1979.
17
 
18
L. Huang, D. Yan, M. I. Jordan, and N. Taft. Spectral clustering with perturbed data. In Advances in Neural Information Processing Systems (NIPS), December 2008.
19
20
 
21
 
22
 
23
 
24
 
25
 
26
M. Meila and J. Shi. Learning segmentation with random walk. In Advances in Neural Information Processing Systems (NIPS), 2001.
 
27
 
28
A. Y. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems (NIPS), 2002.
 
29
 
30
 
31
 
32
 
33
G. Stewart. Introduction to Matrix Computation. Academic Press, 1973.
 
34
U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Annals of Statistics, 36(2):555--586, 2008.
 
35
C. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, 2001.
 
36
D. Yan, L. Huang, and M. I. Jordan. Fast approximate spectral clustering. Technical report, Department of Statistics, UC Berkeley, 2009.
 
37
 
38
P. L. Zador. Asymptotic quantization error of continuous signals and the quantization dimension. IEEE Transactions of Information Theory, 28:139--148, 1982.

Collaborative Colleagues:
Donghui Yan: colleagues
Ling Huang: colleagues
Michael I. Jordan: colleagues