ACM Home Page
Please provide us with feedback. Feedback
Improved smoothed analysis of the k-means method
Full text PdfPdf (513 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages 461-470  
Year of Publication: 2009
Authors
Bodo Manthey  Saarland University
Heiko Röglin  Boston University
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 77,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

The k-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii [3] aimed at closing this gap, and they proved a bound of poly(nk, σ−1) on the smoothed running-time of the k-means method, where n is the number of data points and σ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running-time observed in practice.

We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in nk and σ−1. Second, we prove an upper bound of kkd·poly(n, σ−1), where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running-time for k, dO(√logn/log logn).

Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
David Arthur. Smoothed analysis of the k-means method. Manuscript, 2008.
2
 
3
4
 
5
Pavel Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, USA, 2002.
 
6
 
7
 
8
Mary Inaba, Naoki Katoh, and Hiroshi Imai. Variance-based k-clustering algorithms by Voronoi diagrams and randomization. IEICE Transactions on Information and Systems, E83-D(6):1199--1206, 2000.
 
9
 
10
Stuart P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129--137, 1982.
 
11
Jiří Matoušek. On approximate geometric k-clustering. Discrete and Computational Geometry, 24(1):61--84, 2000.
12
 
13
Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms and heuristics: Progress and open questions. In Luis M. Pardo, Allan Pinkus, Endre Süli, and Michael J. Todd, editors, Foundations of Computational Mathematics, Santander 2005, pages 274--342. Cambridge University Press, 2006.


Collaborative Colleagues:
Bodo Manthey: colleagues
Heiko Röglin: colleagues