| Dimension induced clustering |
| Full text |
Pdf
(1.11 MB)
|
| Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
table of contents
Chicago, Illinois, USA
SESSION: Research track paper
table of contents
Pages: 51 - 60
Year of Publication: 2005
ISBN:1-59593-135-X
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 10, Downloads (12 Months): 86, Citation Count: 4
|
|
|
ABSTRACT
It is commonly assumed that high-dimensional datasets contain points most of which are located in low-dimensional manifolds. Detection of low-dimensional clusters is an extremely useful task for performing operations such as clustering and classification, however, it is a challenging computational problem. In this paper we study the problem of finding subsets of points with low intrinsic dimensionality. Our main contribution is to extend the definition of fractal correlation dimension, which measures average volume growth rate, in order to estimate the intrinsic dimensionality of the data in local neighborhoods. We provide a careful analysis of several key examples in order to demonstrate the properties of our measure. Based on our proposed measure, we introduce a novel approach to discover clusters with low dimensionality. The resulting algorithms extend previous density based measures, which have been successfully used for clustering. We demonstrate the effectiveness of our algorithms for discovering low-dimensional m-flats embedded in high dimensional spaces, and for detecting low-rank sub-matrices.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Charu C. Aggarwal , Joel L. Wolf , Philip S. Yu , Cecilia Procopiuc , Jong Soo Park, Fast algorithms for projected clustering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.61-72, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
3
|
|
 |
4
|
Rakesh Agrawal , Johannes Gehrke , Dimitrios Gunopulos , Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.94-105, June 01-04, 1998, Seattle, Washington, United States
|
 |
5
|
Mihael Ankerst , Markus M. Breunig , Hans-Peter Kriegel , Jörg Sander, OPTICS: ordering points to identify the clustering structure, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.49-60, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusteris in large spatial databases with noise. In Proc. KDD, 1996.
|
 |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
|
| |
16
|
A. Hinneburg and D. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proc. KDD, 1998.
|
 |
17
|
|
| |
18
|
R. Krauthgamer and J. R. Lee. The black-box complexity of nearest neighbor search. In Proc. ICALP, 2004.
|
| |
19
|
|
| |
20
|
S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. In Proc. ICDE, 2003.
|
 |
21
|
|
| |
22
|
S. N. Rasband. Chaotic Dynamics of Nonlinear Systems. Wiley-Interscience, 1990.
|
 |
23
|
|
|