ACM Home Page
Please provide us with feedback. Feedback
Dimension induced clustering
Full text PdfPdf (1.11 MB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining table of contents
Chicago, Illinois, USA
SESSION: Research track paper table of contents
Pages: 51 - 60  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
Aristides Gionis  University of Helsinki, Finland
Alexander Hinneburg  Martin-Luther-University Halle, Germany
Spiros Papadimitriou  Carnegie Mellon University, Pittsburgh, PA
Panayiotis Tsaparas  University of Helsinki, Finland
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 86,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1081870.1081880
What is a DOI?

ABSTRACT

It is commonly assumed that high-dimensional datasets contain points most of which are located in low-dimensional manifolds. Detection of low-dimensional clusters is an extremely useful task for performing operations such as clustering and classification, however, it is a challenging computational problem. In this paper we study the problem of finding subsets of points with low intrinsic dimensionality. Our main contribution is to extend the definition of fractal correlation dimension, which measures average volume growth rate, in order to estimate the intrinsic dimensionality of the data in local neighborhoods. We provide a careful analysis of several key examples in order to demonstrate the properties of our measure. Based on our proposed measure, we introduce a novel approach to discover clusters with low dimensionality. The resulting algorithms extend previous density based measures, which have been successfully used for clustering. We demonstrate the effectiveness of our algorithms for discovering low-dimensional m-flats embedded in high dimensional spaces, and for detecting low-rank sub-matrices.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
4
5
6
7
8
 
9
 
10
 
11
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusteris in large spatial databases with noise. In Proc. KDD, 1996.
12
 
13
 
14
 
15
T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
 
16
A. Hinneburg and D. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proc. KDD, 1998.
17
 
18
R. Krauthgamer and J. R. Lee. The black-box complexity of nearest neighbor search. In Proc. ICALP, 2004.
 
19
 
20
S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. In Proc. ICDE, 2003.
21
 
22
S. N. Rasband. Chaotic Dynamics of Nonlinear Systems. Wiley-Interscience, 1990.
23


Collaborative Colleagues:
Aristides Gionis: colleagues
Alexander Hinneburg: colleagues
Spiros Papadimitriou: colleagues
Panayiotis Tsaparas: colleagues