Computing Clusters of Correlation Connected Objects
Source: International Conference on Management of Data
Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data
Paris, France
Session: Research sessions: clustering
Pages: 455-466
Year of Publication: 2004
ISBN:1-58113-859-8
Authors
Christian Böhm, University of Munich, Munich, Germany
Karin Kailing, University of Munich, Munich, Germany
Peer Kröger, University of Munich, Munich, Germany
Arthur Zimek, University of Munich, Munich, Germany
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 24, Downloads (12 Months): 114, Citation Count: 10
DOI: http://doi.acm.org/10.1145/1007568.1007620

ABSTRACT

The detection of correlations between different features in a set of feature vectors is an important data mining task, because a correlation indicates a dependency between the features or some cause-and-effect association between them. This association can be arbitrarily complex, i.e., one or more features might depend on a combination of several other features. Well-known methods such as principal component analysis (PCA) find correlations perfectly as long as they are global, linear, not hidden among a set of noise vectors, and uniform, i.e., the same type of correlation is exhibited in all feature vectors. In many applications, such as medical diagnosis, molecular biology, time sequences, or electronic commerce, however, correlations are not global, since the dependency between features can differ from one subgroup of the set to another. In this paper, we propose a method called 4C (Computing Correlation Connected Clusters) to identify local subgroups of the data objects that share a uniform but arbitrarily complex correlation. Our algorithm is based on a combination of PCA and density-based clustering (DBSCAN). Our method produces a deterministic result and is robust against noise. A broad comparative evaluation demonstrates the superior performance of 4C over competing methods such as DBSCAN, CLIQUE, and ORCLUS.
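The contrast the abstract draws between global and local correlations can be illustrated with a small, hypothetical example. The sketch below is not the 4C algorithm. It only applies plain PCA to 2-D points, using a closed-form eigendecomposition of the 2x2 covariance matrix, and counts how many principal axes carry more than a fraction `delta` of the largest eigenvalue. The helper names, this threshold rule, `delta = 0.05`, and the synthetic data (two noisy subgroups along y = x and y = 1 - x) are all assumptions made for illustration.

```python
import math
import random


def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def eigenvalues_2d(sxx, sxy, syy):
    """Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]], largest first."""
    mean = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return mean + radius, mean - radius


def correlation_dimension(points, delta=0.05):
    """Hypothetical helper: count principal axes whose eigenvalue exceeds
    delta times the largest one. A result of 1 means the points lie close
    to a single line, i.e. they exhibit one linear dependency."""
    largest, smallest = eigenvalues_2d(*covariance_2d(points))
    return sum(1 for lam in (largest, smallest) if lam > delta * largest)


def noisy_line(slope, intercept, n, rng, noise=0.05):
    """Hypothetical synthetic subgroup: n points near y = slope * x + intercept, x in [0, 1]."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, slope * x + intercept + rng.gauss(0.0, noise)) for x in xs]


rng = random.Random(42)
group_a = noisy_line(1.0, 0.0, 200, rng)   # subgroup where y = x
group_b = noisy_line(-1.0, 1.0, 200, rng)  # subgroup where y = 1 - x

# Local view: each subgroup on its own has a single dominant axis.
# Global view: PCA over the union sees two strong axes and misses both dependencies.
print(correlation_dimension(group_a),
      correlation_dimension(group_b),
      correlation_dimension(group_a + group_b))  # prints: 1 1 2
```

Each subgroup on its own shows one dominant principal axis, i.e., a single linear dependency, while PCA over the union reports two strong axes, so both subgroup-specific correlations vanish when PCA is applied globally. According to the abstract, 4C addresses this by combining PCA with density-based clustering so that correlations are assessed within local subgroups. For more than two features, the closed-form 2x2 step would be replaced by a general symmetric eigensolver such as `numpy.linalg.eigh`.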


