ACM Home Page
Please provide us with feedback. Feedback
EDSC: efficient density-based subspace clustering
Full text PdfPdf (637 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: KM: clustering table of contents
Pages 1093-1102  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Ira Assent  RWTH Aachen University, Aachen, Germany
Ralph Krieger  RWTH Aachen University, Aachen, Germany
Emmanuel Müller  RWTH Aachen University, Aachen, Germany
Thomas Seidl  RWTH Aachen University, Aachen, Germany
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 148,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458227
What is a DOI?

ABSTRACT

Subspace clustering mines clusters hidden in subspaces of high-dimensional data sets. Density-based approaches have been shown to successfully mine clusters of arbitrary shape even in the presence of noise in full space clustering. Exhaustive search of all density-based subspace clusters, however, results in infeasible runtimes for large high-dimensional data sets. This is due to the exponential number of possible subspace projections in addition to the high computational cost of density-based clustering.

In this paper, we propose lossless efficient detection of density-based subspace clusters. In our EDSC (efficient density-based subspace clustering) algorithm we reduce the high computational cost of density-based subspace clustering by a complete multistep filter-and-refine algorithm. Our first hypercube filter step avoids exhaustive search of all regions in all subspaces by enclosing potentially density-based clusters in hypercubes. Our second filter step provides additional pruning based on a density monotonicity property. In the final refinement step, the exact unbiased density-based subspace clustering result is detected. As we prove that pruning is lossless in both filter steps, we guarantee completeness of the result.

In thorough experiments on synthetic and real world data sets, we demonstrate substantial efficiency gains. Our lossless EDSC approach outperforms existing density-based subspace clustering algorithms by orders of magnitude.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
 
5
6
 
7
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In KDD, pages 226--231, 1996.
 
8
A. Hinneburg and D. Keim. An efficient approach to clustering in large multimedia databases with noise. In KDD, pages 58--65, 1998.
 
9
I. Joliffe. Principal Component Analysis. Springer, New York, 1986.
 
10
K. Kailing, H.-P. Kriegel, and P. Kröger. Density-connected subspace clustering for high-dimensional data. In SDM, pages 246--257, 2004.
 
11
K. Kailing, H.-P. Kriegel, P. Kröger, and S. Wanka. Ranking interesting subspaces for clustering high dimensional data. In PKDD, pages 241--252, 2003.
 
12
 
13
 
14
J. MacQueen. Some methods for classification and analysis of multivariate observations. In Berkeley Symp. Math. stat. & prob., pages 281--297, 1967.
 
15
 
16
H. Nagesh, S. Goil, and A. Choudhary. MAFIA: Efficient and scalable subspace clustering for very large data sets. In TR 9906--010, NWU, 1999.
 
17
D. Newman, S. Hettich, C. Blake, and C. Merz. UCI repository of MLDBs, 1998.
 
18
 
19

Collaborative Colleagues:
Ira Assent: colleagues
Ralph Krieger: colleagues
Emmanuel Müller: colleagues
Thomas Seidl: colleagues