ACM Home Page
Please provide us with feedback. Feedback
OPTICS: ordering points to identify the clustering structure
Full text PdfPdf (1.77 MB)
Source International Conference on Management of Data archive
Proceedings of the 1999 ACM SIGMOD international conference on Management of data table of contents
Philadelphia, Pennsylvania, United States
Pages: 49 - 60  
Year of Publication: 1999
ISBN:1-58113-084-8
Also published in ...
Authors
Mihael Ankerst  Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany
Markus M. Breunig  Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany
Hans-Peter Kriegel  Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany
Jörg Sander  Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 26,   Downloads (12 Months): 239,   Citation Count: 119
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/304182.304187
What is a DOI?

ABSTRACT

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrary shaped clusters), but also the intrinsic clustering structure. For medium sized data sets, the cluster-ordering can be represented graphically and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure offering additional insights into the distribution and correlation of the data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

AGG+ 98
 
AKK 96
Ankerst M., Keim D. A., Kriegel H.-P.: "'Circle Segments': A Technique for Visually Exploring Large Multidimensional Data Sets", Proc. Visualization'96, Hot Topic Session, San Francisco, CA, 1996.
 
BKK 96
BKSS 90
 
CPZ 97
 
EKSX 96
Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density- Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.
 
EKS+ 98
 
EKX 95
 
GM 85
Grossman A., Morlet J.: "Decomposition oj'functions into wavelets of constant shapes and related tr~msforms". Mathematics and Physics: Lectures on Recent Restdts, World Scientific, Singapore, 1985.
GRS 98
 
HK 98
Hinneburg A., Keim D.: "An Efficient Approa~:h to Clustering in Large Multimedia Databases with Noise"~, Proc. 4th Int. Conf. on Knowledge Discovery & Data Milling, New York City, NY, 1998.
 
HT 93
Hattori K., Torii Y.: "Effective algorithms for Jhe nearest neighbor method in the clustering problem", Patt~.,rn Recognition, 1993, Vol. 26, No. 5, pp. 741-746.
 
Hua 97
Huang Z.: "A Fast Clustering Algorithm to C,'uster Very Large Categorical Data Sets in Data Mining", 1)roc. SIG- OD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tech. Report 97-07, UBC, Dept. of CS, 1997.
 
JD 88
Kei 96a
Kei 96b
 
KN 96
 
KR 90
Kaufman L., Rousseeuw E J.: "Finding GrouFs in Data: An Introduction to Cluster Analysis", John Wiley & Sons, 1990.
 
Mac 67
MacQueen, J.: "Some Methods for Classification and Analysis of Multivariate Observations", 5th Berkeley Synap. Math. Statist. Prob., Vol. 1, pp. 281-297.
 
NH 94
 
PTVF 92
Press W. H.,Teukolsky S. A., Vetterling W. T., Flannery B. E: "Numerical Recipes in C", 2nd ed., Cambridl,ye University Press, 1992.
 
Ric 83
 
Sch 96
Schikuta E.: "'Grid clustering: An efficient hierarchical clustering method for very large data sets". Proc. 13th Int. Conf. on Pattern Recognition, Vol 2, 1996, pp. 101-105.
 
SE 97
 
SCZ 98
 
Sib 73
Sibson R.: "SLINK: an optimally efficient alggrithm for the single-link cluster method".The Comp. Journal, Vol. }'~ 6, No. 1, 1973, pp. 30-34.
ZRL 96

CITED BY  120

Collaborative Colleagues:
Mihael Ankerst: colleagues
Markus M. Breunig: colleagues
Hans-Peter Kriegel: colleagues
Jörg Sander: colleagues