ABSTRACT
Cluster analysis is a primary method for database mining. It is used either as a stand-alone tool to gain insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for cluster analysis which does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrarily shaped clusters), but also the intrinsic clustering structure. For medium-sized data sets, the cluster-ordering can be represented graphically, and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure, offering additional insights into the distribution and correlation of the data.
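The abstract does not spell out how the augmented ordering is built, so the following is only a minimal sketch of one common way to compute such a density-based cluster-ordering, not the authors' implementation. The names `cluster_ordering`, `eps`, and `min_pts`, the brute-force neighbor search, and the Euclidean distance are all assumptions made for illustration. Each point is emitted together with a reachability value; runs of small values in the output correspond to dense regions, and an infinite value marks the start of a new region.

```python
import heapq
import math


def cluster_ordering(points, eps, min_pts):
    """Return [(index, reachability)] in visiting order.

    Hypothetical helper for illustration: brute-force O(n^2) neighbor
    search and Euclidean distance are assumptions, not from the source.
    """
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if dist(i, j) <= eps]

    def core_distance(i, nbrs):
        # Distance to the min_pts-th nearest neighbor (counting i itself),
        # or infinity when i has fewer than min_pts neighbors within eps.
        if len(nbrs) < min_pts:
            return math.inf
        return sorted(dist(i, j) for j in nbrs)[min_pts - 1]

    reach = [math.inf] * n
    processed = [False] * n
    ordering = []

    for start in range(n):
        if processed[start]:
            continue
        heap = [(math.inf, start)]  # (reachability, point index)
        while heap:
            _, p = heapq.heappop(heap)
            if processed[p]:
                continue  # stale heap entry for an already emitted point
            processed[p] = True
            ordering.append((p, reach[p]))
            nbrs = neighbors(p)
            cd = core_distance(p, nbrs)
            if cd == math.inf:
                continue  # p is not dense enough to expand from
            for q in nbrs:
                if processed[q]:
                    continue
                new_reach = max(cd, dist(p, q))
                if new_reach < reach[q]:
                    reach[q] = new_reach
                    heapq.heappush(heap, (new_reach, q))
    return ordering


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for idx, r in cluster_ordering(pts, eps=2.0, min_pts=2):
        print(idx, r)
```

In this sketch a single ordering is computed once for a generous `eps`; clusterings for smaller distance thresholds can then be read off the reachability values instead of rerunning the algorithm, which is the property the abstract describes as equivalence to a broad range of parameter settings.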