|
ABSTRACT
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate cluster in large high dimensional datasets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Rakesh Agrawal , Heikki Mannila , Ramakrishnan Srikant , Hannu Toivonen , A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
|
| |
2
|
|
| |
3
|
P. Arabic and L. J. Hubert. An overview of combinatorial data analyis, in P. Arabic, L. Hubert, and G. D. Sorts, editors, Clustering and Classification, pages 5-63. World Scientific Pub., New Jersey, 1996.
|
| |
4
|
Arbor Software Corporation. Application Manager User's Guide, Essbase Version 4.0 edition.
|
 |
5
|
|
 |
6
|
Stefan Berchtold , Christian Böhm , Daniel A. Keim , Hans-Peter Kriegel, A cost model for nearest neighbor search in high-dimensional data space, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.78-86, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263671]
|
| |
7
|
M. Berger and I. Regoutsos. An algorithm for point clustering and grid generation. IEEE Transactions on Systems, Man and Cybernetics, 21(5):1278-86, 1991.
|
 |
8
|
Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman , Shalom Tsur, Dynamic itemset counting and implication rules for market basket data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.255-264, May 11-15, 1997, Tucson, Arizona, United States
|
| |
9
|
|
| |
10
|
R. Chhikara and D. Register. A numerical classification method for partitioning of a large multidimensional mixed data set. Technometrics, 21:531-537, 1979.
|
| |
11
|
R. O. Duds and P. E. Hart. Pattern Classification and Scene Analysis. john Wiley and Sons, 1973.
|
| |
12
|
R. J. Earle. Method and apparatus for storing and retrieving multi-dimensional data in computer memory. U.S. Patent No. 5359724, October 1994.
|
| |
13
|
M. Ester, H.-P. Kriegel, 2. Sander, and X. Xu. A densitybased algorithm for discovering clusters in large spatial databases with noise, in Proc. of the ~nd Int'l Cort/erenee on Knowledge Discovery in Databases and Data Mining, Portland, Oregon, August 1996.
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
J. Friedman. Optimizing a noisy function of many variables with application to data mining. In UW/MSR Summer Research Institute in Data Mining, July 1997.
|
| |
19
|
|
 |
20
|
Dimitrios Gunopulos , Heikki Mannila , Roni Khardon , Hannu Toivonen, Data mining, hypergraph transversals, and machine learning (extended abstract), Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.209-216, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263684]
|
 |
21
|
Ching-Tien Ho , Rakesh Agrawal , Nimrod Megiddo , Ramakrishnan Srikant, Range queries in OLAP data cubes, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.73-88, May 11-15, 1997, Tucson, Arizona, United States
|
| |
22
|
S. J. Hong. MINI: A heuristic algorithm for two-level logic minimization. In R. Newton, editor, Selected Papers on Logic Synthesis/or Integrated Circuit Design. IEEE Press, 1987.
|
| |
23
|
Internationl Business Machines. IBM Intelligent Miner User's Guide, Version 1 Release 1, SH12-6213-00 edition, July 1996.
|
| |
24
|
|
| |
25
|
L. Kaufman and P. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, 1990.
|
| |
26
|
|
| |
27
|
L. Lovasz. On the ratio of the optimal integral and fractional covers. Discrete Mathematics, 13:383-390, 1975.
|
 |
28
|
|
| |
29
|
W. Masek. Some NP-comptete set covering problems. M.S. Thesis, MIT, 1978.
|
| |
30
|
|
| |
31
|
R. S. Michalski and It. E. Stepp. Learning from observation: Conceptual clustering. In It. S. Michalski, 3. G. Carbonell, nnd T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, volume I, pages 331-363. Morgan Kaufmann, 1983.
|
 |
32
|
|
| |
33
|
|
 |
34
|
|
| |
35
|
|
| |
36
|
P. Schroeter and J. Bigun. Hierarchical image segmentation by multi-dimensional clustering and orientation-adaptive boundary refinement. Pattern Recognition, 25(5):695-709, May 1995.
|
| |
37
|
|
| |
38
|
A. Shoshani. Personal communication. 1997.
|
| |
39
|
P. Sneath and R. Sokal. Numerical Tazonomy. Freeman, 1973.
|
 |
40
|
|
| |
41
|
|
| |
42
|
S. Wharton. A generalized histogram clustering for multidimensional image data. Pattern Recognition, 16(2):193-199, 1983.
|
| |
43
|
|
 |
44
|
|
 |
45
|
Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada
|
CITED BY 236
|
|
Venkatesh Ganti , Johannes Gehrke , Raghu Ramakrishnan, A framework for measuring changes in data characteristics, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.126-137, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Christian Böhm , Bernhard Braunmüller , Markus Breunig , Hans-Peter Kriegel, High performance clustering based on the similarity join, Proceedings of the ninth international conference on Information and knowledge management, p.298-305, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
Venkatesh Ganti , Johannes Gehrke , Raghu Ramakrishnan, CACTUS—clustering categorical data using summaries, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.73-83, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
YeongSeog Kim , W. Nick Street , Filippo Menczer, Feature selection in unsupervised learning via evolutionary search, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.365-369, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
Chun-Hung Cheng , Ada Waichee Fu , Yi Zhang, Entropy-based subspace clustering for mining numerical data, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.84-93, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
David McWherter , Mitchell Peabody , Ali C. Shokoufandeh , William Regli, Database techniques for archival of solid models, Proceedings of the sixth ACM symposium on Solid modeling and applications, p.78-87, May 2001, Ann Arbor, Michigan, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wei Wang , Jiong Yang , Philip S. Yu, Efficient mining of weighted association rules (WAR), Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.270-274, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Paul B. Chou , Edna Grossman , Dimitrios Gunopulos , Pasumarti Kamesam, Identifying prospective customers, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.447-456, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bing Liu , Yiyuan Xia , Philip S. Yu, Clustering through decision tree construction, Proceedings of the ninth international conference on Information and knowledge management, p.20-29, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mohammed J. Zaki , Markus Peters , Ira Assent , Thomas Seidl, CLICKS: an effective algorithm for mining subspace clusters in categorical datasets, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Aristides Gionis , Alexander Hinneburg , Spiros Papadimitriou , Panayiotis Tsaparas, Dimension induced clustering, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
Daniel B. Neill , Andrew W. Moore , Maheshkumar Sabhnani , Kenny Daniel, Detection of emerging space-time clusters, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amit Deshpande , Luis Rademacher , Santosh Vempala , Grant Wang, Matrix approximation and projective clustering via volume sampling, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, p.1117-1126, January 22-26, 2006, Miami, Florida
|
|
|
Aoying Zhou , Weining Qian , Hailei Qian , Long Zhang , Yuqi Liang , Wen Jin, Clustering DTDs: an interactive two-level approach, Journal of Computer Science and Technology, v.17 n.6, p.807-819, November 2002
|
|
|
|
|
|
Jernej Barbič , Alla Safonova , Jia-Yu Pan , Christos Faloutsos , Jessica K. Hodgins , Nancy S. Pollard, Segmenting motion capture data into distinct behaviors, Proceedings of the 2004 conference on Graphics interface, p.185-194, May 17-19, 2004, London, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Elke Achtert , Christian Böhm , Hans-Peter Kriegel , Peer Kröger , Arthur Zimek, Deriving quantitative models for correlation clusters, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Christian Böhm , Christos Faloutsos , Jia-Yu Pan , Claudia Plant, Robust information-theoretic clustering, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
|
|
|
Wei Tang , Hui Xiong , Shi Zhong , Jie Wu, Enhancing semi-supervised clustering: a feature projection perspective, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
Byron J. Gao , Obi L. Griffith , Martin Ester , Steven J. M. Jones, Discovering significant OPSM subspace clusters in massive gene expression data, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
Yi-Hong Chu , Jen-Wei Huang , Kun-Ta Chuang , Ming-Syan Chen, On subspace clustering with density consciousness, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
Lian Duan , Lida Xu , Feng Guo , Jun Lee , Baopin Yan, A local-density based spatial clustering algorithm with noise, Information Systems, v.32 n.7, p.978-986, November, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Charu C. Aggarwal , Jiawei Han , Jianyong Wang , Philip S. Yu, A framework for projected clustering of high dimensional data streams, Proceedings of the Thirtieth international conference on Very large data bases, p.852-863, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
Laks V. S. Lakshmanan , Raymond T. Ng , Christine Xing Wang , Xiaodong Zhou , Theodore J. Johnson, The generalized MDL approach for summarization, Proceedings of the 28th international conference on Very Large Data Bases, p.766-777, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Carlotta Domeniconi , Dimitrios Gunopulos , Sheng Ma , Bojun Yan , Muna Al-Razgan , Dimitris Papadopoulos, Locally adaptive metrics for clustering high dimensional data, Data Mining and Knowledge Discovery, v.14 n.1, p.63-97, February 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thierry Urruty , Stanislas Lew , Nacim Ihadaddene , Dan A. Simovici, Detecting eye fixations by projection clustering, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), v.3 n.4, p.1-20, December 2007
|
|
|
|
|
|
|
|
|
|
|
|
Dimitris K. Tasoulis , Panagiota Spyridonos , Nicos G. Pavlidis , Vassilis P. Plagianakos , Panagiota Ravazoula , Georgios Nikiforidis , Michael N. Vrahatis, Cell-nuclear data reduction and prognostic model selection in bladder tumor recurrence, Arificial Intelligence in Medicine, v.38 n.3, p.291-303, November, 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hannes Heikinheimo , Eino Hinkkanen , Heikki Mannila , Taneli Mielikäinen , Jouni K. Seppänen, Finding low-entropy sets and trees from binary data, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anthony Martinet , Jean Martinet , Nacim Ihaddadene , Stanislas Lew , Chabane Djeraba, Analyzing eye fixations and gaze orientations on films and pictures, Proceeding of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ira Assent , Ralph Krieger , Emmanuel Müller , Thomas Seidl, EDSC: efficient density-based subspace clustering, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
Nancy P. Lin , Chung-I Chang , Nien-yi Jan , Hao-En Chueh , Hung-Jen Chen , Wei-Hua Hao, An axis-shifted crossover-imaged clustering algorithm, WSEAS TRANSACTIONS on SYSTEMS, v.7 n.3, p.175-184, March 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Emmanuel Müller , Ira Assent , Ralph Krieger , Timm Jansen , Thomas Seidl, Morpheus: interactive exploration of subspace clustering, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yang Xiang , Ruoming Jin , David Fuhry , Feodor F. Dragan, Succinct summarization of transactional databases: an overlapped hyperrectangle scheme, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
Chi-Hoon Lee , Osmar R. Zaïane , Ho-Hyun Park , Jiayuan Huang , Russell Greiner, Clustering high dimensional data: A graph-based relaxed optimization approach, Information Sciences: an International Journal, v.178 n.23, p.4501-4511, December, 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Guang-Hui Yan , Li-Song Liu , Lin-Na Du , Xia-Xia Yang , Zhi-Cheng Ma , Xiao-Min Zhang, Multifractal-based cluster hierarchy optimisation algorithm, International Journal of Business Intelligence and Data Mining, v.3 n.4, p.353-374, January 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yihong Dong , Shaoka Cao , Ken Chen , Maoshun He , Xiaoying Tai, PFHC: A clustering algorithm based on data partitioning for unevenly distributed datasets, Fuzzy Sets and Systems, v.160 n.13, p.1886-1901, July, 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Guiomar Corral , Elisabet Golobardes , Oriol Andreu , Isard Serra , Elisabet Maluquer , Àngel Martínez, Application of Clustering Techniques in a Network Security Testing System, Proceeding of the 2005 conference on Artificial Intelligence Research and Development, p.157-164, May 12, 2005
|
|
|
|
|
|
Lin Zhu , Fu-Lai Chung , Shitong Wang, Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, v.39 n.3, p.578-591, June 2009
|
|
|
|
|
|
|
|