|
ABSTRACT
The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem, and test its performance on synthetic data.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Johannes Gehrke , Dimitrios Gunopulos , Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.94-105, June 01-04, 1998, Seattle, Washington, United States
|
| |
2
|
D. Hand, Order Statistics. John Wiley and Sons, New York, 1981.
|
| |
3
|
M. Berger, I. Rigoutsos. An Algorithm for Point Clustering and Grid Generation. IEEE Transactions on Systems, Man and Cybernetics, Vol. 21, 5:1278-1286, 1991.
|
| |
4
|
M. R. Brito, E. Chavez, A. Quiroz, J. Yukich. Connectivity of the Mutual k-Nearest-Neighbor Graph for Clustering and Outlier Detection. Siatis~ics and Probability Letters, 35 (1997) pages 33-42.
|
| |
5
|
P. Cheeseman, j. Kelly, S. Matthew. AutoClass: A Bayesian Classification System. Proceedings of ~he 5~h International Conference on Machine Learning, Morgan Kaufmann, June 1988.
|
| |
6
|
R. Dubes, A. Jain. Clustering Meihodologies in Exploratory Data Analysis. Advances in Computers, Edited by M. Yovits, Vol. 19, Academic Press, New York, 1980.
|
| |
7
|
M. Ester, H.-P. Kriegel, X. Xu. A Database Interface for Clu.,~tering in Large Spatial Databases. Proceedings of the first International Conference on Knowledge Discovery and Data Mining, 1995.
|
| |
8
|
|
| |
9
|
M. Ester, H.-P. Kriegel, J. Sander, X. Xu. A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of ~he 2nd International Conference on Knowledge Discovery in Databases and Da~a Mining, Portland, Oregon, August 1.995.
|
| |
10
|
|
| |
11
|
|
| |
12
|
D. Fisher. Optimization and Simplification of Hierarchical Clusters. Proceedings of ~he International Conference on Knowledge Discovery and Data Mining, August 1995.
|
| |
13
|
|
| |
14
|
T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, Vol. 38, pp. 293-306, 1985.
|
 |
15
|
Sudipto Guha , Rajeev Rastogi , Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.73-84, June 01-04, 1998, Seattle, Washington, United States
|
| |
16
|
|
| |
17
|
|
| |
18
|
L. Kaufman, P. Rousseeuw. Finding Groups in Data- An Introduction to Cluster Analysis. Wiley Series in Probability and Mathematical Statistics, 1990.
|
| |
19
|
R. Kohavi, D. Sommerfield. Feature Subset Selection Using the Wrapper Method" Overfitting and Dynamic Search Space Topology. Proceedings of ~he First International Conference on Knowledge Discovery and Data Mining, 1995.
|
| |
20
|
R. Lee. Clustering Analysis and its applicagions. Advances in Information Systems Science, edited by :I. Toum, Vol. 8, pp. 169-292, Plenum Press, New York, 1981.
|
| |
21
|
|
 |
22
|
Stefan Berchtold , Christian Böhm , Daniel A. Keim , Hans-Peter Kriegel, A cost model for nearest neighbor search in high-dimensional data space, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.78-86, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263671]
|
| |
23
|
S. Wharton. A Generalized Histogram Clustering for Multidimensional Image Data. Pattern Recognition, Vol. 16, No. 2: pp. 193-199, 1983.
|
| |
24
|
|
| |
25
|
|
 |
26
|
Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada
|
CITED BY 106
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Charu C. Aggarwal , Joel L. Wolf , Kun-Lung Wu , Philip S. Yu, Horting hatches an egg: a new graph-theoretic approach to collaborative filtering, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.201-212, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bing Liu , Yiyuan Xia , Philip S. Yu, Clustering through decision tree construction, Proceedings of the ninth international conference on Information and knowledge management, p.20-29, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Aristides Gionis , Alexander Hinneburg , Spiros Papadimitriou , Panayiotis Tsaparas, Dimension induced clustering, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
Amit Deshpande , Luis Rademacher , Santosh Vempala , Grant Wang, Matrix approximation and projective clustering via volume sampling, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, p.1117-1126, January 22-26, 2006, Miami, Florida
|
|
|
|
|
|
Larry T. H. Yu , Fu-lai Chung , Stephen C. F. Chan , Simon M. C. Yuen, Using emerging pattern based projected clustering and gene expression data for cancer detection, Proceedings of the second conference on Asia-Pacific bioinformatics, p.75-84, January 01, 2004, Dunedin, New Zealand
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Elke Achtert , Christian Böhm , Hans-Peter Kriegel , Peer Kröger , Arthur Zimek, Deriving quantitative models for correlation clusters, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wei Tang , Hui Xiong , Shi Zhong , Jie Wu, Enhancing semi-supervised clustering: a feature projection perspective, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
Bo Long , Xiaoyun Wu , Zhongfei (Mark) Zhang , Philip S. Yu, Unsupervised learning on k-partite graphs, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
Charu C. Aggarwal , Jiawei Han , Jianyong Wang , Philip S. Yu, A framework for projected clustering of high dimensional data streams, Proceedings of the Thirtieth international conference on Very large data bases, p.852-863, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
Carlotta Domeniconi , Dimitrios Gunopulos , Sheng Ma , Bojun Yan , Muna Al-Razgan , Dimitris Papadopoulos, Locally adaptive metrics for clustering high dimensional data, Data Mining and Knowledge Discovery, v.14 n.1, p.63-97, February 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thierry Urruty , Stanislas Lew , Nacim Ihadaddene , Dan A. Simovici, Detecting eye fixations by projection clustering, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), v.3 n.4, p.1-20, December 2007
|
|
|
|
|
|
|
|
|
Charu C. Aggarwal , Na Ta , Jianyong Wang , Jianhua Feng , Mohammed Zaki, Xproj: a framework for projected structural clustering of xml documents, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ira Assent , Ralph Krieger , Emmanuel Müller , Thomas Seidl, EDSC: efficient density-based subspace clustering, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
|
|
|
Emmanuel Müller , Ira Assent , Ralph Krieger , Timm Jansen , Thomas Seidl, Morpheus: interactive exploration of subspace clustering, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|