ACM Home Page
Please provide us with feedback. Feedback
Fast algorithms for projected clustering
Full text PdfPdf (1.38 MB)
Source International Conference on Management of Data archive
Proceedings of the 1999 ACM SIGMOD international conference on Management of data table of contents
Philadelphia, Pennsylvania, United States
Pages: 61 - 72  
Year of Publication: 1999
ISBN:1-58113-084-8
Also published in ...
Authors
Charu C. Aggarwal  IBM T. J. Watson Research Center, Yorktown Heights, NY
Joel L. Wolf  IBM T. J. Watson Research Center, Yorktown Heights, NY
Philip S. Yu  IBM T. J. Watson Research Center, Yorktown Heights, NY
Cecilia Procopiuc  Duke University, Durham, NC
Jong Soo Park  Sungshin Women's University, Seoul, Korea
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 20,   Downloads (12 Months): 179,   Citation Count: 106
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/304182.304188
What is a DOI?

ABSTRACT

The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace. Traditional feature selection algorithms attempt to achieve this. The weakness of this approach is that in typical high dimensional data mining applications different sets of points may cluster better for different subsets of dimensions. The number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions for all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem, and test its performance on synthetic data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
D. Hand, Order Statistics. John Wiley and Sons, New York, 1981.
 
3
M. Berger, I. Rigoutsos. An Algorithm for Point Clustering and Grid Generation. IEEE Transactions on Systems, Man and Cybernetics, Vol. 21, 5:1278-1286, 1991.
 
4
M. R. Brito, E. Chavez, A. Quiroz, J. Yukich. Connectivity of the Mutual k-Nearest-Neighbor Graph for Clustering and Outlier Detection. Siatis~ics and Probability Letters, 35 (1997) pages 33-42.
 
5
P. Cheeseman, j. Kelly, S. Matthew. AutoClass: A Bayesian Classification System. Proceedings of ~he 5~h International Conference on Machine Learning, Morgan Kaufmann, June 1988.
 
6
R. Dubes, A. Jain. Clustering Meihodologies in Exploratory Data Analysis. Advances in Computers, Edited by M. Yovits, Vol. 19, Academic Press, New York, 1980.
 
7
M. Ester, H.-P. Kriegel, X. Xu. A Database Interface for Clu.,~tering in Large Spatial Databases. Proceedings of the first International Conference on Knowledge Discovery and Data Mining, 1995.
 
8
 
9
M. Ester, H.-P. Kriegel, J. Sander, X. Xu. A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of ~he 2nd International Conference on Knowledge Discovery in Databases and Da~a Mining, Portland, Oregon, August 1.995.
 
10
 
11
 
12
D. Fisher. Optimization and Simplification of Hierarchical Clusters. Proceedings of ~he International Conference on Knowledge Discovery and Data Mining, August 1995.
 
13
 
14
T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, Vol. 38, pp. 293-306, 1985.
15
 
16
 
17
 
18
L. Kaufman, P. Rousseeuw. Finding Groups in Data- An Introduction to Cluster Analysis. Wiley Series in Probability and Mathematical Statistics, 1990.
 
19
R. Kohavi, D. Sommerfield. Feature Subset Selection Using the Wrapper Method" Overfitting and Dynamic Search Space Topology. Proceedings of ~he First International Conference on Knowledge Discovery and Data Mining, 1995.
 
20
R. Lee. Clustering Analysis and its applicagions. Advances in Information Systems Science, edited by :I. Toum, Vol. 8, pp. 169-292, Plenum Press, New York, 1981.
 
21
22
 
23
S. Wharton. A Generalized Histogram Clustering for Multidimensional Image Data. Pattern Recognition, Vol. 16, No. 2: pp. 193-199, 1983.
 
24
 
25
26

CITED BY  106

Collaborative Colleagues:
Charu C. Aggarwal: colleagues
Joel L. Wolf: colleagues
Philip S. Yu: colleagues
Cecilia Procopiuc: colleagues
Jong Soo Park: colleagues