Finding generalized projected clusters in high dimensional spaces
Source International Conference on Management of Data archive
Proceedings of the 2000 ACM SIGMOD international conference on Management of data table of contents
Dallas, Texas, United States
Pages: 70 - 81  
Year of Publication: 2000
ISBN:1-58113-217-4
Authors
Charu C. Aggarwal  IBM T.J. Watson Research Center, Yorktown Heights, NY
Philip S. Yu  IBM T.J. Watson Research Center, Yorktown Heights, NY
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/342009.335383

ABSTRACT

High-dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that in high-dimensional data, even the concept of proximity or clustering may not be meaningful. We discuss very general techniques for projected clustering which are able to construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques, which limit the method to projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of trying to redefine clustering for high-dimensional applications by searching for hidden subspaces with clusters which are created by inter-attribute correlations. We provide a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases. The running time and space requirements of the algorithm are adjustable, and are likely to trade off with better accuracy.
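
To make the idea of an arbitrarily aligned, cluster-specific subspace more concrete, here is a minimal NumPy sketch. For one group of points it picks the directions in which the group is most tightly packed (the eigenvectors of the group's covariance matrix with the smallest eigenvalues) and measures distance only within that subspace. It also shows one plausible reading of an additive cluster summary (count, per-attribute sums, sums of pairwise products) from which the same covariance matrix can be recovered without revisiting the points. All function and class names, the parameter l, and the synthetic data are assumptions made for illustration; none of them is taken from the paper, and this is not presented as the authors' algorithm.

```python
import numpy as np


def least_spread_subspace(points, l):
    """Return a (d x l) orthonormal basis of the l directions in which
    `points` vary least (hypothetical helper, not from the paper)."""
    cov = np.cov(points, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the first l
    # eigenvectors span the directions of smallest variance.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, :l]


def projected_distance(x, centroid, basis):
    """Euclidean distance between x and centroid measured inside `basis`."""
    return float(np.linalg.norm((x - centroid) @ basis))


class ClusterSummary:
    """Additive summary of a cluster (assumed layout): point count,
    per-attribute sums, and sums of pairwise attribute products."""

    def __init__(self, d):
        self.n = 0
        self.s1 = np.zeros(d)
        self.s2 = np.zeros((d, d))

    def add(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.s1 += x
        self.s2 += np.outer(x, x)

    def merge(self, other):
        # Summaries of two disjoint groups combine by plain addition.
        self.n += other.n
        self.s1 += other.s1
        self.s2 += other.s2

    def centroid(self):
        return self.s1 / self.n

    def covariance(self):
        # Population covariance: E[x x^T] - mean mean^T.
        mean = self.centroid()
        return self.s2 / self.n - np.outer(mean, mean)


# Synthetic cluster stretched along the diagonal of a 2-D space,
# so no single original attribute describes it well.
rng = np.random.default_rng(0)
along = np.array([1.0, 1.0]) / np.sqrt(2)
across = np.array([1.0, -1.0]) / np.sqrt(2)
points = (np.outer(rng.uniform(-5, 5, 200), along)
          + np.outer(rng.normal(0, 0.05, 200), across))

basis = least_spread_subspace(points, l=1)
center = points.mean(axis=0)
# A point far along the diagonal stays close in the projected subspace;
# a point slightly off the diagonal does not.
print(projected_distance(center + 10 * along, center, basis))
print(projected_distance(center + across, center, basis))
```

In this sketch the summary is purely additive, so two partial summaries can be merged without touching the underlying points, which is the property that would let such a structure scale to large inputs.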


