A clustering framework based on subjective and objective validity criteria
Source
ACM Transactions on Knowledge Discovery from Data (TKDD)
Volume 1, Issue 4 (January 2008)
Article No. 4
Year of Publication: 2008
ISSN:1556-4681
Authors
M. Halkidi  Athens University of Economics and Business, Athens, Greece
D. Gunopulos  University of Athens, Athens, Greece
M. Vazirgiannis  INRIA/FUTURS and Athens University of Economics and Business, Athens, Greece
N. Kumar  University of California, Riverside, CA
C. Domeniconi  George Mason University, Fairfax, VA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1324172.1324176

ABSTRACT

Clustering, as an unsupervised learning process, is a challenging problem, especially for high-dimensional datasets. The quality of a clustering result can benefit from user constraints and objective validity assessment. In this article, we propose a semisupervised framework for learning the weighted Euclidean subspace in which the best clustering can be achieved. Our approach capitalizes on (i) user constraints and (ii) the quality of intermediate clustering results in terms of their structural properties. The proposed framework takes the clustering algorithm and the validity measure as parameters. We develop and discuss algorithms for learning and tuning the weights of contributing dimensions and for defining the “best” clustering obtained by satisfying user constraints. Experimental results on benchmark datasets show that the proposed approach improves clustering accuracy.
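
The abstract does not spell out the learning algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a brute-force grid search over dimension weights in place of the article's weight-learning procedure, a basic k-means as the pluggable clustering algorithm, and a toy separation-to-spread ratio as the pluggable validity measure. All function names (`weighted_distance`, `kmeans`, `constraint_score`, `separation_ratio`, `search_weights`) and the rule of ranking candidates first by satisfied constraints and then by validity are hypothetical choices made for illustration.

```python
import math
from itertools import combinations, product


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_d w_d * (x_d - y_d)**2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(members) for col in zip(*members)]


def kmeans(points, k, weights, iterations=20):
    """Basic k-means in the weighted space (assumption: first k points seed it)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: weighted_distance(p, centers[c], weights))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = centroid(members)
    return labels


def constraint_score(labels, must_link, cannot_link):
    """Fraction of pairwise user constraints that the labeling satisfies."""
    ok = sum(labels[i] == labels[j] for i, j in must_link)
    ok += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return ok / total if total else 1.0


def separation_ratio(points, labels, weights):
    """Toy validity index (assumption): smallest center-to-center distance
    divided by the largest point-to-center distance; higher is better."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centers = {lab: centroid(members) for lab, members in groups.items()}
    spread = max(
        weighted_distance(p, centers[lab], weights) for p, lab in zip(points, labels)
    )
    gaps = [
        weighted_distance(a, b, weights)
        for a, b in combinations(list(centers.values()), 2)
    ]
    if not gaps:
        return 0.0  # a single cluster has no separation
    return float("inf") if spread == 0 else min(gaps) / spread


def search_weights(points, k, must_link, cannot_link,
                   cluster_fn=kmeans, validity_fn=separation_ratio,
                   grid=(0.0, 0.5, 1.0)):
    """Try every weight vector on the grid; rank by (constraints, validity)."""
    best = None
    for weights in product(grid, repeat=len(points[0])):
        if not any(weights):
            continue  # the all-zero vector makes every distance zero
        labels = cluster_fn(points, k, weights)
        key = (constraint_score(labels, must_link, cannot_link),
               validity_fn(points, labels, weights))
        if best is None or key > best[0]:
            best = (key, weights, labels)
    return best[1], best[2]


# Dimension 0 carries the grouping the user wants; dimension 1 is misleading.
points = [(0.0, 0.0), (0.2, 10.0), (5.0, 0.1), (5.1, 9.9)]
weights, labels = search_weights(
    points, 2, must_link=[(0, 1), (2, 3)], cannot_link=[(0, 2)]
)
print(weights, labels)
```

In this toy dataset the unweighted distance groups points by the second coordinate, which violates every constraint. Setting that coordinate's weight to zero lets the same clustering routine recover the grouping the constraints describe. Passing the clustering routine and the validity measure as arguments mirrors the abstract's statement that the framework takes both as parameters.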



