Approximate data mining in very large relational data
Source: ACM International Conference Proceeding Series; Vol. 170
Proceedings of the 17th Australasian Database Conference - Volume 49
Hobart, Australia
Pages: 3-13
Year of Publication: 2006
ISSN: 1445-1336, ISBN: 1-920682-31-7
Authors
James C. Bezdek  Department of Computer Science, University of West Florida, Pensacola, FL
Richard J. Hathaway  Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA
Christopher Leckie  Department of Computer Science and Software Engineering, University of Melbourne, Victoria, Australia
Ramamohanarao Kotagiri  Department of Computer Science and Software Engineering, University of Melbourne, Victoria, Australia
Publisher
Australian Computer Society, Inc.  Darlinghurst, Australia

ABSTRACT

In this paper we discuss eNERF, an extended version of non-Euclidean relational fuzzy c-means (NERFCM) for approximate clustering in very large (unloadable) relational data. The eNERF procedure consists of four parts: (i) selection, by algorithm DF, of distinguished features to be monitored during progressive sampling; (ii) progressive sampling of a square N×N relation matrix RN by algorithm PS until an n×n sample relation Rn passes a goodness-of-fit test; (iii) clustering of Rn using algorithm LNERF; and (iv) extension of the LNERF results to RN-Rn by algorithm xNERF, which uses an iterative procedure based on LNERF to compute fuzzy membership values for all of the objects remaining after LNERF clustering of the accepted sample. Three of the four algorithms are new; only LNERF (called NERFCM in the original literature) precedes this article.
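
To make steps (iii) and (iv) more concrete, the following is a minimal sketch of basic relational fuzzy c-means, followed by a simple one-shot extension of the resulting clusters to objects outside the sample. It is not the paper's LNERF or xNERF: it assumes the dissimilarities are squared Euclidean distances, it replaces the non-Euclidean correction with a crude clamp, and the extension is non-iterative. The function names `rfcm`, `extend`, and `memberships`, and all parameter defaults, are hypothetical choices made for illustration.

```python
import random


def memberships(dists, m):
    """Standard fuzzy c-means membership update for one object,
    given its distances to each of the c clusters."""
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in dists) for di in dists]


def rfcm(D, c, m=2.0, max_iter=200, tol=1e-8, seed=0):
    """Relational fuzzy c-means on an n x n dissimilarity matrix D.

    Assumption: D holds squared Euclidean distances. Negative or zero
    relational distances are clamped to a tiny positive value, which is
    NOT the correction used by NERFCM.
    """
    n = len(D)
    rng = random.Random(seed)
    # Random initial memberships; each object's memberships sum to 1.
    U = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    V, offsets = [], []
    for _ in range(max_iter):
        V, offsets, dist = [], [], []
        for i in range(c):
            # Normalised weights v_i = u_i^m / sum(u_i^m).
            w = [u ** m for u in U[i]]
            total = sum(w)
            v = [x / total for x in w]
            Dv = [sum(a * b for a, b in zip(row, v)) for row in D]
            off = 0.5 * sum(a * b for a, b in zip(v, Dv))
            V.append(v)
            offsets.append(off)
            # Relational distance d_ik = (D v_i)_k - v_i^T D v_i / 2.
            dist.append([max(x - off, 1e-12) for x in Dv])
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            col = memberships([dist[i][k] for i in range(c)], m)
            for i in range(c):
                new_U[i][k] = col[i]
        change = max(abs(new_U[i][k] - U[i][k])
                     for i in range(c) for k in range(n))
        U = new_U
        if change < tol:
            break
    # V and offsets come from the last pass, so they match U to within tol.
    return U, V, offsets


def extend(rows, V, offsets, m=2.0):
    """One-shot memberships for out-of-sample objects. Each row holds one
    object's dissimilarities to the n sampled objects, in sample order."""
    result = []
    for row in rows:
        d = [max(sum(r * w for r, w in zip(row, v)) - off, 1e-12)
             for v, off in zip(V, offsets)]
        result.append(memberships(d, m))
    return result
```

Under these assumptions, `rfcm` would be run on the accepted n×n sample, and `extend` would then be called on the rows of the full relation that relate each remaining object to the sampled ones, so the N×N matrix never has to be loaded at once.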


REFERENCES


 
3. Bradley, P., Fayyad, U. and Reina, C. (1998): Scaling clustering algorithms to large databases. Proc. 4th Int'l. Conf. Knowledge Discovery and Data Mining, 9-15, AAAI Press, Menlo Park, CA.
4. Dunn, J.C. (1976): Indices of partition fuzziness and the detection of clusters in large data sets, in Fuzzy Automata and Decision Processes, M.M. Gupta (ed), Elsevier, NY.
5. Fayyad, U. and Smyth, P. (1996): From massive data sets to science catalogs: applications and challenges. Proc. Workshop on Massive Data Sets, J. Kettering and D. Pregibon (eds), National Research Council.
7. Hathaway, R.J. and Bezdek, J.C. (1994): NERF c-means: non-Euclidean relational fuzzy clustering. Patt. Recog., 27(3), 429-437.
8. Hathaway, R.J. and Bezdek, J.C. (2005): Approximate clustering in very large data sets. In press, Comp. Statistics and Data Analysis.
10. Huband, J., Bezdek, J.C. and Hathaway, R.J. (2005): bigVAT: visual assessment of cluster tendency for large data sets. Patt. Recog., 38, 1875-1886.
11. Huber, P. (1996): Massive data workshop: The morning after. Massive Data Sets, 169-184, National Academy Press.
13. Pal, N.R. and Bezdek, J.C. (2002): Complexity reduction for "large image" processing. IEEE Trans. on Systems, Man and Cybernetics, B(32), 598-611.
14. Pal, N.R., Keller, J.M., Mitchell, J.A., Popescu, M., Huband, J.M. and Bezdek, J.C. (2005): Gene ontology-based knowledge discovery through fuzzy cluster analysis. In press, Neural, Parallel and Scientific Computing.
15. Taskar, B., Segal, E. and Koller, D. (2001): Probabilistic clustering in relational data. 17th International Joint Conference on Artificial Intelligence, 870-876, Seattle, USA.
