Approximate data mining in very large relational data
Source
ACM International Conference Proceeding Series; Vol. 170
Proceedings of the 17th Australasian Database Conference - Volume 49
Hobart, Australia
Pages: 3 - 13
Year of Publication: 2006
ISSN: 1445-1336; ISBN: 1-920682-31-7
Authors
James C. Bezdek, Department of Computer Science, University of West Florida, Pensacola, FL
Richard J. Hathaway, Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA
Christopher Leckie, Department of Computer Science and Software Engineering, University of Melbourne, Victoria, Australia
Ramamohanarao Kotagiri, Department of Computer Science and Software Engineering, University of Melbourne, Victoria, Australia
Publisher
Australian Computer Society, Inc., Darlinghurst, Australia
ABSTRACT
In this paper we discuss eNERF, an extended version of non-Euclidean relational fuzzy c-means (NERFCM) for approximate clustering in very large (unloadable) relational data. The eNERF procedure consists of four parts: (i) selection, by algorithm DF, of distinguished features to be monitored during progressive sampling; (ii) progressive sampling of a square N×N relation matrix R_N by algorithm PS until an n×n sample relation R_n passes a goodness-of-fit test; (iii) clustering of R_n by algorithm LNERF; and (iv) extension of the LNERF results to R_N − R_n by algorithm xNERF, which uses an iterative procedure based on LNERF to compute fuzzy membership values for all of the objects remaining after LNERF clustering of the accepted sample. Three of the four algorithms are new; only LNERF (called NERFCM in the original literature) precedes this article.
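The sample, cluster, extend pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' eNERF: the record does not give the details of DF, PS, LNERF or xNERF, so everything below is an assumption made for illustration. The sample is fixed by hand instead of being chosen by progressive sampling, the clustering step is plain relational fuzzy c-means on squared Euclidean dissimilarities (the non-Euclidean correction that NERFCM adds is omitted), and the extension is a single non-iterative pass rather than the iterative procedure the abstract mentions. The function names `weights`, `relational_distances`, `memberships`, `rfcm` and `extend`, and the toy data, are hypothetical.

```python
import random

def weights(u_row, m):
    """Normalized weights u^m / sum(u^m): the relational stand-in for a cluster center."""
    w = [u ** m for u in u_row]
    total = sum(w)
    return [x / total for x in w]

def relational_distances(rows, D_sample, v):
    """For each row r of dissimilarities to the sample objects, return
    r . v - 0.5 * v^T D_sample v (relational form of a squared distance)."""
    n = len(v)
    quad = sum(v[a] * D_sample[a][b] * v[b] for a in range(n) for b in range(n))
    return [sum(r[j] * v[j] for j in range(n)) - 0.5 * quad for r in rows]

def memberships(dist, m, eps=1e-12):
    """Fuzzy c-means membership update; dist[i][k] is object k's distance to cluster i."""
    c, n = len(dist), len(dist[0])
    U = [[0.0] * n for _ in range(c)]
    for k in range(n):
        # Clamp at eps to guard against zero or slightly negative rounding error.
        inv = [max(dist[i][k], eps) ** (-1.0 / (m - 1.0)) for i in range(c)]
        total = sum(inv)
        for i in range(c):
            U[i][k] = inv[i] / total
    return U

def rfcm(D, c, m=2.0, iters=100, tol=1e-8, seed=0):
    """Plain relational fuzzy c-means on an n x n matrix D of squared dissimilarities.
    Assumption: D is Euclidean, so no non-Euclidean correction is applied."""
    n = len(D)
    rng = random.Random(seed)
    # Random positive pseudo-distances give an initial U whose columns sum to 1.
    U = memberships([[rng.random() + 0.1 for _ in range(n)] for _ in range(c)], m)
    for _ in range(iters):
        V = [weights(U[i], m) for i in range(c)]
        U_new = memberships([relational_distances(D, D, v) for v in V], m)
        change = max(abs(U_new[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = U_new
        if change < tol:
            break
    return U, [weights(U[i], m) for i in range(c)]

def extend(D_rest, D_sample, V, m=2.0):
    """One-pass extension (an assumption, not xNERF): memberships for objects outside
    the sample, computed only from their dissimilarities to the sample objects."""
    return memberships([relational_distances(D_rest, D_sample, v) for v in V], m)

# Toy data (hypothetical): six points on a line, compared by squared difference.
points = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0]
sample_idx, rest_idx = [0, 2, 3, 5], [1, 4]   # hand-picked sample, not algorithm PS
D_sample = [[(points[a] - points[b]) ** 2 for b in sample_idx] for a in sample_idx]
D_rest = [[(points[a] - points[b]) ** 2 for b in sample_idx] for a in rest_idx]

U_sample, V = rfcm(D_sample, c=2)        # cluster the n x n sample relation
U_rest = extend(D_rest, D_sample, V)     # memberships for the remaining objects
```

In this sketch only the n×n sample block and the (N−n)×n block of dissimilarities between the remaining objects and the sample are ever read, which is the property that makes this kind of scheme usable when the full N×N relation cannot be loaded.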
REFERENCES
3. Bradley, P., Fayyad, U. and Reina, C. (1998): Scaling clustering algorithms to large databases. Proc. 4th Int'l. Conf. Knowledge Discovery and Data Mining, 9-15, AAAI Press, Menlo Park, CA.
4. Dunn, J.C. (1976): Indices of partition fuzziness and the detection of clusters in large data sets, in Fuzzy Automata and Decision Processes, M.M. Gupta (ed), Elsevier, NY.
5. Fayyad, U. and Smyth, P. (1996): From massive data sets to science catalogs: applications and challenges. Proc. Workshop on Massive Data Sets, J. Kettering and D. Pregibon (eds), National Research Council.
7. Hathaway, R.J. and Bezdek, J.C. (1994): NERF c-means: non-Euclidean relational fuzzy clustering. Patt. Recog., 27(3), 429-437.
8. Hathaway, R.J. and Bezdek, J.C. (2005): Approximate clustering in very large data sets. In press, Comp. Statistics and Data Analysis.
10. Huband, J., Bezdek, J.C. and Hathaway, R.J. (2005): bigVAT: visual assessment of cluster tendency for large data sets. Patt. Recog., 38, 1875-1886.
11. Huber, P. (1996): Massive data workshop: The morning after. Massive Data Sets, 169-184, National Academy Press.
13. Pal, N.R. and Bezdek, J.C. (2002): Complexity reduction for "large image" processing. IEEE Trans. on Systems, Man and Cybernetics, B(32), 598-611.
14. Pal, N.R., Keller, J.M., Mitchell, J.A., Popescu, M., Huband, J.M. and Bezdek, J.C. (2005): Gene ontology-based knowledge discovery through fuzzy cluster analysis, in press, Neural, Parallel and Scientific Computing.
15. Taskar, B., Segal, E. and Koller, D. (2001): Probabilistic clustering in relational data. 17th International Joint Conference on Artificial Intelligence, 870-876, Seattle, USA.
INDEX TERMS
Primary Classification:
H. Information Systems
  H.2 DATABASE MANAGEMENT
    H.2.8 Database applications
      Subjects: Data mining
Additional Classification:
H. Information Systems
  H.2 DATABASE MANAGEMENT
    H.2.4 Systems
      Subjects: Relational databases
  H.3 INFORMATION STORAGE AND RETRIEVAL
    H.3.3 Information Search and Retrieval
      Subjects: Clustering
I. Computing Methodologies
  I.2 ARTIFICIAL INTELLIGENCE
    I.2.3 Deduction and Theorem Proving
      Subjects: Uncertainty, "fuzzy," and probabilistic reasoning
General Terms: Algorithms, Design, Management, Theory
Keywords: cluster analysis, data mining, gene product similarities, non-Euclidean relational fuzzy c-means, progressive sampling, relational data, very large data