| Distributed data clustering can be efficient and exact |
| Source
|
ACM SIGKDD Explorations Newsletter
Volume 2, Issue 2 (December 2000)
Special issue on “Scalable data mining algorithms”
Pages: 34 - 38
Year of Publication: 2000
ISSN: 1931-0145
|
|
Authors
|
|
George Forman
|
Hewlett-Packard Research Labs., 1501 Page Mill, Palo Alto, CA
|
|
Bin Zhang
|
Hewlett-Packard Research Labs., 1501 Page Mill, Palo Alto, CA
|
|
|
|
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
[BF98] Bradley, P., and Fayyad, U. M., "Refining Initial Points for KM Clustering," Microsoft Technical Report 98-36, May 1998.
|
| |
2
|
[BFR98] Bradley, P., Fayyad, U. M., and Reina, C. A., "Scaling EM Clustering to Large Databases," Microsoft Technical Report, 1998.
|
| |
3
|
[BFR98a] Bradley, P., Fayyad, U. M., and Reina, C. A., "Scaling Clustering to Large Databases," KDD98, 1998.
|
| |
4
|
[DLR77] Dempster, A. P., Laird, N. M., and Rubin, D. B., "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
|
| |
5
|
|
| |
6
|
|
| |
7
|
[JD77] Anil K. Jain, Richard C. Dubes, "Algorithms for Clustering Data (Prentice Hall Advanced Reference Series : Computer Science)," Prentice Hall, 1977.
|
| |
8
|
[KC99] Kantabutra, S. and Couch, A. L., "Parallel K-Means Clustering Algorithm on NOWs," NECTEC Technical Journal, Vol. 1, No. 1, March 1999.
|
| |
9
|
[KR90] Kaufman, L. and Rousseeuw, P. J., "Finding Groups in Data : An Introduction to Cluster Analysis," John Wiley & Sons, 1990.
|
| |
10
|
[M67] MacQueen, J. "Some methods for classification and analysis of multivariate observations," pp. 281-297 in: L. M. Le Cam & J. Neyman [eds.] Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1. University of California Press, Berkeley. xvii + 666 p. 1967.
|
| |
11
|
[MK97] McLachlan, G. J. and Krishnan, T., "The EM Algorithm and Extensions," John Wiley & Sons, Inc., 1997.
|
| |
12
|
[NetPerception] A commercial recommender system, http://www.netperceptions.com
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
[Z00b] Zhang, B., "Generalized K-Harmonic Means - Boosting in Unsupervised Learning," Hewlett-Packard Laboratories Technical Report: http://www.hpl.hp.com/techreports/2000/HPL-2000-137.html.
|
| |
17
|
|
 |
18
|
|
INDEX TERMS
Primary Classification:
H.
Information Systems
H.3
INFORMATION STORAGE AND RETRIEVAL
H.3.3
Information Search and Retrieval
Subjects:
Clustering
Additional Classification:
F.
Theory of Computation
F.1
COMPUTATION BY ABSTRACT DEVICES
F.1.2
Modes of Computation
Subjects:
Parallelism and concurrency
G.
Mathematics of Computing
G.1
NUMERICAL ANALYSIS
G.1.0
General
Subjects:
Parallel algorithms
H.
Information Systems
H.2
DATABASE MANAGEMENT
H.2.4
Systems
Subjects:
Distributed databases
H.2.8
Database applications
Subjects:
Data mining
General Terms:
Algorithms,
Design,
Experimentation,
Management,
Measurement,
Performance,
Theory
Keywords:
data mining,
distributed computing,
multidimensional data clustering,
parallel algorithms,
very large databases
|