ABSTRACT
Distributed Data Mining (DDM) has been an active research area and has attracted growing attention since its inception. Current DDM techniques regard the distributed data sets as a single virtual table and assume that there exists a global model which could be generated if the data were combined or centralized. This paper proposes a similarity-based distributed data mining (SBDDM) framework which explicitly takes the differences among distributed sources into consideration. A new similarity measure is introduced, and its effectiveness is evaluated and validated. This paper also illustrates the limitations of current DDM techniques through three concrete case studies. Finally, distributed clustering within the SBDDM framework is discussed.