ABSTRACT
We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and non-healthy patients, we want to answer the following questions: Are the two clouds of points separable? What are the smallest and largest pairwise distances across the two datasets? Which of the two clouds does a new point (feature vector) come from? We propose a new tool, the tri-plot, and its generalization, the pq-plot, which help answer these questions. We provide a set of rules for interpreting a tri-plot, and we apply these rules to synthetic and real datasets. We also show how to use our tool for classification in cases where traditional methods (nearest neighbor, classification trees) may fail.
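The distance and classification questions above can be answered naively by brute force. The sketch below is such a baseline, not the tri-plot method. It assumes Euclidean distance and small in-memory datasets, and all function names and toy data are hypothetical. Besides the smallest and largest cross distances, it counts the cross pairs within a radius r (a quantity one could track as r grows; we do not claim this is how the tri-plot is defined) and implements the 1-nearest-neighbor classifier named above as a traditional method.

```python
import math
from typing import Sequence

# Assumption: a feature vector is an equal-length sequence of floats.
Point = Sequence[float]


def cross_distance_extremes(A: list[Point], B: list[Point]) -> tuple[float, float]:
    """Smallest and largest Euclidean distance over all pairs (a, b), a in A, b in B.

    Brute force: |A| * |B| distance computations.
    """
    dists = [math.dist(a, b) for a in A for b in B]
    return min(dists), max(dists)


def cross_pair_count(A: list[Point], B: list[Point], r: float) -> int:
    """Number of cross pairs (a, b) whose Euclidean distance is at most r."""
    return sum(1 for a in A for b in B if math.dist(a, b) <= r)


def nearest_neighbor_label(x: Point, A: list[Point], B: list[Point]) -> str:
    """1-nearest-neighbor baseline: label x with the cloud holding its closest point."""
    d_a = min(math.dist(x, a) for a in A)
    d_b = min(math.dist(x, b) for b in B)
    return "A" if d_a <= d_b else "B"


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]  # hypothetical "healthy" cloud
    B = [(4.0, 0.0), (4.0, 3.0)]  # hypothetical "non-healthy" cloud
    print(cross_distance_extremes(A, B))             # (3.0, 5.0)
    print(cross_pair_count(A, B, 4.1))               # 2
    print(nearest_neighbor_label((3.5, 0.5), A, B))  # B
```

Each cross-distance function performs |A|·|B| distance computations, and the classifier scans both clouds for every query. This cost becomes impractical for the large datasets the abstract targets.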