ACM Home Page
Please provide us with feedback. Feedback
Metric space similarity joins
Full text PdfPdf (1.10 MB)
Source
ACM Transactions on Database Systems (TODS) archive
Volume 33 ,  Issue 2  (June 2008) table of contents
Article No. 7  
Year of Publication: 2008
ISSN:0362-5915
Authors
Edwin H. Jacox  University of Maryland, College Park, MD
Hanan Samet  University of Maryland, College Park, MD
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 174,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1366102.1366104
What is a DOI?

ABSTRACT

Similarity join algorithms find pairs of objects that lie within a certain distance ε of each other. Algorithms that are adapted from spatial join techniques are designed primarily for data in a vector space and often employ some form of a multidimensional index. For these algorithms, when the data lies in a metric space, the usual solution is to embed the data in vector space and then make use of a multidimensional index. Such an approach has a number of drawbacks when the data is high dimensional as we must eventually find the most discriminating dimensions, which is not trivial. In addition, although the maximum distance between objects increases with dimension, the ability to discriminate between objects in each dimension does not. These drawbacks are overcome via the introduction of a new method called Quickjoin that does not require a multidimensional index and instead adapts techniques used in distance-based indexing for use in a method that is conceptually similar to the Quicksort algorithm. A formal analysis is provided of the Quickjoin method. Experiments show that the Quickjoin method significantly outperforms two existing techniques.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
4
 
5
6
7
 
8
 
9
10
11
 
12
 
13
14
 
15
 
16
Dohnal, V., Gennaro, C., Savino, P., and Zezula, P. 2003a. Similarity join in metric spaces. In Proceedings of the 25th European Conference on IR Research (Advances in Information Retrieval) (ECIR'03). Pisa, Italy, 452--467.
 
17
Dohnal, V., Gennaro, C., and Zezula, P. 2003b. Similarity join in metric spaces using eD-Index. In Proceedings of the 14th International Conference on Database and Expert Systems Applications. (DEXA'03). Prague, Czech Republic, 484--493.
 
18
19
 
20
21
22
 
23
 
24
Hettich, S. and Bay, S. D. 1999. The UCI KDD archive. Department of Information and Computer Science. University of California, Irvine, CA. [http://kdd.ics.uci.edu].
25
 
26
Hoare, C. A. R. 1962. Quicksort. Comput. J. 5, 1, 10--15.
 
27
Hristescu, G. and Farach-Colton, M. 1999. Cluster-preserving embedding of proteins. Tech. rep., Department of Computer Science, Rutgers University, Piscataway, NJ.
 
28
29
30
31
32
 
33
 
34
Kahveci, T., Lang, C., and Singh, A. K. 2003. Joining massive high-dimensional datasets. In Proceedings of the 19th IEEE International Conference on Data Engineering. Bangalore, India, 264--276.
 
35
 
36
 
37
 
38
 
39
40
 
41
 
42
 
43
Levenshtein, V. A. 1966. Binary codes capable of correcting deletions, insertion, and reversals. Cybern. Control Theory 10, 8, 707--710.
 
44
Linial, N., London, E., and Rabinovich, Y. 1995. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15, 215--245.
 
45
 
46
 
47
48
49
 
50
 
51
52
 
53
54
 
55
 
56
 
57
 
58
 
59
 
60
 
61
 
62
 
63
Uhlmann, J. K. 1991. Satisfying general proximity/similarity queries with metric trees. Inform. Process. Lett. 40, 4, 175--179.
 
64
Voronoi, G. 1909. Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Deuxiême mémoire: Recherches sur les parallèlloèdres primitifs. Seconde partie. Journal für die Reine und Angewandte Mathematik 136, 2, 67--181.
65
 
66
 
67
 
68
 
69

Collaborative Colleagues:
Edwin H. Jacox: colleagues
Hanan Samet: colleagues