ACM Home Page
Please provide us with feedback. Feedback
Spatial join selectivity using power laws
Full text PdfPdf (879 KB)
Source International Conference on Management of Data archive
Proceedings of the 2000 ACM SIGMOD international conference on Management of data table of contents
Dallas, Texas, United States
Pages: 177 - 188  
Year of Publication: 2000
ISBN:1-58113-217-4
Also published in ...
Authors
Christos Faloutsos  Computer Science Department Carnegie Mellon University
Bernhard Seeger  Fachbereich Mathematik und Informatik Universität Marburg - Germany
Agma Traina  Computer Science Department, University of Sao Paulo at Sao Carlos - Brazil
Caetano Traina, Jr.
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 43,   Citation Count: 20
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/342009.335412
What is a DOI?

ABSTRACT

We discovered a surprising law governing the spatial join selectivity across two sets of points. An example of such a spatial join is “find the libraries that are within 10 miles of schools”. Our law dictates that the number of such qualifying pairs follows a power law, whose exponent we call “pair-count exponent” (PC). We show that this law also holds for self-spatial-joins (“find schools within 5 miles of other schools”) in addition to the general case that the two point-sets are distinct. Our law holds for many real datasets, including diverse environments (geographic datasets, feature vectors from biology data, galaxy data from astronomy).

In addition, we introduce the concept of the Box-Occupancy-Product-Sum (BOPS) plot, and we show that it can compute the pair-count exponent in a timely manner, reducing the run time by orders of magnitude, from quadratic to linear. Due to the pair-count exponent and our analysis (Law 1), we can achieve accurate selectivity estimates in constant time (O(1)) without the need for sampling or other expensive operations. The relative error in selectivity is about 30% with our fast BOPS method, and even better (about 10%), if we use the slower, quadratic method.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
APR+98
 
BF 95
BKS 93
 
BSW 99
 
CEN 89
Bureau of the Census - Tiger#Line Precensus Files: 1990 technical documentation. Bureau of the Census. Washington, D.C. 1989.
CHR 84
CMN 99
 
DNS 91
 
FJS97
C. Faloutsos, H.V. Jagadish and N. Sidiropoulos - "Information Recovery from Partial data". Tech. Report ISR-TR-97-7, Inst. For Systems Research, Univ. of Maryland, College Park, MD, 1997.
FK 94
FRM 94
 
GÜN93
 
GÜT 94
 
HKMRT 96
KS 97
 
KS 98
 
KW 99
LR 94
MP 99
 
MTV 95
H. Mannila, H. Toivonen, A. I. Verkamo - "Discovering Frequent Episodes in Sequences". KDD 1995, pp.210-215.
 
NH 94
ORE 86
PD 96
 
POO 97
PMT 99
SAC+ 79
 
SCO 92
D. W. Scott - Multivariate Density Estimation, Wiley & Sons 1992.
 
SIL 96
B. W. Silverman - Density Estimation for Statistics and Data Analysis. Chapman & Hall 1986.
 
SK 96
 
SSA 97
 
TP 91
M. Turk and A. Pentland- "Eigenfaces for Recognition". Journal of cognitive Neuroscience, vol 3(1), 1991, pp. 71-86.
TS 96
 
TSS 98
 
WKS+96
VW 99

CITED BY  20

Collaborative Colleagues:
Christos Faloutsos: colleagues
Bernhard Seeger: colleagues
Agma Traina: colleagues
Caetano Traina, Jr.: colleagues