|
ABSTRACT
We discovered a surprising law governing the spatial join selectivity across two sets of points. An example of such a spatial join is “find the libraries that are within 10 miles of schools”. Our law dictates that the number of such qualifying pairs follows a power law, whose exponent we call “pair-count exponent” (PC). We show that this law also holds for self-spatial-joins (“find schools within 5 miles of other schools”) in addition to the general case that the two point-sets are distinct. Our law holds for many real datasets, including diverse environments (geographic datasets, feature vectors from biology data, galaxy data from astronomy).
In addition, we introduce the concept of the Box-Occupancy-Product-Sum (BOPS) plot, and we show that it can compute the pair-count exponent in a timely manner, reducing the run time by orders of magnitude, from quadratic to linear. Due to the pair-count exponent and our analysis (Law 1), we can achieve accurate selectivity estimates in constant time (O(1)) without the need for sampling or other expensive operations. The relative error in selectivity is about 30% with our fast BOPS method, and even better (about 10%), if we use the slower, quadratic method.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
APR+98
|
|
| |
BF 95
|
|
 |
BKS 93
|
Thomas Brinkhoff , Hans-Peter Kriegel , Bernhard Seeger, Efficient processing of spatial joins using R-trees, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.237-246, May 25-28, 1993, Washington, D.C., United States
|
| |
BSW 99
|
|
| |
CEN 89
|
Bureau of the Census - Tiger#Line Precensus Files: 1990 technical documentation. Bureau of the Census. Washington, D.C. 1989.
|
 |
CHR 84
|
|
 |
CMN 99
|
Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, On random sampling over joins, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.263-274, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
DNS 91
|
|
| |
FJS97
|
C. Faloutsos, H.V. Jagadish and N. Sidiropoulos - "Information Recovery from Partial data". Tech. Report ISR-TR-97-7, Inst. For Systems Research, Univ. of Maryland, College Park, MD, 1997.
|
 |
FK 94
|
|
 |
FRM 94
|
Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States
|
| |
GÜN93
|
|
| |
GÜT 94
|
|
| |
HKMRT 96
|
|
 |
KS 97
|
|
| |
KS 98
|
|
| |
KW 99
|
|
 |
LR 94
|
|
 |
MP 99
|
|
| |
MTV 95
|
H. Mannila, H. Toivonen, A. I. Verkamo - "Discovering Frequent Episodes in Sequences". KDD 1995, pp.210-215.
|
| |
NH 94
|
|
 |
ORE 86
|
|
 |
PD 96
|
|
| |
POO 97
|
|
 |
PMT 99
|
Dimitris Papadias , Nikos Mamoulis , Yannis Theodoridis, Processing and optimization of multiway spatial joins using R-trees, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.44-55, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
[doi> 10.1145/303976.303981]
|
 |
SAC+ 79
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
SCO 92
|
D. W. Scott - Multivariate Density Estimation, Wiley & Sons 1992.
|
| |
SIL 96
|
B. W. Silverman - Density Estimation for Statistics and Data Analysis. Chapman & Hall 1986.
|
| |
SK 96
|
|
| |
SSA 97
|
|
| |
TP 91
|
M. Turk and A. Pentland- "Eigenfaces for Recognition". Journal of cognitive Neuroscience, vol 3(1), 1991, pp. 71-86.
|
 |
TS 96
|
|
| |
TSS 98
|
|
| |
WKS+96
|
|
 |
VW 99
|
|
CITED BY 20
|
|
Agma Traina , Caetano Traina , Spiros Papadimitriou , Christos Faloutsos, Tri-plots: scalable tools for multidimensional data mining, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, p.184-193, August 26-29, 2001, San Francisco, California
|
|
|
|
|
|
Caetano Traina, Jr. , Agma Traina , Roberto Santos Filho , Christos Faloutsos, How to improve the pruning ability of dynamic metric access methods, Proceedings of the eleventh international conference on Information and knowledge management, November 04-09, 2002, McLean, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Caetano Traina, Jr. , Roberto F. Filho , Agma J. Traina , Marcos R. Vieira , Christos Faloutsos, The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.4, p.483-505, October 2007
|
|
|
|
|
|
|
|
|
|
|
|
Elaine P. Sousa , Caetano Traina, Jr. , Agma J. Traina , Leejay Wu , Christos Faloutsos, A fast and effective method to find correlations among attributes in databases, Data Mining and Knowledge Discovery, v.14 n.3, p.367-407, June 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|