|
ABSTRACT
Exact computation for aggregate queries usually requires large amounts of memory - constrained in data-streaming - or communication - constrained in distributed computation - and large processing times. In this situation, approximation techniques with provable guarantees, like sketches, are the only viable solution. The performance of sketches crucially depends on the ability to efficiently generate particular pseudo-random numbers. In this paper we investigate both theoretically and empirically the problem of generating k-wise independent pseudo-random numbers and, in particular, that of generating 3 and 4-wise independent pseudo-random numbers that are fast range-summable (i.e., they can be summed up in sub-linear time). Our specific contributions are: (a) we provide an empirical comparison of the various pseudo-random number generating schemes, (b) we study both theoretically and empirically the fast range-summation practicality for the 3 and 4-wise independent generating schemes and we provide efficient implementations for the 3-wise independent schemes, (c) we show convincing theoretical and empirical evidence that the extended Hamming scheme performs as well as any 4-wise independent scheme for estimating the size of join using AMS-sketches, even though it is only 3-wise independent. We use this generating scheme to produce estimators that significantly out-perform the state-of-the-art solutions for two problems - size of spatial joins and selectivity estimation.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. Tracking join and self-join sizes in limited storage. J. Comput. Syst. Sci., 64(3):719--747, 2002.
|
 |
4
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
| |
5
|
Ziv Bar-Yosseff , Ravi Kumar , D. Sivakumar, Reductions in streaming algorithms, with an application to counting triangles in graphs, Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.623-632, January 06-08, 2002, San Francisco, California
|
| |
6
|
A. R. Calderbank , A. Gilbert , K. Levchenko , S. Muthukrishnan , M. Strauss, Improved range-summable random variable construction algorithms, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, January 23-25, 2005, Vancouver, British Columbia
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
A. S. Hedayat, N. J. A. Sloane, and J. Stufken. Orthogonal Arrays: Theory and Applications. Springer-Verlag, 1999.
|
 |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
K. Levchenko. http://www.cse.ucsd.edu/~klevchen/II-2005.pdf.
|
| |
19
|
|
| |
20
|
MassDAL. http://www.cs.rutgers.edu/~muthu/massdal.html.
|
| |
21
|
F. Rusu and A. Dobra. Fast range-summable random variables for efficient aggregate estimation. Manuscript, 2005.
|
 |
22
|
|
|