ACM Home Page
Please provide us with feedback. Feedback
Pseudo-random number generation for sketch-based estimations
Full text PdfPdf (706 KB)
Source
ACM Transactions on Database Systems (TODS) archive
Volume 32 ,  Issue 2  (June 2007) table of contents
Article No. 11  
Year of Publication: 2007
ISSN:0362-5915
Authors
Florin Rusu  University of Florida, Gainesville, FL
Alin Dobra  University of Florida, Gainesville, FL
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 76,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1242524.1242528
What is a DOI?

ABSTRACT

The exact computation of aggregate queries, like the size of join of two relations, usually requires large amounts of memory (constrained in data-streaming) or communication (constrained in distributed computation) and large processing times. In this situation, approximation techniques with provable guarantees, like sketches, are one possible solution. The performance of sketches depends crucially on the ability to generate particular pseudo-random numbers. In this article we investigate both theoretically and empirically the problem of generating k-wise independent pseudo-random numbers and, in particular, that of generating 3- and 4-wise independent pseudo-random numbers that are fast range-summable (i.e., they can be summed in sublinear time). Our specific contributions are: (a) we provide a thorough comparison of the various pseudo-random number generating schemes; (b) we study both theoretically and empirically the fast range-summation property of 3- and 4-wise independent generating schemes; (c) we provide algorithms for the fast range-summation of two 3-wise independent schemes, BCH and extended Hamming; and (d) we show convincing theoretical and empirical evidence that the extended Hamming scheme performs as well as any 4-wise independent scheme for estimating the size of join of two relations using AMS sketches, even though it is only 3-wise independent. We use this scheme to generate estimators that significantly outperform state-of-the-art solutions for two problems, namely, size of spatial joins and selectivity estimation.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
Alon, N., Gibbons, P. B., Matias, Y., and Szegedy, M. 2002. Tracking join and self-join sizes in limited storage. J. Comput. Syst. Sci. 64, 3, 719--747.
4
 
5
 
6
 
7
Carter, L. and Wegman, M. N. 1979. Universal classes of hash functions. J. Comput. Syst. Sci. 18, 2, 143--154.
 
8
9
 
10
Deskins, W. E. 1996. Abstract Algebra. Dover, New York.
11
 
12
Ehrenfeucht, A. and Karpinski, M. 1990. The computational complexity of (xor-and) counting problems. Tech. Rep. TR-90-033, ICSI, Berkeley, California.
 
13
14
 
15
Ganguly, S., Garofalakis, M., and Rastogi, R. 2004. Processing data-stream join aggregates using skimmed sketches. In Proceedings of the 9th International Conference on Extending Database Technology. Springer, 569--586.
 
16
 
17
 
18
Goldreich, O. 1997. A sample of samplers: A computational perspective on sampling. Electron. Colloq. Comput. Complex. 4, 20.
 
19
Hedayat, A. S., Sloane, N. J. A., and Stufken, J. 1999. Orthogonal Arrays: Theory and Applications. Springer, New York.
 
20
 
21
22
 
23
Levchenko, K. 2005. http://www.cse.ucsd.edu/~klevchen/II-2005.pdf.
 
24
 
25
Shao, J. 1999. Mathematical Statistics. Springer, New York.
26
 
27
 
28
Wegman, M. and Carter, J. 1981. New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 3, 22, 265--279.