| Analysis and application of adaptive sampling |
| Full text |
Pdf
(204 KB)
|
| Source
|
Symposium on Principles of Database Systems
archive
Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
table of contents
Dallas, Texas, United States
Pages: 260 - 267
Year of Publication: 2000
ISBN:1-58113-214-X
|
|
Author
|
|
James F. Lynch
|
Department of Mathematics and Computer Science, Box 5815, Clarkson University, Potsdam, NY
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 2, Downloads (12 Months): 26, Citation Count: 2
|
|
|
ABSTRACT
An estimation algorithm for a query is a probabilistic algorithm that computes an approximation for the size (number of tuples) of the query. The main question that is studied is which classes of logically definable queries have fast estimation algorithms. Evidence from descriptive complexity theory is provided that indicates not all such queries have fast estimation algorithms. However, it is shown that on classes of structures of bounded degree, all first-order queries have fast estimation algorithms.These estimation algorithms use a form of statistical sampling known as adaptive sampling. Several versions of adaptive sampling have been developed by other researchers. The original version has been surpassed in some ways by a newer version and a more specialized Monte-Carlo algorithm. An analysis of the average run time of the original version is given, and the different algorithms are compared. The analysis is used to compute what appears to be the best known upper bound on the efficiency of the original algorithm. Also, contrary to what seems to be a commonly held opinion, the two methods of adaptive sampling are incomparable. Which method is superior depends on the query being estimated and the criteria that are being applied. Lastly, adaptive sampling can be more efficient than the Monte-Carlo algorithm if knowledge about the maximum values of the data being sampled is available.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Serge Abiteboul , Kevin Compton , Victor Vianu, Queries are easier than you thought (probably), Proceedings of the eleventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.23-32, June 02-05, 1992, San Diego, California, United States
[doi> 10.1145/137097.137105]
|
| |
2
|
W. Bartky. Multiple sampling with constant probability. Ann. Math. Statist., 14:363-377, 1943.
|
| |
3
|
A. Berry. The accuracy of the gaussian approximation to the sum of independent variates. Trans. AMS, 49:122-136, 1941.
|
| |
4
|
J. Blum and J. Rosenblat. Probability and Statistics. W. B. Saunders Company, Philadelphia, 1972.
|
| |
5
|
H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist, 23:493-507, 1952.
|
| |
6
|
|
| |
7
|
H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Springer-Verlag, New York, Heidelberg, 1995.
|
| |
8
|
W. Feller. An Introduction to Probability Theory and Its Applications, vol. 1. John Wiley & Sons, New York, 1968.
|
| |
9
|
|
| |
10
|
H. Gaifman. On local and nonlocal properties. In Logic Colloquium '81, pages 105-135. J. Stern, ed., North Holland, 1982.
|
| |
11
|
P. Haas and A. Swami. Sequential sampling, procedures for query size estimation. IBM Research Report, RJ 9101 (80915), 1992.
|
| |
12
|
|
| |
13
|
W. Hanf. Model-theoretic methods in the study of elementary logic. In The Theory of Models, pages 132-145. J. Addison, L. Henkin, and A. Tarski, eds., North Holland, 1965.
|
| |
14
|
L. Hella, L. Libkin, and J. Nurmonen. Notions of locality and their logical characterizations over finite models. J. Symbolic Logic, 64:1751-1773, 1999.
|
| |
15
|
W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Am. Statist. Assoc., 58:13-30, 1963.
|
| |
16
|
|
| |
17
|
|
| |
18
|
Q. Liu. On algorithms for database query size estimation. Master's Thesis, Clarkson University, 1998.
|
| |
19
|
A. Wald. On cumulative sums of random variables. Ann. Math. Statist., 15:283-296, 1944.
|
| |
20
|
A. Wald. Sequential Analysis. John Wiley & Sons, New York, 1947.
|
|