| Relational confidence bounds are easy with the bootstrap |
| Full text |
Pdf
(338 KB)
|
| Source
|
International Conference on Management of Data
archive
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
table of contents
Baltimore, Maryland
SESSION: Research papers: estimation and approximation
table of contents
Pages: 587 - 598
Year of Publication: 2005
ISBN:1-59593-060-4
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 4, Downloads (12 Months): 38, Citation Count: 0
|
|
|
ABSTRACT
Statistical estimation and approximate query processing have become increasingly prevalent applications for database systems. However, approximation is usually of little use without some sort of guarantee on estimation accuracy, or "confidence bound." Analytically deriving probabilistic guarantees for database queries over sampled data is a daunting task, not suitable for the faint of heart, and certainly beyond the expertise of the typical database system end-user. This paper considers the problem of incorporating into a database system a powerful "plug-in" method for computing confidence bounds on the answer to relational database queries over sampled or incomplete data. This statistical tool, called the bootstrap, is simple enough that it can be used by a data-base programmer with a rudimentary mathematical background, but general enough that it can be applied to almost any statistical inference problem. Given the power and ease-of-use of the bootstrap, we argue that the algorithms presented for supporting the bootstrap should be incorporated into any database system which is intended to support analytic processing.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Moses Charikar , Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, Towards estimation error guarantees for distinct values, Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.268-279, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335230]
|
| |
2
|
B. Efron and R. J. Tibshirani: An Introduction to the Bootstrap. Chapman & Hall/CRC 1998
|
| |
3
|
|
 |
4
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, The Aqua approximate query answering system, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.574-576, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
 |
8
|
|
 |
9
|
Peter J. Haas , Jeffrey F. Naughton , S. Seshadri , Arun N. Swami, Fixed-precision estimation of join selectivity, Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.190-201, May 25-28, 1993, Washington, D.C., United States
[doi> 10.1145/153850.153875]
|
| |
10
|
P. Hall: The Bootstrap and Edgeworth Expansion. Springer-Verlag 1995
|
 |
11
|
Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States
|
 |
12
|
|
| |
13
|
N. L Johnson, S. Kotz, A. W. Kemp: Univariate Discrete Distributions. John Wiley and Sons. Inc 1993.
|
| |
14
|
|
 |
15
|
Richard J. Lipton , Jeffrey F. Naughton , Donovan A. Schneider, Practical selectivity estimation through adaptive sampling, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.1-11, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
16
|
|
| |
17
|
C. E. Shannon: A Mathematical Theory of Communication. Bell System Technical Journal, vol. 27:379--423 and 623--656, (July and October, 1948)
|
| |
18
|
D. F. Vysochanskii and Y. I Petunin: Justification of the 3 ó Rule for Unimodal Distributions. Theory of Prob. and Math. Stat., vol 21: 25--36 (1980)
|
| |
19
|
J. Ziv and A. Lempel: A Universal Algorithm for Sequential Data Compression. IEEE Trans. Inform. Theory, 23:337--343 (1977)
|
| |
20
|
|
| |
21
|
|
|