|
ABSTRACT
In decision support applications, the ability to provide fast approximate answers to aggregation queries is desirable. One commonly-used technique for approximate query answering is sampling. For many aggregation queries, appropriately constructed biased (non-uniform) samples can provide more accurate approximations than a uniform sample. The optimal type of bias, however, varies from query to query. In this paper, we describe an approximate query processing technique that dynamically constructs an appropriately biased sample for each query by combining samples selected from a family of non-uniform samples that are constructed during a pre-processing phase. We show that dynamic selection of appropriate portions of previously constructed samples can provide more accurate approximate answers than static, non-adaptive usage of uniform or non-uniform samples.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala, Congressional samples for approximate answering of group-by queries, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.487-498, May 15-18, 2000, Dallas, Texas, United States
|
 |
3
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, Join synopses for approximate query answering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.275-286, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
4
|
|
| |
5
|
A. Agresti and B. A. Coull. Approximate is better than "exact" for interval estimation of binomial proportions. The American Statistician, 52:119--126, 1998.
|
| |
6
|
D. Barbara, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. E. Ioannidis, H. V. Jagadish, T. Johnson, R. T. Ng, V. Poosala, K. A. Ross, and K. C. Sevcik. The New Jersey data reduction report. Data Engineering Bulletin, 20(4):3--45, 1997.
|
| |
7
|
L. D. Brown, T. Cai, and A. DasGupta. Interval estimation for a binomial proportion. Statistical Science, 16:101--133, 2001.
|
| |
8
|
|
| |
9
|
|
 |
10
|
Surajit Chaudhuri , Gautam Das , Vivek Narasayya, A robust, optimization-based approach for approximate answering of aggregate queries, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.295-306, May 21-24, 2001, Santa Barbara, California, United States
|
 |
11
|
|
 |
12
|
Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, On random sampling over joins, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.263-274, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
13
|
S. Chaudhuri and V. Narasayya. Program for tpc-d data generation with skew. Available at ftp://ftp.research.microsoft.com/pub/users/viveknar/tpcdskew.
|
| |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
Jim Gray , Surajit Chaudhuri , Adam Bosworth , Andrew Layman , Don Reichart , Murali Venkatrao , Frank Pellow , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, Data Mining and Knowledge Discovery, v.1 n.1, p.29-53, 1997
[doi> 10.1023/A:1009726021843]
|
| |
18
|
|
| |
19
|
|
 |
20
|
Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman, Implementing data cubes efficiently, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.205-216, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
21
|
J. Hellerstein, M. Franklin, S. Chandrasekaran, A. Deshpande, K. Hildrum, S. Madden, V. Raman, and M. A. Shah. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 23(2):7--18, June 2000.
|
 |
22
|
Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States
|
| |
23
|
|
 |
24
|
Zachary G. Ives , Daniela Florescu , Marc Friedman , Alon Levy , Daniel S. Weld, An adaptive query execution system for data integration, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.299-310, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
25
|
|
 |
26
|
|
| |
27
|
Transaction Processing Performance Council. TPC-H benchmark 1.1.0, June 1999. http://www.tpc.org.
|
 |
28
|
|
CITED BY 24
|
|
|
|
|
Larry Huston , Rahul Sukthankar , Rajiv Wickremesinghe , M. Satyanarayanan , Gregory R. Ganger , Erik Riedel , Anastassia Ailamaki, Diamond: A Storage Architecture for Early Discard in Interactive Search, Proceedings of the 3rd USENIX Conference on File and Storage Technologies, March 31-31, 2004, San Francisco, CA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anja Klein , Rainer Gemulla , Philipp Rösch , Wolfgang Lehner, Derby/S: a DBMS for sample-based query answering, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|