ABSTRACT
Executing ad hoc queries against large databases can be prohibitively expensive. Exploratory data analysis, however, may not require exact answers: results computed from a sample of the data are often satisfactory. Supporting sampling as a primitive SQL operator turns out to be difficult because sampling does not commute with many SQL operators. In this paper, we describe an implementation in IBM® DB2® Universal Database (UDB) of a sampling operator that commutes with some SQL operators. As a result, a query containing the sampling operator always returns a random sample of its answers and, in many cases, runs faster than it would without the operator.
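The abstract does not say which operators the DB2 sampling operator commutes with. Purely as an illustration of what "commutes" means here, the Python sketch below assumes row-level Bernoulli sampling (each row kept independently with probability p). It shows that such sampling commutes with a selection, while sampling both inputs of a join does not yield a random sample of the join. The helpers `coin`, `sample`, `select` and `join_inclusion` are hypothetical and are not DB2 interfaces. The hash-based coin is only a device that makes the selection check repeatable.

```python
import hashlib
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def coin(row_id: str, p: float, seed: int = 0) -> bool:
    """Per-row Bernoulli(p) coin made deterministic by hashing (seed, row_id) into [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{row_id}".encode()).digest()
    # Keep the top 53 bits so u is exact in a double and strictly below 1.
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < p


def sample(rows: List[Row], p: float) -> List[Row]:
    """Row-level Bernoulli sampling: keep each row independently with probability p."""
    return [r for r in rows if coin(str(r["id"]), p)]


def select(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Relational selection, i.e. a WHERE clause."""
    return [r for r in rows if pred(r)]


def join_inclusion(p: Fraction) -> Tuple[Fraction, Fraction]:
    """Exact survival probabilities when both join inputs are Bernoulli-sampled at rate p.

    Toy instance: one customer row c joins with two order rows o1 and o2, giving the
    join tuples (c, o1) and (c, o2). Enumerate all 2**3 keep/drop outcomes and return
    (P[(c, o1) survives], P[(c, o1) and (c, o2) both survive]).
    """
    single = both = Fraction(0)
    for kept_c, kept_o1, kept_o2 in product((True, False), repeat=3):
        weight = Fraction(1)
        for kept in (kept_c, kept_o1, kept_o2):
            weight *= p if kept else 1 - p
        t1 = kept_c and kept_o1  # a join tuple survives only if both input rows do
        t2 = kept_c and kept_o2
        if t1:
            single += weight
        if t1 and t2:
            both += weight
    return single, both


if __name__ == "__main__":
    orders = [{"id": f"o{i}", "cust": f"c{i % 3}", "amount": 10 * i} for i in range(1, 101)]

    def is_big(r: Row) -> bool:
        return r["amount"] > 500

    # Selection commutes with Bernoulli sampling: the same rows come out in either order.
    assert select(sample(orders, 0.3), is_big) == sample(select(orders, is_big), 0.3)

    # Join does not: each tuple survives with probability p**2, but two tuples sharing
    # customer c survive together with probability p**3 rather than (p**2)**2.
    single, both = join_inclusion(Fraction(1, 2))
    print(single, both, single**2)  # 1/4 1/8 1/16
```

With independently drawn coins instead of hashed ones, the two selection orders agree in distribution rather than row for row. The join case shows why sampling below a join is not a uniform sample of the join's answers: tuples that share an input row are kept or dropped together.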