ABSTRACT
Data analysts need to understand the quality of data in the warehouse. This is often done by issuing many Group By queries on the sets of columns of interest. Since the volume of data in these warehouses can be large, and tables in a data warehouse often contain many columns, this analysis typically requires executing a large number of Group By queries, which can be expensive. We show that the performance of today's database systems for such data analysis is inadequate. We also show that the problem is computationally hard, and develop efficient techniques for solving it. We demonstrate significant speedup over existing approaches on today's commercial database systems.
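As a rough illustration of the workload described above, the following sketch profiles a small table by issuing one Group By query per column set. It then shows one generic way such queries can share work: computing a single finer-grained Group By once and deriving the coarser results from it. This is only an assumed example of the general idea, not the technique developed in the paper. The table `customers`, its columns, the temporary table `tmp_union`, and the helper names are all hypothetical.

```python
import sqlite3


def profile_naive(conn, table, column_sets):
    """Run one independent GROUP BY per column set; each query scans the base table."""
    results = {}
    for cols in column_sets:
        col_list = ", ".join(cols)
        sql = f"SELECT {col_list}, COUNT(*) FROM {table} GROUP BY {col_list}"
        results[cols] = sorted(conn.execute(sql).fetchall())
    return results


def profile_shared(conn, table, column_sets):
    """Group once over the union of all columns, then derive each result from that.

    Assumption: this reuse strategy is a generic illustration, not the paper's method.
    """
    all_cols = sorted({c for cols in column_sets for c in cols})
    col_list = ", ".join(all_cols)
    conn.execute("DROP TABLE IF EXISTS tmp_union")
    conn.execute(
        f"CREATE TEMP TABLE tmp_union AS "
        f"SELECT {col_list}, COUNT(*) AS cnt FROM {table} GROUP BY {col_list}"
    )
    results = {}
    for cols in column_sets:
        sub = ", ".join(cols)
        # Summing the stored counts gives the same answer as counting base rows.
        sql = f"SELECT {sub}, SUM(cnt) FROM tmp_union GROUP BY {sub}"
        results[cols] = sorted(conn.execute(sql).fetchall())
    return results


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (state TEXT, city TEXT, gender TEXT)")
    rows = [
        ("WA", "Seattle", "F"),
        ("WA", "Seattle", "M"),
        ("WA", "Redmond", "F"),
        ("wa", "Seattle", "F"),  # inconsistent casing: a typical data quality issue
        ("CA", "Fresno", "M"),
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    column_sets = [("state",), ("gender",), ("state", "city")]
    naive = profile_naive(conn, "customers", column_sets)
    shared = profile_shared(conn, "customers", column_sets)
    conn.close()
    return naive, shared


if __name__ == "__main__":
    naive, shared = demo()
    for cols, groups in naive.items():
        print(cols, groups)
    print("same results:", naive == shared)
```

Both functions return the same groups. The second reads the base table once rather than once per column set, which hints at why the choice of which intermediate results to compute matters when the table is large and the number of column sets is high.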