|
ABSTRACT
Enterprises may have multiple database systems spread across the organization for redundancy or for serving different applications. In such systems, query workloads can be distributed across different servers for better performance. A materialized view, or Materialized Query Table (MQT), is an auxiliary table with pre-computed data that can be used to significantly improve the performance of a database query. In this paper, we propose a framework for coordinating execution of OLAP query workloads across a database cluster with shared nothing architecture. Such coordination is complex since we need to consider (1) the time to build the MQTs, (2) the query execution impact of the MQTs, (3) whether the MQTs can fit in the disk space limitation, (4) server computation power, and (5) the effectiveness of the scheduling and placement algorithms in deriving a combination of configurations so that the workload can be completed in the shortest time period. We frame the problem as a combinatorial problem with a solution space that is exponential in the number of queries, MQTs, and servers. We provide a stochastic search heuristic that finds a near-optimal mapping of queries-to-servers and MQTs-to-servers within an arbitrarily bounded time and compare our solution with an exhaustive search and three standard greedy algorithms. Our search implementation produced schedules within 9% of the optimal found through an exhaustive search and produced better solutions than typical greedy algorithms for both TPC-H and synthetic benchmarks under a variety of experiments. For a key trial where disk space is limited, it produced 15% better results than the next best competitor, corresponding to an absolute wall clock advantage of over 10 hours.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
J. Blythe , S. Jain , E. Deelman , Y. Gil , K. Vahi , A. Mandal , K. Kennedy, Task scheduling strategies for workflow-based applications in grids, Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2, p.759-767, May 09-12, 2005
|
| |
4
|
Tracy D. Braun , Howard Jay Siegel , Noah Beck , Lasislau L. Bölöni , Muthucumara Maheswaran , Albert I. Reuther , James P. Robertson , Mitchell D. Theys , Bin Yao , Debra Hensgen , Richard F. Freund, A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing, v.61 n.6, p.810-837, June 2001
[doi> 10.1006/jpdc.2000.1714]
|
| |
5
|
|
| |
6
|
A. Chakrabarti, et al. "Integration of Scheduling and Replication in Data Grids." In Proceedings of the International Conference on High Performance Computing, 2004.
|
| |
7
|
|
| |
8
|
P. Chen. "Optimal file allocation in multilevel storage systems." In Proceedings of AFIPS, 1973.
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
D. Foster, L. Dowdy, and J. Ames. "File assignment in a computer network." Computer Networks 5, Sept. 1981.
|
| |
13
|
|
| |
14
|
|
| |
15
|
P. Hughes and G. Moe. "A structural approach to computer performance analysis." In Proceedings of AFIPS, 1973.
|
| |
16
|
IBM DB2 9, www.ibm.com/software/data/db2/9/
|
| |
17
|
H. Jiang, D. Gao, W.-S. Li. "Exploiting Correlation and Parallelism for Materialized-View Recommendation in Distributed Data Warehouses," In Proceedings of ICDE, 2007.
|
| |
18
|
|
| |
19
|
Wen-Syan Li , Vishal S. Batra , Vijayshankar Raman , Wei Han , K. Selcuk Candan , Inderpal Narang, Load and Network Aware Query Routing for Information Integration, Proceedings of the 21st International Conference on Data Engineering, p.927-938, April 05-08, 2005
[doi> 10.1109/ICDE.2005.83]
|
| |
20
|
|
| |
21
|
Microsoft SQL Server, www.microsoft.com/sq1/default.mspx
|
| |
22
|
|
| |
23
|
MySQL Cluster www.mysql.com/products/database/cluster/.
|
| |
24
|
Oracle, www.oracle.com
|
| |
25
|
Oracle 11g Real Application Clusters, www.oracle.com/technology/products/database/clustering/index.html
|
| |
26
|
|
 |
27
|
|
| |
28
|
Redbrick, www.informix.com
|
 |
29
|
|
 |
30
|
Kenneth Salem , Kevin Beyer , Bruce Lindsay , Roberta Cochrane, How to roll a join: asynchronous incremental view maintenance, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.129-140, May 15-18, 2000, Dallas, Texas, United States
|
| |
31
|
E. Santos-Neto, W. Cirne, F. Brasileiro, and A. Lima. "Exploiting Replications and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids." In Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing, 2004.
|
| |
32
|
E. Schmueli and D. Feitelson. "Backfilling with lookahed to optimize the packing of parallel jobs," Springer-Verlag Lecture Notes in Computer Science, vol 2862, 2003.
|
 |
33
|
|
| |
34
|
Daniel C. Zilio , Calisto Zuzarte , Guy M. Lohman , Hamid Pirahesh , Jarek Gryz , Eric Alton , Dongming Liang , Gary Valentin, Recommending Materialized Views and Indexes with IBM DB2 Design Advisor, Proceedings of the First International Conference on Autonomic Computing, p.180-188, May 17-18, 2004
|
| |
35
|
Daniel C. Zilio , Jun Rao , Sam Lightstone , Guy Lohman , Adam Storm , Christian Garcia-Arellano , Scott Fadden, DB2 design advisor: integrated automatic physical database design, Proceedings of the Thirtieth international conference on Very large data bases, p.1087-1097, August 31-September 03, 2004, Toronto, Canada
|
|