Load distribution of analytical query workloads for database cluster architectures
Source: ACM International Conference Proceeding Series; Vol. 261
Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology
Nantes, France
SESSION: Research sessions: Physical design
Pages 169-180
Year of Publication: 2008
ISBN: 978-1-59593-926-5
Authors
Thomas Phan  Yahoo!, Inc., Sunnyvale, CA
Wen-Syan Li  IBM Almaden Research Center, San Jose, CA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1353343.1353367

ABSTRACT

Enterprises may have multiple database systems spread across the organization for redundancy or for serving different applications. In such systems, query workloads can be distributed across different servers for better performance. A materialized view, or Materialized Query Table (MQT), is an auxiliary table with pre-computed data that can be used to significantly improve the performance of a database query. In this paper, we propose a framework for coordinating execution of OLAP query workloads across a database cluster with a shared-nothing architecture. Such coordination is complex since we need to consider (1) the time to build the MQTs, (2) the query execution impact of the MQTs, (3) whether the MQTs can fit within the disk space limitation, (4) server computation power, and (5) the effectiveness of the scheduling and placement algorithms in deriving a combination of configurations so that the workload can be completed in the shortest time period. We frame the problem as a combinatorial problem with a solution space that is exponential in the number of queries, MQTs, and servers. We provide a stochastic search heuristic that finds a near-optimal mapping of queries-to-servers and MQTs-to-servers within an arbitrarily bounded time and compare our solution with an exhaustive search and three standard greedy algorithms. Our search implementation produced schedules within 9% of the optimal found through an exhaustive search and produced better solutions than typical greedy algorithms for both TPC-H and synthetic benchmarks under a variety of experiments. For a key trial where disk space is limited, it produced 15% better results than the next best competitor, corresponding to an absolute wall-clock advantage of over 10 hours.
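
The abstract does not spell out the search procedure, so the following is only an illustrative sketch of the kind of problem it describes, not the authors' algorithm. It assumes a simple cost model (a server's finish time is its MQT build time plus its query time, divided by its relative speed) and uses plain randomized hill climbing bounded by an iteration count as a stand-in for the paper's stochastic heuristic. All names here (`Problem`, `fits`, `makespan`, `stochastic_search`, `factor`) are hypothetical, and the sketch assumes at least one query and one MQT.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    base_time: list   # base_time[q]: run time of query q with no MQT on a speed-1.0 server
    build_time: list  # build_time[m]: time to build MQT m
    mqt_size: list    # mqt_size[m]: disk space occupied by MQT m
    factor: dict      # factor[(q, m)]: fraction of base_time left when q can use m
    speed: list       # speed[s]: relative computation power of server s
    disk: list        # disk[s]: disk space available for MQTs on server s


def fits(p, mqts_on):
    """True if the MQTs placed on every server fit within its disk limit."""
    return all(sum(p.mqt_size[m] for m in mqts_on[s]) <= p.disk[s]
               for s in range(len(p.disk)))


def makespan(p, server_of, mqts_on):
    """Finish time of the slowest server under the assumed cost model."""
    finish = []
    for s in range(len(p.speed)):
        work = sum(p.build_time[m] for m in mqts_on[s])  # build the MQTs first
        for q, assigned in enumerate(server_of):
            if assigned == s:
                # A query uses whichever local MQT helps it most, if any.
                best = min((p.factor.get((q, m), 1.0) for m in mqts_on[s]),
                           default=1.0)
                work += p.base_time[q] * best
        finish.append(work / p.speed[s])
    return max(finish)


def stochastic_search(p, iterations=2000, seed=0):
    """Randomized hill climbing over query and MQT placements (assumed stand-in)."""
    rng = random.Random(seed)
    n_servers = len(p.speed)
    server_of = [rng.randrange(n_servers) for _ in p.base_time]
    mqts_on = [frozenset() for _ in range(n_servers)]
    cost = makespan(p, server_of, mqts_on)
    for _ in range(iterations):  # the iteration count bounds the search effort
        cand_q, cand_m = list(server_of), list(mqts_on)
        s = rng.randrange(n_servers)
        if rng.random() < 0.5:
            # Move: reassign one random query to server s.
            cand_q[rng.randrange(len(cand_q))] = s
        else:
            # Move: add or drop one random MQT on server s.
            cand_m[s] = cand_m[s] ^ {rng.randrange(len(p.build_time))}
        if not fits(p, cand_m):
            continue  # reject placements that exceed a disk limit
        cand_cost = makespan(p, cand_q, cand_m)
        if cand_cost <= cost:  # keep moves that do not lengthen the schedule
            server_of, mqts_on, cost = cand_q, cand_m, cand_cost
    return server_of, mqts_on, cost
```

Calling `stochastic_search(problem, iterations=500, seed=1)` returns the query-to-server list, the per-server MQT sets, and the resulting makespan. Accepting only non-worsening moves keeps the sketch short; a method that also accepts some worsening moves, such as simulated annealing, would be less prone to getting stuck in local minima.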



