Automating physical database design in a parallel database
Source: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data
Madison, Wisconsin
Session: Industrial sessions: big data
Pages: 558-569
Year of Publication: 2002
ISBN: 1-58113-497-5
Authors
Jun Rao, IBM Almaden Research Center
Chun Zhang, University of Wisconsin, Madison
Nimrod Megiddo, IBM Almaden Research Center
Guy Lohman, IBM Almaden Research Center
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Citation Count: 19
DOI: http://doi.acm.org/10.1145/564691.564757

ABSTRACT

Physical database design is important for query performance in a shared-nothing parallel database system, in which data is horizontally partitioned among multiple independent nodes. We seek to automate the process of data partitioning: given a workload of SQL statements, we seek to determine automatically how to partition the base data across multiple nodes to achieve optimal (or near-optimal) overall performance for that workload. Previous attempts use heuristic rules to make these decisions. Such approaches fail to consider all of the interdependent aspects of query performance that today's sophisticated query optimizers typically model.

We present a comprehensive solution to the problem that is tightly integrated with the optimizer of a commercial shared-nothing parallel database system. Our approach uses the query optimizer itself both to recommend, for each table, candidate partitions that will benefit each query in the workload, and to evaluate various combinations of these candidates. We compare a rank-based enumeration method with a random-based one; our experimental results show that the former is more effective.
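The abstract's central point, that the best partitioning for one table depends on how the other tables are partitioned, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's method: the paper relies on the DB2 optimizer's own cost estimates, whereas here the TPC-H-style table names, the table sizes, the candidate columns, the join frequencies, and the "data repartitioned" cost function are all hypothetical. The sketch hash-partitions rows across nodes, scores every combination of candidate partitioning columns, and adds a naive random-sampling baseline (which is not the rank-based or random-based method the abstract evaluates).

```python
import random
from itertools import product
from zlib import crc32

NUM_NODES = 4  # hypothetical number of shared-nothing nodes

def node_for(key_value, num_nodes=NUM_NODES):
    """Map a partitioning-key value to a node. crc32 is used instead of
    hash() so that placement is stable across Python runs."""
    return crc32(repr(key_value).encode()) % num_nodes

def partition(rows, key, num_nodes=NUM_NODES):
    """Horizontally hash-partition rows (dicts) on one column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key], num_nodes)].append(row)
    return nodes

# Hypothetical relative table sizes (e.g. thousands of rows).
SIZES = {"lineitem": 6000, "orders": 1500, "customer": 150}

# Hypothetical candidate partitioning columns for each table.
CANDIDATES = {
    "lineitem": ["l_orderkey", "l_partkey"],
    "orders": ["o_orderkey", "o_custkey"],
    "customer": ["c_custkey", "c_nationkey"],
}

# Hypothetical workload: equi-joins as (table_a, col_a, table_b, col_b, frequency).
JOINS = [
    ("lineitem", "l_orderkey", "orders", "o_orderkey", 1),
    ("orders", "o_custkey", "customer", "c_custkey", 2),
]

def join_cost(design, join):
    """Toy cost: relative amount of data repartitioned to run one equi-join."""
    ta, ca, tb, cb, freq = join
    a_ok, b_ok = design[ta] == ca, design[tb] == cb
    if a_ok and b_ok:
        moved = 0  # collocated: matching rows already share a node
    elif a_ok:
        moved = SIZES[tb]  # repartition only table_b on its join column
    elif b_ok:
        moved = SIZES[ta]  # repartition only table_a on its join column
    else:
        moved = SIZES[ta] + SIZES[tb]  # repartition both sides
    return freq * moved

def workload_cost(design):
    return sum(join_cost(design, j) for j in JOINS)

def exhaustive_search():
    """Score every combination of candidates; return (design, cost)."""
    tables = list(CANDIDATES)
    best = None
    for combo in product(*(CANDIDATES[t] for t in tables)):
        design = dict(zip(tables, combo))
        cost = workload_cost(design)
        if best is None or cost < best[1]:
            best = (design, cost)
    return best

def random_search(trials, seed=0):
    """Naive baseline: score only randomly sampled combinations."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        design = {t: rng.choice(cols) for t, cols in CANDIDATES.items()}
        cost = workload_cost(design)
        if best is None or cost < best[1]:
            best = (design, cost)
    return best

if __name__ == "__main__":
    print(exhaustive_search())
    # expected: ({'lineitem': 'l_orderkey', 'orders': 'o_custkey', 'customer': 'c_custkey'}, 1500)
    print(random_search(trials=3))
```

Under these assumed numbers, the cheapest design partitions orders on o_custkey even though o_orderkey would collocate the lineitem join, because the more frequent orders-customer join outweighs it (1500 units versus 3000 for the design that partitions orders on o_orderkey and customer on c_custkey). Exhaustive scoring grows multiplicatively with the number of tables and candidates, so realistic schemas need a more economical enumeration strategy than this.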


