ABSTRACT
Physical database design is important for query performance in a shared-nothing parallel database system, in which data is horizontally partitioned among multiple independent nodes. We seek to automate the process of data partitioning. Given a workload of SQL statements, we seek to determine automatically how to partition the base data across multiple nodes to achieve overall optimal (or close to optimal) performance for that workload. Previous attempts use heuristic rules to make those decisions. These approaches fail to consider all of the interdependent aspects of query performance typically modeled by today's sophisticated query optimizers. We present a comprehensive solution to the problem that has been tightly integrated with the optimizer of a commercial shared-nothing parallel database system. Our approach uses the query optimizer itself both to recommend candidate partitions for each table that will benefit each query in the workload, and to evaluate various combinations of these candidates. We compare a rank-based enumeration method with a random-based one. Our experimental results show that the former is more effective.
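The difference between the two enumeration styles named above can be illustrated with a small sketch. This is not the paper's algorithm: the table names, candidate keys, benefit scores, and the `toy_cost` function are all hypothetical stand-ins for what a real optimizer would estimate, and "rank-based" here simply means trying combinations in order of their summed benefit scores under a fixed evaluation budget.

```python
import itertools
import random

# Hypothetical candidate partitioning keys per table. Each benefit score is an
# illustrative stand-in for an optimizer-derived estimate, not real data.
CANDIDATES = {
    "orders":   [("o_custkey", 8.0), ("o_orderkey", 5.0), ("o_orderdate", 1.0)],
    "lineitem": [("l_orderkey", 9.0), ("l_partkey", 3.0)],
    "customer": [("c_custkey", 7.0), ("c_nationkey", 2.0)],
}


def toy_cost(config):
    """Assumed workload cost model: collocated joins reduce the cost."""
    cost = 100.0
    if config["orders"] == "o_orderkey" and config["lineitem"] == "l_orderkey":
        cost -= 40.0  # orders and lineitem join without data movement
    if config["orders"] == "o_custkey" and config["customer"] == "c_custkey":
        cost -= 25.0  # orders and customer join without data movement
    return cost


def all_configs():
    """Yield (summed benefit, configuration) for every combination of keys."""
    tables = sorted(CANDIDATES)
    for combo in itertools.product(*(CANDIDATES[t] for t in tables)):
        config = {t: key for t, (key, _) in zip(tables, combo)}
        yield sum(benefit for _, benefit in combo), config


def best_of(configs):
    """Evaluate each configuration and return the cheapest (cost, config)."""
    return min(((toy_cost(c), c) for c in configs), key=lambda pair: pair[0])


def rank_based(budget):
    """Evaluate the `budget` combinations with the highest summed benefit."""
    ranked = sorted(all_configs(), key=lambda bc: bc[0], reverse=True)
    return best_of(config for _, config in ranked[:budget])


def random_based(budget, seed=0):
    """Evaluate `budget` combinations drawn uniformly at random."""
    configs = [config for _, config in all_configs()]
    return best_of(random.Random(seed).sample(configs, budget))
```

In this toy setup the two joins compete for the partitioning key of `orders`, so per-table scores alone do not identify the best configuration; each combination has to be costed as a whole. Calling `rank_based(2)` and `random_based(2)` compares what each strategy finds with the same number of cost evaluations.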