|
ABSTRACT
Few database query optimizer models have been validated against actual performance. This paper presents the methodology and results of a thorough validation of the optimizer and evaluation of the performance of the experimental distributed relational database management system R*, which inherited and extended to a distributed environment the optimization algorithms of System R. Optimizer estimated costs and actual R* resources consumed were written to database tables using new SQL commands, permitting automated control from SQL application programs of test data collection and reduction. A number of tests were run over a wide variety of dynamically-created test databases, SQL queries, and system parameters. The results for single-table access, sorting, and local 2-table joins are reported here. The tests confirmed the accuracy of the majority of the I/O cost model, the significant contribution of CPU cost to total cost, and the need to model CPU cost in more detail than was done in System R. The R* optimizer now retains cost components separately and estimates the number of CPU instructions, including those for applying different kinds of predicates. The sensitivity of I/O cost to buffer space motivated the development of more detailed models of buffer utilization unclustered index scans and nested-loop joins often benefit from pages remaining in the buffers, whereas concurrent scans of the data pages and the index pages for multiple tables during joins compete for buffer share. Without an index on the join column of the inner table, the optimizer correctly avoids the nested-loop join, confirming the need for merge-scan joins. When the join column of the inner is indexed, the optimizer overestimates the cost of the nested-loop join, whose actual performance is very sensitive to three parameters that are extremely difficult to estimate (1) the join (result) cardinality, (2) the outer table's cardinality, and (3) the number of buffer pages available to store the inner table. Suggestions are given for improved database statistics, prefetch and page replacement strategies for the buffer manager, and the use of temporary indexes and Bloom filters (hashed semijoins) to reduce access of unneeded data.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
APER 83
|
P M G Apers, A R Hevner, and S B Yao, Optnntzmg Algorithms for Dtstnbuted Queries, IEEE Trans on Software Engmeermg SE-9 (January 1983) pp 57-68
|
| |
ASTR 80
|
M M Astrahan, M Schkolmck, and W Kun, Performance of the System R Access Path Selecuon Mechamsm, Informatton Processing 80 (1980) pp 487-491
|
 |
BERN 81
|
Philip A. Bernstein , Nathan Goodman , Eugene Wong , Christopher L. Reeve , James B. Rothnie, Jr., Query processing in a system for distributed databases (SDD-1), ACM Transactions on Database Systems (TODS), v.6 n.4, p.602-625, Dec. 1981
[doi> 10.1145/319628.319650]
|
| |
BLAS 77
|
M W Blasgen and K P Eswaren, Storage and Access m RelaUonal Data Bases, IBM Systems Journal 16,4 (1977) Also available as "On the EvaluaUon of Queries m a RelaUonal Data Base System", IBM Research Report RJ1745, San Jose, CA, April 1976
|
 |
BLOO 70
|
|
| |
BRAT 84
|
|
| |
CARE 85
|
M J Carey and H Lu, Some Expertmental Results on Dtstnbuted Jom Algorithms m a Local Network, Procs of the Elewnth International Conf on Very Large Data Bases (Stockholm, Sweden, August 1985)
|
 |
CHAM 81
|
D. D. Chamberlin , M. M. Astrahan , W. F. King , R. A. Lorie , J. W. Mehl , T. G. Price , M. Schkolnick , P. Griffiths Selinger , D. R. Slutz , B. W. Wade , R. A. Yost, Support for repetitive transactions and ad hoc queries in System R, ACM Transactions on Database Systems (TODS), v.6 n.1, p.70-94, March 1981
[doi> 10.1145/319540.319550]
|
| |
CHAN 82
|
|
| |
CHOU 85
|
H-T Chou and D J DeWltt, An Evaluatmn of Buffer Management Strategtes for Relauonal Database Systems, Procs of the Eleventh International Conf on Very Large Data Bases (Stockholm, Sweden, September 1985) pp 127-141
|
| |
CHU 82
|
W W Chu and P Hurley, Optunal Query Processing for D#stnbuted Database Systems, IEEE Trans on Computers C-31 (September 1982) pp 835-850
|
| |
DANI 82
|
D Darnels, P G Selmger, L M Haas, B G Lmdsay, C Mohan, A Walker, and P Wdms, An Introducuon to D#stnbuted Query Compdataon m R*, Procs Second International Conf on D#trtbuted Databases (Berlm, September 1982) Also avatlable as IBM Research Report RJ3497, San Jose, CA, June 1982
|
 |
DEWI 84
|
David J DeWitt , Randy H Katz , Frank Olken , Leonard D Shapiro , Michael R Stonebraker , David Wood, Implementation techniques for main memory database systems, Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts
|
| |
DEWI 85
|
D J DeWltt and R Gerber, Multi-Processor Hash- Based Join Algortthms, Procs of the Eleventh Internatzonal Conf on Very Large Data Bases (Stockholm, Sweden, September 1985) pp 151-164
|
 |
EFFE 84
|
|
 |
EPST 78
|
|
| |
EPST 80
|
R Epstein and M Stonebraker, Analysis of Distributed Data Base Processing Strategies, Procs of the Szxth lnternatwnal Conf on Very Large Data Bases (Montreal,IEEE, October 1980) pp 92-101
|
| |
FINK 82
|
S Fmkelstem, M Schkolmck, and P Tlbeno, DBDSGN -- a Physical Database Design Tool for System R, IEEE Database Engineering 5,1 (1982) pp 228-235
|
| |
GDDM 84
|
Graphtcal Data D#splay Manager Presentatwn Graphzcs Feature Interactive Chart Utzhty User's GuMe, Release 4, IBM Reference Manual SC33-0111-3 (IBM Corp, October 1984)
|
| |
HEVN 79
|
A R Hevner and S B Yao, Query Processing m Dlstnbuted Database Systems, IEEE Trans Software Engineering SE-5 (May 1979) pp 177-187
|
 |
KERS 82
|
|
| |
LOHM 84
|
|
| |
LOHM 85
|
G M Lohman, C Mohan, L M Haas, B G Lmdsay, P G Sehnger, P F Wllms, and D Darnels, Query Processing m R*, Query Processing m Database Systems, Sprmger-Verlag (Klm, Batory, & Remer (eds), 1985) pp 31-47 Also available as IBM Research Report R.14272, San Jose, CA, April 1984
|
| |
MACK 85
|
L F Mackert and G M Lohman, Index Scans using a Flmte LRU Buffer A Validated I/O Model, IBM Research Report RJ4836 (Almaden Research Center, San Jose, CA,
|
| |
MACK 86
|
L F Mackert and G M Lohman, R* Optmmmzer Validation and Performance Evaluation for Distributed Quenes, IBM Research Report RJ5050 (Almaden Research Center, San jose, CA,
|
| |
ONUE 83
|
E Onuegbe, S Rahum, and A R Hevner, Local Query Translation and Optmuzation m a Dmtnbuted System, Procs NCC 1983 (July 1983) pp 229-239
|
| |
PALE 74
|
F P Palermo, A Data Base Search Problem, Informatwn Systems COINS IV (Procs of COINS-72 Symp) (Mmnu, FL, 1974) pp 67-101 Plenum Press, JuhusT Tou, ed
|
| |
RDT 84
|
RDT Relational Design Tool, IBM Reference Manual SH20-6415 (IBM Corp, June 1984)
|
| |
SACC 82
|
|
| |
SCHK 79
|
M Schkolmck and P Ttbeno, Considerations m Developing a Design Tool for a Relational DBMS, Proc IEEE COMPSAC Conf (Cbacago, November 1979) pp 228-235
|
 |
SCHK 85
|
|
| |
SELI 79
|
P G Selmger, M M Astrahan, D D Chamberhn, R A Lone, and T G Price, Access Path Selection m a Relational Database Management System, Procs of ACM-SIGMOD 20 (1979) pp 23-34
|
| |
SELI 80
|
P G Sehnger and M Adiba, Access Path Selection m Dtstnbuted Database Management Systems, Procs lnternatwnal Conf on Data Bases (Deen and Hammersly, ed, Umv of Abet pp 204-215
|
 |
SEVE 76
|
|
| |
SQL 84
|
SQL/Data System Application Programming for VM/ System Product, Release 3, IBM Reference Manual SH24-5068-0 (IBM Corp, December 1984)
|
 |
STON 80
|
|
 |
STON 81
|
|
| |
STON 82
|
M Stonebraker, J V#oodfdl, J Ranstrom, M Murphy, J Kalash, M Carey, K Arnold, Performance Analysts of Distributed Data Base Systems, Database Engineermg 5 (IEEE Computer Society, December 1982) pp 58-65
|
| |
VTAM 85
|
Network Program Products Planning (MVS, VSE, and VM), IBM Reference Manual SC23-0110-1 (IBM Corp, April 1985)
|
 |
WONG 76
|
|
 |
YAO 79
|
|
 |
YU 83
|
|
CITED BY 42
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
D. Motzkin , R. Ellendula , M. Kamali , S. Tiwari, A tool for performance evaluation of database systems for small computer systems, Proceedings of the 1995 ACM symposium on Applied computing, p.420-426, February 26-28, 1995, Nashville, Tennessee, United States
|
|
|
P. Agrawal , D. Bitton , K. Guh , C. Liu , C. Yu, A case study for distributed query processing, Proceedings of the first international symposium on Databases in parallel and distributed systems, p.124-130, December 05-07, 1988, Austin, Texas, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wei Wang , Haifeng Jiang , Hongjun Lu , Jeffrey Xu Yu, Bloom histogram: path selectivity estimation for XML data with updates, Proceedings of the Thirtieth international conference on Very large data bases, p.240-251, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|