ACM Home Page
Please provide us with feedback. Feedback
R* optimizer validation and performance evaluation for local queries
Full text PdfPdf (1.43 MB)
Source International Conference on Management of Data archive
Proceedings of the 1986 ACM SIGMOD international conference on Management of data table of contents
Washington, D.C., United States
Pages: 84 - 95  
Year of Publication: 1986
ISBN:0-89791-191-1
Also published in ...
Authors
Lothar F. Mackert  Univensity of Erlangen-Nürnberg IMMD-IV Martensstr 3, D-8520 Erlangen, West Germany
Guy M. Lohman  IBM Almaden Research Center, San Jose, CA
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 53,   Citation Count: 42
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/16894.16863
What is a DOI?

ABSTRACT

Few database query optimizer models have been validated against actual performance. This paper presents the methodology and results of a thorough validation of the optimizer and evaluation of the performance of the experimental distributed relational database management system R*, which inherited and extended to a distributed environment the optimization algorithms of System R. Optimizer estimated costs and actual R* resources consumed were written to database tables using new SQL commands, permitting automated control from SQL application programs of test data collection and reduction. A number of tests were run over a wide variety of dynamically-created test databases, SQL queries, and system parameters. The results for single-table access, sorting, and local 2-table joins are reported here. The tests confirmed the accuracy of the majority of the I/O cost model, the significant contribution of CPU cost to total cost, and the need to model CPU cost in more detail than was done in System R. The R* optimizer now retains cost components separately and estimates the number of CPU instructions, including those for applying different kinds of predicates. The sensitivity of I/O cost to buffer space motivated the development of more detailed models of buffer utilization unclustered index scans and nested-loop joins often benefit from pages remaining in the buffers, whereas concurrent scans of the data pages and the index pages for multiple tables during joins compete for buffer share. Without an index on the join column of the inner table, the optimizer correctly avoids the nested-loop join, confirming the need for merge-scan joins. When the join column of the inner is indexed, the optimizer overestimates the cost of the nested-loop join, whose actual performance is very sensitive to three parameters that are extremely difficult to estimate (1) the join (result) cardinality, (2) the outer table's cardinality, and (3) the number of buffer pages available to store the inner table. Suggestions are given for improved database statistics, prefetch and page replacement strategies for the buffer manager, and the use of temporary indexes and Bloom filters (hashed semijoins) to reduce access of unneeded data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
APER 83
P M G Apers, A R Hevner, and S B Yao, Optnntzmg Algorithms for Dtstnbuted Queries, IEEE Trans on Software Engmeermg SE-9 (January 1983) pp 57-68
 
ASTR 80
M M Astrahan, M Schkolmck, and W Kun, Performance of the System R Access Path Selecuon Mechamsm, Informatton Processing 80 (1980) pp 487-491
BERN 81
 
BLAS 77
M W Blasgen and K P Eswaren, Storage and Access m RelaUonal Data Bases, IBM Systems Journal 16,4 (1977) Also available as "On the EvaluaUon of Queries m a RelaUonal Data Base System", IBM Research Report RJ1745, San Jose, CA, April 1976
BLOO 70
 
BRAT 84
 
CARE 85
M J Carey and H Lu, Some Expertmental Results on Dtstnbuted Jom Algorithms m a Local Network, Procs of the Elewnth International Conf on Very Large Data Bases (Stockholm, Sweden, August 1985)
CHAM 81
 
CHAN 82
 
CHOU 85
H-T Chou and D J DeWltt, An Evaluatmn of Buffer Management Strategtes for Relauonal Database Systems, Procs of the Eleventh International Conf on Very Large Data Bases (Stockholm, Sweden, September 1985) pp 127-141
 
CHU 82
W W Chu and P Hurley, Optunal Query Processing for D#stnbuted Database Systems, IEEE Trans on Computers C-31 (September 1982) pp 835-850
 
DANI 82
D Darnels, P G Selmger, L M Haas, B G Lmdsay, C Mohan, A Walker, and P Wdms, An Introducuon to D#stnbuted Query Compdataon m R*, Procs Second International Conf on D#trtbuted Databases (Berlm, September 1982) Also avatlable as IBM Research Report RJ3497, San Jose, CA, June 1982
DEWI 84
 
DEWI 85
D J DeWltt and R Gerber, Multi-Processor Hash- Based Join Algortthms, Procs of the Eleventh Internatzonal Conf on Very Large Data Bases (Stockholm, Sweden, September 1985) pp 151-164
EFFE 84
EPST 78
 
EPST 80
R Epstein and M Stonebraker, Analysis of Distributed Data Base Processing Strategies, Procs of the Szxth lnternatwnal Conf on Very Large Data Bases (Montreal,IEEE, October 1980) pp 92-101
 
FINK 82
S Fmkelstem, M Schkolmck, and P Tlbeno, DBDSGN -- a Physical Database Design Tool for System R, IEEE Database Engineering 5,1 (1982) pp 228-235
 
GDDM 84
Graphtcal Data D#splay Manager Presentatwn Graphzcs Feature Interactive Chart Utzhty User's GuMe, Release 4, IBM Reference Manual SC33-0111-3 (IBM Corp, October 1984)
 
HEVN 79
A R Hevner and S B Yao, Query Processing m Dlstnbuted Database Systems, IEEE Trans Software Engineering SE-5 (May 1979) pp 177-187
KERS 82
 
LOHM 84
 
LOHM 85
G M Lohman, C Mohan, L M Haas, B G Lmdsay, P G Sehnger, P F Wllms, and D Darnels, Query Processing m R*, Query Processing m Database Systems, Sprmger-Verlag (Klm, Batory, & Remer (eds), 1985) pp 31-47 Also available as IBM Research Report R.14272, San Jose, CA, April 1984
 
MACK 85
L F Mackert and G M Lohman, Index Scans using a Flmte LRU Buffer A Validated I/O Model, IBM Research Report RJ4836 (Almaden Research Center, San Jose, CA,
 
MACK 86
L F Mackert and G M Lohman, R* Optmmmzer Validation and Performance Evaluation for Distributed Quenes, IBM Research Report RJ5050 (Almaden Research Center, San jose, CA,
 
ONUE 83
E Onuegbe, S Rahum, and A R Hevner, Local Query Translation and Optmuzation m a Dmtnbuted System, Procs NCC 1983 (July 1983) pp 229-239
 
PALE 74
F P Palermo, A Data Base Search Problem, Informatwn Systems COINS IV (Procs of COINS-72 Symp) (Mmnu, FL, 1974) pp 67-101 Plenum Press, JuhusT Tou, ed
 
RDT 84
RDT Relational Design Tool, IBM Reference Manual SH20-6415 (IBM Corp, June 1984)
 
SACC 82
 
SCHK 79
M Schkolmck and P Ttbeno, Considerations m Developing a Design Tool for a Relational DBMS, Proc IEEE COMPSAC Conf (Cbacago, November 1979) pp 228-235
SCHK 85
 
SELI 79
P G Selmger, M M Astrahan, D D Chamberhn, R A Lone, and T G Price, Access Path Selection m a Relational Database Management System, Procs of ACM-SIGMOD 20 (1979) pp 23-34
 
SELI 80
P G Sehnger and M Adiba, Access Path Selection m Dtstnbuted Database Management Systems, Procs lnternatwnal Conf on Data Bases (Deen and Hammersly, ed, Umv of Abet pp 204-215
SEVE 76
 
SQL 84
SQL/Data System Application Programming for VM/ System Product, Release 3, IBM Reference Manual SH24-5068-0 (IBM Corp, December 1984)
STON 80
STON 81
 
STON 82
M Stonebraker, J V#oodfdl, J Ranstrom, M Murphy, J Kalash, M Carey, K Arnold, Performance Analysts of Distributed Data Base Systems, Database Engineermg 5 (IEEE Computer Society, December 1982) pp 58-65
 
VTAM 85
Network Program Products Planning (MVS, VSE, and VM), IBM Reference Manual SC23-0110-1 (IBM Corp, April 1985)
WONG 76
YAO 79
YU 83

CITED BY  42

Collaborative Colleagues:
Lothar F. Mackert: colleagues
Guy M. Lohman: colleagues