|
ABSTRACT
With current systems, some important complex queries may take days to complete because of: (1) the volume of data to be processed, (2) limited aggregate resources. Introducing parallelism addresses the first problem. Cheaper, but powerful computing resources solve the second problem. According to a survey by Brodie,1 only 10% of computerized data is in data bases. This is an argument for both more variety and volume of data to be moved into data base systems. We conjecture that the primary reasons for this low percentage are that data base management systems (DBMSs) still need to provide far greater functionality and improved performance compared to a combination of application programs and file systems. This paper addresses the issues and solutions relating to intraquery parallelism in a relational DBMS supporting SQL. Instead of focussing only on a few algorithms for a subset of the problems, we provide a broad framework for the study of the numerous issues that need to be addressed in supporting parallelism efficiently and flexibly. We also discuss the impact that parallelization of complex queries has on short transactions which have stringent response time constraints. The pros and cons of the shared nothing, shared disks and shared everything architectures for parallelism are enumerated. The impact of parallelism on a number of components of an industrial-strength DBMS are pointed out. The different stages of query processing during which parallelism may be gainfully employed are identified. The interactions between parallelism and the traditional systems' pipelining technique are analyzed. Finally, the performance implications of parallelizing a specific complex query are studied. This gives us a range of sample points for different parameters of a parallel system architecture, namely, I/O and communication bandwidth as a function of aggregate MIPS.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
ABCGK79
|
Morton M. Astrahan , Donald D. Chamberlin , W. Frank King, III , Irving L. Traiger, System R: A Relational Data Base Management System, Data Base Systems, Proceedings, 5th Informatik Symposium, p.139-148, September 24-26, 1975
|
 |
AgSe89
|
|
 |
AlCo88
|
|
| |
AnCK89
|
Andrade, J., Carges, M., Kovach, K. Building a Transaction Processing System on UN/X Systems, Proc. Unix Transaction Processing Workshop, Pittsburgh, May 1989, USENIX Association.
|
| |
AnCo88
|
Anderson, M., Cole, R. An Integrated Data Base, In IBM Application System/400 Technology, Document Number SA21-9540, IBM, June 1988.
|
| |
Anon85
|
Anon, et al. A Measure of Transaction Processing Power, Datamation, April 1, 1985.
|
 |
BeCh81
|
|
| |
Bhid88
|
|
 |
BoPu88
|
|
| |
Bora88a
|
Boral, H. Parallelism and Data Management, Technical Report ff umber ACA-ST-156-88. MCC, May 1988.
|
| |
Bora88b
|
|
| |
Borr81
|
Borr, A. Transaction Monitoring in Encompass: Reliable Distributed Transaction Processing, Proc. 7th International Conference on Very Large Data Bases, Cannes, September 1981.
|
| |
Borr84
|
|
 |
CABK88
|
George Copeland , William Alexander , Ellen Boughter , Tom Keller, Data placement in Bubba, Proceedings of the 1988 ACM SIGMOD international conference on Management of data, p.99-108, June 01-03, 1988, Chicago, Illinois, United States
|
| |
CaHS85
|
Carr, C., Huddleston, R.L., Strickland, J. Method and Means for the Retention of Locks Across System, Subsystem, and Communication Failures in a Multiprocessing, Multiprogramming, Shared Data Environment, U.S. Patent 4,480,304, IBM, 1985.
|
| |
ChGY81
|
Chamberiin, D., Gilbert, A., Yost, R. A History of System R and SQL/Data System, Proc. 7th International Conference on Very Large Data Bases, Cannes, September 1981.
|
| |
ChGr85
|
Chan, A., Gray, R. Implementing Distributed Read-Only Transactions, IEEE Transactions on Software Engineering, Vol. SE-I 1, No. 2. 1985.
|
| |
CHHIM90
|
Cheng, J., Haderle, D., Hedges, R., lyer, B., Messinger, T., Mohan, C., Wang, Y. An Efficient Hybrid Join Algorithm: Design, Prototype, Modelling and Measurement, fBM Research Report, IBM Almaden Research Center, March 1990.
|
| |
ChMy88
|
|
| |
CLSW84
|
Cheng, J., Loosely, C., Shibamiya, A., Worthington, P. IBM Database 2 Performance: Design, Implementation, and Tuning, IBM Systems Journal, Vol. 23, No. 2, 1984.
|
| |
Corn88
|
Cornelis, R. Site Autonomy in a Distributed Database Environment, Proc. IEEE Compcon Spring '88, San Francisco, March 1988.
|
 |
DeGS88
|
D. J. DeWitt , S. Ghanderaizadeh , D. Schneider, A performance analysis of the gamma database machine, Proceedings of the 1988 ACM SIGMOD international conference on Management of data, p.350-360, June 01-03, 1988, Chicago, Illinois, United States
|
| |
DeSB87
|
|
| |
DGRS89
|
Duppel, N., Gugel, D., Reuter, A., Schiele, G. Progress Report #S of PROSPECT, University of Stuttgart, November 1989.
|
| |
DGRSZ88
|
Duppel, N., Gugel. D.. Reuter, A., Schiele, G., Zeller, H. Progress Report #4 of PROSPECT, University of Stuttgart, 1988.
|
| |
DIRY89
|
|
| |
Duqu87
|
|
| |
EGKS89
|
Englert. S.. Gray, J., Kocher, T., Shah, P. A Benchmark of Nonstop SQL Release 2 Demonstrating Near-Linear Speedup and Scaleup on Large Databases, Technical Report 89.4, Tandem Computers, May 1989.
|
| |
Epst88
|
Epstein, R. RDBMS Architecture Boosts Performance, Reliability, Mini- Micro Systems, August 1988.
|
| |
FTSN89
|
Serlin O., editor DEC 9000 to Challenge IBM 3090, FT Systems News Letter, Issue No. 86, Oct. 24 1989.
|
| |
GaKi85
|
Gawlick, D., Kinkade, D. Varieties of Concurrency Control in IMSfVS Fast Path, IEEE Database Engineering, Vol. 8, No. 2, June 1985.
|
| |
GeDe87
|
Gerber, R., Dewitt, D. The impact of Hardware and Software Alternatives on the Performance of the Gamma Database Machine, Computer Sciences Department Technical Report #708, University of Wisconsin at Madison, July 1987.
|
 |
Grae90
|
|
| |
Gray78
|
|
| |
Haer88
|
|
| |
HaJa84
|
Haderle, D., Jackson, R. IBM Database 2 Overview, IBM Systems Journal, Vol. 23, No. 2, 1984.
|
| |
HaSS88
|
T. Harder , H. Schoning , A. Sikeler, Parallelism in processing queries on complex objects, Proceedings of the first international symposium on Databases in parallel and distributed systems, p.131-143, December 05-07, 1988, Austin, Texas, United States
|
| |
HaSS89
|
Haerder, T., Schoning, H., Sikeler, A. Evaluation of Hardware Architectures for Parallel Execution of Complex Database Operations, Proc. 3rd Annual Parallel Processing Symposium, Fullerton, p584-578, 1989. Also Available as Report No. 23/89, University of Kaiserslautern, April 1989.
|
 |
HFLP89
|
L. M. Haas , J. C. Freytag , G. M. Lohman , H. Pirahesh, Extensible query processing in starburst, Proceedings of the 1989 ACM SIGMOD international conference on Management of data, p.377-388, June 1989, Portland, Oregon, United States
|
| |
Hobs87
|
|
| |
HsDe90
|
|
| |
IyDi90
|
|
| |
IyRV89
|
|
| |
JoRo89
|
Joshi, A., Rodwell, K. A Relational Database Management System for Production Applications, Digital Technical Journal, No. 8, February 1989.
|
| |
KITM83
|
Kitsuregawa, M., Tanaka, H., Moto- Oka, T. Application of Hash to Data Base Machine and Its Arachitecture, New Generation Computing, Vol. 1, pp 63-74, 1983.
|
 |
KrLS86
|
|
| |
LDHSY89
|
Lorie, R., Daudenarde, J., Hallmark, G., Stamos, J., Young, H. Adding Intra-Transaction Parallelism to an Existing DBMS: Early Experience, Data Engineering, Vol. 12, No. I, March 1989. Also Available as IBM Research Report RJ6165, ISM Almaden Research Center, 1988.
|
| |
LMHDL85
|
Lohman, G., Mohan, C., Haas, L., Daniels, D., Lindsay, B., Selinger, P., Wilms, P. Query Processing in R*, In Query Processing in Database Systems, W. Kim, D. Reiner, and D. Batory (Ed-s.), Springer-Verlag, 1985. Also Available as IBM Research Report RJ4272, April 1984.
|
| |
Loos86
|
Loosley, C. Measuring ISM Database 2 Release 2 - The Benchmark Game, InfoDB, Vol. I, No. 2, 1986.
|
| |
LoYo89
|
|
| |
McGe77
|
McGee, W.C. The information Management System IMSIVS - Part IV: Data Communication Facilities, IBM Systems Journal, Vol. 16, No. 2, 1977.
|
 |
MHLPS89
|
|
| |
MHWC90
|
C. Mohan , Don Haderle , Yun Wang , Josephine Cheng, Single table access using multiple indexes: optimization, execution, and concurrency control techniques, Proceedings of the international conference on extending database technology on Advances in database technology, p.29-43, March 1990, Venice, Italy
|
| |
MiSu88
|
|
| |
Moha89
|
Mohan, C. ARIESIKVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes, IBM Research Report RJ7008, IBM Almaden Research Center, September 1989.
|
| |
Moha90
|
Mohan, C. Commit-LSN: A Novel and Simple Method for Reducing Locking and Latching in Transaction Processing Systems, IBM Research Report RJ7344, IBM Almaden Research Center, February 1990.
|
| |
MoLe89
|
Mohan, C., Levine, F. ARIESIIM: An Efficient and High Concurrency index Management Method Using Write- Ahead Logging, IBM Research Report RJ6846, IBM Almaden Research Center, August 1989.
|
 |
MoLO86
|
|
| |
MoNa90
|
Mohan, C., Narang, I. ARIES/SD: A Transaction Recovery and Concurrency Control Method for the Shared Disks Environment, IBM Research Report, IBM Almaden Research Center, Forthcoming, 1990.
|
| |
MoNP90
|
Mohan. C., Narang, I., Palmer, J. A Case Study of Problems in Migrating to Distributed Computing: Page Recovery Using Multiple Logs in the Shared Disks Environment, IBM Research Report RJ7343, IBM Almaden Research Center, March 1990.
|
| |
MoPi90
|
Mohan, C., Pirahesh, H. ARIES-RRH: Restricted Repeating of History in the ARIES Transaction Recovery Method, IBM Research Report RJ7342, IBM Almaden Research Center, February 1990.
|
| |
Nech86
|
Neches, P. The Anatomy of a Data Base Computer - Revisited, Proc. IEEE Compcon Spring '86, San Francisco, March 1986.
|
| |
ObSW83
|
Obermarck, R., Strickland, J., Watts, V. Mefhod and Means for the Sharing of Data Resources in a Multiprocessing, Multiprogramming Environment, U.S. Patent 4,399,504, IBM, August 1983.
|
| |
OlRo89
|
|
| |
Onei89
|
O'Neil, P. Revisiting DBMS Benchmarks, Datamation, p47, 48, 52, 54, September 15, 1989.
|
 |
PaGK88
|
David A. Patterson , Garth Gibson , Randy H. Katz, A case for redundant arrays of inexpensive disks (RAID), Proceedings of the 1988 ACM SIGMOD international conference on Management of data, p.109-116, June 01-03, 1988, Chicago, Illinois, United States
|
 |
PeSt83
|
|
| |
Rahm87
|
Rahm, E. Design of Optimistic Methods for Concurrency Control in Database Sharing Systems, Proc. 7th International Conference on Distributed Computing Systems, Berlin, September 1987.
|
| |
Rahm88
|
Rahm, E. Design and Evaluation of Concurrency and Coherency Control Techniques for Database Sharing Systems, Internal Report 182188, University of Kaiserslautern, 1988.
|
| |
Rahm89a
|
Rahm, E. A Framework for Workload Allocation in Distributed Transaction Systems, Technical Report, University of Kaiserslautern, September 1989.
|
| |
Rahm89b
|
Rahm, E. Recovery Considerations for Data Sharing Systems, Technical Report, University of Kaiserslautern, October 1989.
|
| |
Reed78
|
|
| |
ReSW89
|
Rengarajan, T.K., Spiro, P., Wright, W. High Availability Mechanisms of VAX DBMS Software, Digital Technical Journal, No. 8, February 1989.
|
| |
Ross89
|
Ross, F. An Overview of FDDI: The Fiber Distributed Data Interface, IEEE Transactions on Selected Areas in Communications, Vol. 7, No. 7, September 1989.
|
 |
SACL79
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
SaGa86
|
|
| |
ScDG89
|
Schneider, D.. Dewitt, D., Ghandeharizadeh, S. An Overview of the Gamma Database Machine, Proc. IEEE Compcon Spring '89, San Francisco, March 1989.
|
| |
Sche87
|
|
| |
Scru87
|
Scrutchin, T. TPF: Performance, Capacity, Availability, Proc. IEEE Compcon Spring '87, San Francisco, February 1987.
|
 |
SeAd80
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
Serl89
|
Serlin, O. ClCS is Key to IBM's OLTP Success, UNIX World, November 1989.
|
| |
Shoe86
|
Shoens, K. Data Sharing vs. Partitioning for Capacity and Availability, Database Engineering, Vol. 9, No. I, March 1986.
|
| |
Siwi77
|
Siwiec, J.E. A High-Performance DBl DC System, IBM Systems Journal, Vol. 16, No. 2, 1977.
|
| |
SKPO88
|
|
| |
SMLC86
|
|
| |
SMMTG84
|
Sekino, A., Moritani, K., Masai, T., Tasaki, T.. Goto, K. The DCS - A New Approach to Multisystem Data- Sharing, Proc. National Computer Conference, Las Vegas, July 1984.
|
| |
SNOP85
|
Sheens, K., Narang, I., Obermarck, R., Palmer, J., Silen, S., Traiger, I., Treiber, K. Amoeba Project, Proc. IEEE Compcon Spring '85, San Francisco, February 1985.
|
| |
SQL23
|
Melton, J. (Ed.) (/SO-ANSI Working Draft) Database Language SQL2 and SQL3, X3H2-90-001 and DBL SEL-3, ISOllEC JTCl ISC21NVG3 N986, December 1989.
|
| |
SSSHD87
|
Robert J. Sundstrom , James B. Staton, III , Gary D. Schultz , Matthew L. Hess , George A. Deaton, Jr. , Leo J. Cole , Robert M. Amy, SNA: current requirements and direction, IBM Systems Journal, v.26 n.1, p.13-36, Jan. 1987
|
| |
StAS89
|
Stonebraker, M., Aoki, P., Seltzer, M. Parallelism in XPRS, Memorandum No. UCBlERL M89/16, University of California, Berkeley, February 1989.
|
| |
Ston86
|
Stonebraker, M. The Case for Shared Nothing, IEEE Database Engineering, Vol. 9, No. 1, 1986.
|
| |
StUW82
|
Strickland, J., Uhrowczik, P., Watts, V. IMSIVS: An Evolving System, IBM Systems Journal, Vol. 21, No. 4, 1982.
|
| |
Tand87
|
|
 |
Tand88
|
|
| |
TeGu84
|
Teng, J., Gumaer, R. Managing IBM Database 2 Buffers to Maximize Performance, IBM Systems Journal, Vol. 23, No. 2, 1984.
|
| |
Tera88
|
Teradata DBCIIO12 Data Base Computer Concepts and Facilities - Release 3.1, Documber Number CO2-0001-05, Teradata Corp., May 1988.
|
| |
TPC89
|
Transaction Processing Performance Council TPC Benchmark A, Draft 6-PR Proposed Standard, 1989. Available from ITOM International Co., POB 1450, Los Altos, CA 94023.
|
| |
TuOB89
|
Turbyfill, C., Orji, C., Bitton, D. AS3AP- A Comparative Relational Database Benchmark, Proc. IEEE-Compcon Spring '89, San Francisco, February 1989.
|
| |
Weih87
|
|
| |
Wolf88
|
|
CITED BY 31
|
|
|
|
|
|
|
|
Chandra Chekuri , Waqar Hasan , Rajeev Motwani, Scheduling problems in parallel query optimization, Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.255-265, May 22-25, 1995, San Jose, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gene Fuh , Jyh-Herng Chow , Nelson Mattos , Brian Tran, Supporting procedural constructs in existing SQL compilers, Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research, p.11, November 12-14, 1996, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|