Loading databases using dataflow parallelism
Source ACM SIGMOD Record archive
Volume 23, Issue 4 (December 1994)
Pages: 72-83
Year of Publication: 1994
ISSN: 0163-5808
Authors
Tom Barclay  Digital Equipment Corporation, San Francisco Systems Center, Microsoft, One Microsoft Way, Redmond, WA
Robert Barnes  Digital Equipment Corporation, San Francisco Systems Center, Microsoft, One Microsoft Way, Redmond, WA
Jim Gray  Digital Equipment Corporation, San Francisco Systems Center, 310 Filbert St., S.F., CA
Prakash Sundaresan  Digital Equipment Corporation, San Francisco Systems Center, Informix, 921 SW Washington St. # 670, Portland, OR
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/190627.190647

ABSTRACT

This paper describes a parallel database load prototype for Digital's Rdb database product. The prototype takes a dataflow approach to database parallelism. It includes an explorer that discovers and records the cluster configuration in a database, a client CUI interface that gathers the load job description from the user and from the Rdb catalogs, and an optimizer that picks the best parallel execution plan and records it in a web data structure. The web describes the data operators, the dataflow rivers among them, the binding of operators to processes, processes to processors, and files to discs and tapes. This paper describes the optimizer's cost-based hierarchical optimization strategy in some detail. The prototype executes the web's plan by spawning a web manager process at each node of the cluster. The managers create the local executor processes, and orchestrate startup, phasing, checkpoint, and shutdown. Each executor process performs one or more operators. Data flows among the operators via memory-to-memory streams within a node, and via web-manager multiplexed TCP/IP streams among nodes. The design of the transaction and checkpoint/restart mechanisms is also described. Preliminary measurements indicate that this design will give excellent scaleups.
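
To make the abstract's "web" concrete, here is a minimal Python sketch of a plan structure that records operators, the rivers between them, and the operator-to-process and process-to-node bindings. It is an illustration only, not the prototype's actual data structure: the names `Web`, `Operator`, `River`, `add_operator`, `add_river`, and `transport`, and the example operators and node names, are all assumptions made for this sketch. The binding of files to discs and tapes is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operator:
    name: str      # hypothetical operator name, e.g. "scan" or "sort"
    process: str   # process the operator is bound to


@dataclass(frozen=True)
class River:
    source: str    # name of the producing operator
    sink: str      # name of the consuming operator


@dataclass
class Web:
    """Hypothetical plan record: operators, rivers, and bindings."""
    operators: dict = field(default_factory=dict)     # name -> Operator
    rivers: list = field(default_factory=list)        # list of River
    process_node: dict = field(default_factory=dict)  # process -> node

    def add_operator(self, name, process, node):
        # Bind the process to a node; a process cannot live on two nodes.
        bound = self.process_node.setdefault(process, node)
        if bound != node:
            raise ValueError(f"process {process} is already bound to {bound}")
        self.operators[name] = Operator(name, process)

    def add_river(self, source, sink):
        # A river may only connect operators already present in the web.
        for op in (source, sink):
            if op not in self.operators:
                raise KeyError(f"unknown operator: {op}")
        self.rivers.append(River(source, sink))

    def transport(self, river):
        # Same node: memory-to-memory stream; different nodes: TCP/IP stream.
        src_node = self.process_node[self.operators[river.source].process]
        dst_node = self.process_node[self.operators[river.sink].process]
        return "memory" if src_node == dst_node else "tcp"


def example_web():
    # Assumed three-operator load pipeline spread over two nodes.
    web = Web()
    web.add_operator("scan", process="p1", node="nodeA")
    web.add_operator("sort", process="p2", node="nodeA")
    web.add_operator("store", process="p3", node="nodeB")
    web.add_river("scan", "sort")
    web.add_river("sort", "store")
    return web
```

In this sketch, the choice of stream type follows from the bindings alone: a river whose endpoints share a node is marked as a memory stream, and any other river is marked as a TCP/IP stream, mirroring the distinction the abstract draws.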


