| Loading databases using dataflow parallelism |
| Source |
ACM SIGMOD Record
Volume 23, Issue 4 (December 1994)
Pages: 72-83
Year of Publication: 1994
ISSN: 0163-5808
| Authors |
Tom Barclay
Digital Equipment Corporation, San Francisco Systems Center, Microsoft, One Microsoft Way, Redmond, WA
Robert Barnes
Digital Equipment Corporation, San Francisco Systems Center, Microsoft, One Microsoft Way, Redmond, WA
Jim Gray
Digital Equipment Corporation, San Francisco Systems Center, 310 Filbert St., S.F., CA
Prakash Sundaresan
Digital Equipment Corporation, San Francisco Systems Center, Informix, 921 SW Washington St. # 670, Portland, OR
ABSTRACT
This paper describes a parallel database load prototype for Digital's Rdb database product. The prototype takes a dataflow approach to database parallelism. It includes an explorer that discovers and records the cluster configuration in a database, a client CUI interface that gathers the load job description from the user and from the Rdb catalogs, and an optimizer that picks the best parallel execution plan and records it in a web data structure. The web describes the data operators, the dataflow rivers among them, and the binding of operators to processes, processes to processors, and files to discs and tapes. This paper describes the optimizer's cost-based hierarchical optimization strategy in some detail. The prototype executes the web's plan by spawning a web manager process at each node of the cluster. The managers create the local executor processes and orchestrate startup, phasing, checkpoint, and shutdown. The execution processes each perform one or more operators. Data flows among the operators via memory-to-memory streams within a node and via web-manager-multiplexed TCP/IP streams among nodes. The design of the transaction and checkpoint/restart mechanisms is also described. Preliminary measurements indicate that this design will give excellent scaleups.
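As a rough illustration of the web described in the abstract, the following Python sketch models operators, the rivers between them, and the bindings of operators to processes and processes to nodes. All class, field, and function names here (`Web`, `River`, `add_operator`, `transport`, `build_example`) are hypothetical and chosen for this sketch only; the abstract does not specify the prototype's actual data layout, and file-to-device bindings are omitted. The one behaviour taken from the abstract is that a river within a node is a memory-to-memory stream, while a river between nodes goes over TCP/IP.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class River:
    """A dataflow edge from one operator to another (hypothetical model)."""
    source: str
    sink: str


@dataclass
class Web:
    """Hypothetical plan structure: operators, rivers, and bindings."""
    rivers: list = field(default_factory=list)
    op_to_process: dict = field(default_factory=dict)    # operator -> process
    process_to_node: dict = field(default_factory=dict)  # process -> node

    def add_operator(self, name, process, node):
        # A process lives on exactly one node; reject conflicting bindings.
        bound = self.process_to_node.setdefault(process, node)
        if bound != node:
            raise ValueError(f"process {process} already bound to {bound}")
        self.op_to_process[name] = process

    def connect(self, source, sink):
        # Both endpoints must already be bound to a process.
        for op in (source, sink):
            if op not in self.op_to_process:
                raise KeyError(f"unknown operator {op}")
        river = River(source, sink)
        self.rivers.append(river)
        return river

    def node_of(self, operator):
        return self.process_to_node[self.op_to_process[operator]]

    def transport(self, river):
        # Same node: memory-to-memory stream; different nodes: TCP/IP.
        if self.node_of(river.source) == self.node_of(river.sink):
            return "memory"
        return "tcp/ip"


def build_example():
    """Build a tiny two-node web: scan and sort on nodeA, load on nodeB."""
    web = Web()
    web.add_operator("scan", process="p1", node="nodeA")
    web.add_operator("sort", process="p2", node="nodeA")
    web.add_operator("load", process="p3", node="nodeB")
    local = web.connect("scan", "sort")
    remote = web.connect("sort", "load")
    return web, local, remote
```

In this sketch the transport is derived from the bindings rather than stored on the river, so rebinding a process to another node would change the stream type without touching the dataflow graph. Whether the prototype makes the same choice is not stated in the abstract.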
CITED BY 11
Alexander S. Szalay, Peter Z. Kunszt, Ani Thakar, Jim Gray, Don Slutz, Robert J. Brunner, Designing and mining multi-terabyte astronomy archives: the Sloan Digital Sky Survey, ACM SIGMOD Record, v.29 n.2, p.451-462, June 2000
Remzi H. Arpaci-Dusseau, Eric Anderson, Noah Treuhaft, David E. Culler, Joseph M. Hellerstein, David Patterson, Kathy Yelick, Cluster I/O with River: making the fast case common, Proceedings of the sixth workshop on I/O in parallel and distributed systems, p.10-22, May 05-05, 1999, Atlanta, Georgia, United States
Philip Buonadonna, Joshua Coates, Spencer Low, David E. Culler, Millennium sort: a cluster-based application for Windows NT using DCOM, river primitives and the virtual interface architecture, Proceedings of the 3rd conference on USENIX Windows NT Symposium, p.9-9, July 12-15, 1999, Seattle, Washington