| Quickly generating billion-record synthetic databases |
| Source
|
International Conference on Management of Data
Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Minneapolis, Minnesota, United States
Pages: 243 - 252
Year of Publication: 1994
ISBN:0-89791-639-5
|
|
Authors
Jim Gray, Digital Equipment Corporation, 455 Market, San Francisco, CA
Prakash Sundaresan, Digital Equipment Corporation, 455 Market, San Francisco, CA
Susanne Englert, Tandem Computers Inc., 19333 Vallco Parkway, Cupertino, CA
Ken Baclawski, Computer Science, Northeastern University, 360 Huntington Av., Boston, MA
Peter J. Weinberger, Bell Laboratories, 600 Mountain Ave, Murray Hill, NJ
| Bibliometrics |
Downloads (6 Weeks): 18, Downloads (12 Months): 75, Citation Count: 49
|
|
|
ABSTRACT
Evaluating database system performance often requires generating synthetic databases, ones having certain statistical properties but filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular it discusses: (1) Parallelism to get generation speedup and scaleup. (2) Congruential generators to get dense unique uniform distributions. (3) Special-case discrete logarithms to generate indices concurrent to the base table generation. (4) Modification of (2) to get exponential, normal, and self-similar distributions.

The discussion is in terms of generating billion-record SQL databases using C programs running on a shared-nothing computer system consisting of a hundred processors, with a thousand discs. The ideas apply to smaller databases, but large databases present the more difficult problems.
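Technique (2) can be illustrated with a small sketch. The code below is not taken from the paper. It is a minimal C illustration of one common way to obtain a dense, unique, pseudo-random ordering of the keys 1..n from a multiplicative congruential generator, assuming a prime p greater than n and a multiplier g that is a primitive root modulo p. The function names `next_key` and `fill_unique_keys` and the toy parameters mentioned afterwards are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One step of a multiplicative congruential sequence modulo a prime p.
 * Values above n are skipped, so the caller only sees keys in 1..n.
 * Assumption: g is a primitive root mod p, so the sequence visits every
 * value in 1..p-1 exactly once per cycle.
 * Assumption: p fits in 32 bits, so g * seed cannot overflow 64 bits.
 */
static uint64_t next_key(uint64_t seed, uint64_t n, uint64_t p, uint64_t g)
{
    assert(p <= UINT32_MAX);
    assert(seed >= 1 && seed < p && g < p && n < p);
    do {
        seed = (g * seed) % p;
    } while (seed > n);
    return seed;
}

/* Fill out[0..n-1] with the n keys visited when starting from seed 1. */
static void fill_unique_keys(uint64_t *out, uint64_t n, uint64_t p, uint64_t g)
{
    uint64_t seed = 1;
    for (uint64_t i = 0; i < n; i++) {
        seed = next_key(seed, n, p, g);
        out[i] = seed;
    }
}
```

With the toy values n = 8, p = 11, g = 2, calling `fill_unique_keys` yields the keys 2, 4, 8, 5, 7, 3, 6, 1: every value in 1..8 appears exactly once, and the two out-of-range values (10 and 9) are skipped. For a billion-record table one would pick a prime just above 10^9, which still satisfies the 32-bit assumption stated in the comments.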
CITED BY 49
Leonidas Galanis , Supiti Buranawatanachoke , Romain Colle , Benoît Dageville , Karl Dias , Jonathan Klein , Stratos Papadomanolakis , Leng Leng Tan , Venkateshwaran Venkataramani , Yujun Wang , Graham Wood, Oracle database replay, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
Lihua Ran , Curtis Dyreson , Anneliese Andrews , Renée Bryce , Christopher Mallery, Building test cases and oracles to automate the testing of web database applications, Information and Software Technology, v.51 n.2, p.460-477, February, 2009