Quickly generating billion-record synthetic databases
Source: International Conference on Management of Data
Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data
Minneapolis, Minnesota, United States
Pages: 243 - 252  
Year of Publication: 1994
ISBN:0-89791-639-5
Authors
Jim Gray  Digital Equipment Corporation, 455 Market, San Francisco, CA
Prakash Sundaresan  Digital Equipment Corporation, 455 Market, San Francisco, CA
Susanne Englert  Tandem Computers Inc., 19333 Vallco Parkway, Cupertino, CA
Ken Baclawski  Computer Science, Northeastern University, 360 Huntington Av. Boston, MA
Peter J. Weinberger  Bell Laboratories, 600 Mountain Ave, Murray Hill, NJ
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/191839.191886

ABSTRACT

Evaluating database system performance often requires generating synthetic databases: ones having certain statistical properties but filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular it discusses: (1) Parallelism to get generation speedup and scaleup. (2) Congruential generators to get dense unique uniform distributions. (3) Special-case discrete logarithms to generate indices concurrently with the base table generation. (4) Modification of (2) to get exponential, normal, and self-similar distributions. The discussion is in terms of generating billion-record SQL databases using C programs running on a shared-nothing computer system consisting of a hundred processors, with a thousand discs. The ideas apply to smaller databases, but large databases present the more difficult problems.
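
Technique (2) can be illustrated with a small sketch. It is not taken from the paper's text; it assumes a multiplicative congruential generator modulo a prime P larger than the record count N, with a multiplier G that is a primitive root of P, and it discards any value above N. The names N, P, G, next_unique, and fill_keys, and the toy parameter values, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy parameters (assumptions, not values from the paper):
 * N = number of records wanted,
 * P = a prime larger than N,
 * G = a primitive root modulo P (2 is a primitive root modulo 11). */
#define N 8u
#define P 11u
#define G 2u

/* Return the next key in 1..N.
 * Because G is a primitive root of P, repeated multiplication by G
 * visits every value in 1..P-1 exactly once per cycle. Skipping the
 * values above N leaves a permutation of 1..N: dense and unique.
 * The product seed * G must fit in 64 bits, which holds trivially here. */
static uint64_t next_unique(uint64_t seed)
{
    assert(seed >= 1 && seed < P);
    do {
        seed = (seed * G) % P;
    } while (seed > N);
    return seed;
}

/* Fill out[0..N-1] with one full cycle of keys, starting from seed 1. */
static void fill_keys(uint64_t *out)
{
    uint64_t seed = 1;
    for (unsigned i = 0; i < N; i++) {
        seed = next_unique(seed);
        out[i] = seed;
    }
}
```

With these toy parameters, fill_keys produces 2, 4, 8, 5, 7, 3, 6, 1: every value from 1 to 8 appears exactly once, in a scrambled order. A billion-record run would need a much larger prime and a multiplication that avoids overflow.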


