ABSTRACT
As improvements in processor performance continue to far outpace improvements in storage performance, I/O is increasingly the bottleneck in computer systems, especially in large database systems that manage huge amounts of data. The key to achieving good I/O performance is to thoroughly understand its characteristics. In this article we present a comprehensive analysis of the logical I/O reference behavior of the peak production database workloads from ten of the world's largest corporations. In particular, we focus on how these workloads respond to different techniques for caching, prefetching, and write buffering. Our findings include several broadly applicable rules of thumb that describe how effective the various I/O optimization techniques are for the production workloads. For instance, our results indicate that the buffer pool miss ratio tends to be related to the ratio of buffer pool size to data size by an inverse square root rule. A similar fourth root rule relates the write miss ratio and the ratio of buffer pool size to data size.
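As a rough illustration of how these two rules of thumb could be applied, the following sketch scales a measured miss ratio when the buffer pool grows or shrinks relative to a fixed data size. The function name `scaled_miss_ratio` is hypothetical, and the exponent of -1/4 for writes is our reading of "a similar fourth root rule"; the proportionality constant depends on the workload and is not stated here, so the sketch works only from a measured starting point.

```python
def scaled_miss_ratio(measured_miss_ratio, size_factor, exponent=-0.5):
    """Estimate the miss ratio after the buffer pool is resized.

    measured_miss_ratio: miss ratio observed at the current buffer pool size.
    size_factor: new buffer pool size divided by the current size
                 (data size assumed unchanged).
    exponent: -0.5 for the inverse square root rule (overall miss ratio);
              -0.25 is assumed here for the fourth root rule (write miss ratio).
    """
    if not 0.0 <= measured_miss_ratio <= 1.0:
        raise ValueError("miss ratio must lie in [0, 1]")
    if size_factor <= 0:
        raise ValueError("size_factor must be positive")
    # A power law is only a rule of thumb; clamp so the estimate stays a ratio.
    return min(1.0, measured_miss_ratio * size_factor ** exponent)


if __name__ == "__main__":
    # Quadrupling the buffer pool under the inverse square root rule.
    print(scaled_miss_ratio(0.20, 4))
    # The same growth under the assumed inverse fourth root rule for writes.
    print(scaled_miss_ratio(0.20, 4, exponent=-0.25))
```

Under these assumptions, halving the overall miss ratio takes about four times the buffer pool, whereas halving the write miss ratio takes about sixteen times the buffer pool.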
In addition, we characterize the reference characteristics of workloads similar to the Transaction Processing Performance Council (TPC) benchmarks C (TPC-C) and D (TPC-D), which are de facto standard performance measures for online transaction processing (OLTP) systems and decision support systems (DSS), respectively. Since benchmarks such as TPC-C and TPC-D can only be used effectively if their strengths and limitations are understood, a major focus of our analysis is to identify aspects of the benchmarks that stress the system differently than the production workloads. We discover that for the most part, the reference behavior of TPC-C and TPC-D falls within the range of behavior exhibited by the production workloads. However, there are some noteworthy exceptions that affect well-known I/O optimization techniques such as caching (LRU is further from the optimal for TPC-C, while there is little sharing of pages between transactions for TPC-D), prefetching (TPC-C exhibits no significant sequentiality), and write buffering (write buffering is less effective for the TPC benchmarks). While the two TPC benchmarks generally complement one another in reflecting the characteristics of the production workloads, there remain aspects of the real workloads that are not represented by either of the benchmarks.
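To make the comparison between LRU and the optimal replacement policy concrete, the sketch below counts buffer pool misses for both policies on a page reference trace. It is a minimal illustration, not the simulator used in this study: the function names and the tiny example trace are hypothetical, and the optimal policy is implemented as the textbook "evict the page referenced furthest in the future" rule.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for an LRU-managed buffer pool holding `capacity` pages."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    pool = OrderedDict()  # least recently used page is first
    misses = 0
    for page in trace:
        if page in pool:
            pool.move_to_end(page)  # mark as most recently used
        else:
            misses += 1
            if len(pool) >= capacity:
                pool.popitem(last=False)  # evict the least recently used page
            pool[page] = True
    return misses


def opt_misses(trace, capacity):
    """Count misses for the optimal (furthest future reference) policy."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    # next_use[i] is the index of the next reference to trace[i], or infinity.
    next_use = [0] * len(trace)
    upcoming = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = upcoming.get(trace[i], float("inf"))
        upcoming[trace[i]] = i
    pool = {}  # page -> index of its next reference
    misses = 0
    for i, page in enumerate(trace):
        if page not in pool:
            misses += 1
            if len(pool) >= capacity:
                victim = max(pool, key=pool.get)  # referenced furthest ahead
                del pool[victim]
        pool[page] = next_use[i]
    return misses


if __name__ == "__main__":
    example_trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4]  # hypothetical page IDs
    print("LRU misses:", lru_misses(example_trace, 3))
    print("OPT misses:", opt_misses(example_trace, 3))
```

The gap between the two miss counts, measured over a real trace and a range of buffer pool sizes, is one way to quantify how far LRU is from the optimal for a given workload.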
INDEX TERMS
Primary Classification:
C. Computer Systems Organization
Additional Classification:
D. Software
  D.4 OPERATING SYSTEMS
    D.4.2 Storage Management
      Subjects: Storage hierarchies
H. Information Systems
  H.3 INFORMATION STORAGE AND RETRIEVAL
    H.3.4 Systems and Software
      Subjects: Performance evaluation (efficiency and effectiveness)
K. Computing Milieux
  K.6 MANAGEMENT OF COMPUTING AND INFORMATION SYSTEMS
    K.6.2 Installation Management
      Subjects: Benchmarks
General Terms: Algorithms, Design, Performance
Keywords: I/O, TPC benchmarks, caching, locality, prefetching, production database workloads, reference behavior, sequentiality, workload characterization