ABSTRACT
Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers. However, most current system designs have been optimized to perform well on scientific and engineering workloads. Given the radically different behavior of database workloads (especially OLTP), it is important to re-evaluate key system design decisions in the context of this important class of applications.

This paper examines the behavior of database workloads on shared-memory multiprocessors with aggressive out-of-order processors, and considers simple optimizations that can provide further performance improvements. Our study is based on detailed simulations of the Oracle commercial database engine. The results show that the combination of out-of-order execution and multiple instruction issue is indeed effective in improving performance of database workloads, providing gains of 1.5 and 2.6 times over an in-order single-issue processor for OLTP and DSS, respectively. In addition, speculative techniques enable optimized implementations of memory consistency models that significantly improve the performance of stricter consistency models, bringing the performance to within 10-15% of the performance of more relaxed models.

The second part of our study focuses on the more challenging OLTP workload. We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within 15%). Furthermore, our characterization shows that a large fraction of the data communication misses in OLTP exhibit migratory behavior; our preliminary results show that software prefetch and writeback/flush hints can be used for this data to further reduce execution time by 12%.