An analysis of database workload performance on simultaneous multithreaded processors

Source
International Symposium on Computer Architecture
Proceedings of the 25th annual international symposium on Computer architecture
Barcelona, Spain
Pages: 39-50
Year of Publication: 1998
ISBN: 0-8186-8491-7

Authors
Jack L. Lo, Dept. of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA
Luiz André Barroso, Digital Equipment Corporation, Western Research Laboratory, 250 University Ave., Palo Alto, CA
Susan J. Eggers, Dept. of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA
Kourosh Gharachorloo, Digital Equipment Corporation, Western Research Laboratory, 250 University Ave., Palo Alto, CA
Henry M. Levy, Dept. of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA
Sujay S. Parekh, Dept. of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA

Publisher
IEEE Computer Society
Washington, DC, USA

Bibliometrics
Downloads (6 Weeks): 5, Downloads (12 Months): 50, Citation Count: 53

ABSTRACT
Simultaneous multithreading (SMT) is an architectural technique in which the processor issues multiple instructions from multiple threads each cycle. While SMT has been shown to be effective on scientific workloads, its performance on database systems is still an open question. In particular, database systems have poor cache performance, and the addition of multithreading has the potential to exacerbate cache conflicts.

This paper examines database performance on SMT processors using traces of the Oracle database management system. Our research makes three contributions. First, it characterizes the memory-system behavior of database systems running on-line transaction processing and decision support system workloads. Our data show that while DBMS workloads have large memory footprints, there is substantial data reuse in a small, cacheable "critical" working set. Second, we show that the additional data cache conflicts caused by simultaneous multithreaded instruction scheduling can be nearly eliminated by the proper choice of software-directed policies for virtual-to-physical page mapping and per-process address offsetting. Our results demonstrate that with the best policy choices, D-cache miss rates on an 8-context SMT are roughly equivalent to those on a single-threaded superscalar. Multithreading also leads to better interthread instruction cache sharing, reducing I-cache miss rates by up to 35%. Third, we show that SMT's latency tolerance is highly effective for database applications. For example, using a memory-intensive OLTP workload, an 8-context SMT processor achieves a 3-fold increase in instruction throughput over a single-threaded superscalar with similar resources.
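
The intuition behind per-process address offsetting can be illustrated with a minimal Python sketch. It is not the paper's method or simulator. Every number in it is a hypothetical value chosen for illustration: the 64-byte line size, the 512 cache sets, the base address, the 4 KB hot region, and the 4 KB offset stride. The model indexes the cache directly by address, so it ignores virtual-to-physical page mapping and associativity. It only shows that when several contexts place their hot data at the same address, they compete for the same cache sets, while a per-process offset spreads them out.

```python
# Toy model: which cache sets do the hot regions of several processes touch?
# All parameters are illustrative assumptions, not values from the paper.

LINE_SIZE = 64      # bytes per cache line (assumed)
NUM_SETS = 512      # number of cache sets (assumed)
BASE = 0x10000000   # address where every process places its hot data (assumed)
HOT_BYTES = 4096    # size of each process's hot region (assumed)
N_CONTEXTS = 8      # number of hardware contexts sharing the cache


def cache_set(addr: int) -> int:
    """Set index for an address in a cache indexed directly by address."""
    return (addr // LINE_SIZE) % NUM_SETS


def hot_sets(base: int, nbytes: int) -> set[int]:
    """Set indices touched by a contiguous region starting at base."""
    return {cache_set(a) for a in range(base, base + nbytes, LINE_SIZE)}


def distinct_sets(stride: int) -> int:
    """Count the distinct sets used when process i is shifted by i * stride."""
    used: set[int] = set()
    for pid in range(N_CONTEXTS):
        used |= hot_sets(BASE + pid * stride, HOT_BYTES)
    return len(used)


if __name__ == "__main__":
    # No offset: all contexts map onto the same sets and conflict.
    print("sets used without offsetting:", distinct_sets(0))
    # Per-process offset: each context's hot region lands in different sets.
    print("sets used with offsetting:   ", distinct_sets(4096))
```

In this toy configuration the eight hot regions share 64 sets without offsetting and cover all 512 sets with it. Real behavior depends on the cache organization and the page-mapping policy, which the sketch does not model.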
CITED BY 53
Alex Ramirez , Luiz André Barroso , Kourosh Gharachorloo , Robert Cohn , Josep Larriba-Pey , P. Geoffrey Lowney , Mateo Valero, Code layout optimizations for transaction processing workloads, ACM SIGARCH Computer Architecture News, v.29 n.2, p.155-164, May 2001
Luiz André Barroso , Kourosh Gharachorloo , Robert McNamara , Andreas Nowatzyk , Shaz Qadeer , Barton Sano , Scott Smith , Robert Stets , Ben Verghese, Piranha: a scalable architecture based on single-chip multiprocessing, ACM SIGARCH Computer Architecture News, v.28 n.2, p.282-293, May 2000
Milo M. K. Martin , Daniel J. Sorin , Harold W. Cain , Mark D. Hill , Mikko H. Lipasti, Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on a modern processor, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
Minglong Shao , Anastassia Ailamaki , Babak Falsafi, DBmbench: fast and accurate database workload representation on modern microarchitecture, Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research, p.254-267, October 17-20, 2005, Toronto, Ontario, Canada
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on modern and emerging processors, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.1, p.77-96, January 2007
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, A characterization of data mining algorithms on a modern processor, Proceedings of the 1st international workshop on Data management on new hardware, June 12-12, 2005, Baltimore, Maryland
Richard A. Hankins , Trung Diep , Murali Annavaram , Brian Hirano , Harald Eri , Hubert Nueckel , John P. Shen, Scaling and Characterizing Database Workloads: Bridging the Gap between Research and Practice, Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, p.151, December 03-05, 2003
Layali Rashid , Wessam M. Hassanein , Moustafa A. Hammad, Exploiting multithreaded architectures to improve the hash join operation, Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture, p.46-53, October 26-26, 2008, Toronto, Canada