ABSTRACT
This paper examines the performance of simultaneous multithreading (SMT) for network servers using actual hardware, multiple network server applications, and several workloads. Using three versions of the Intel Xeon processor with Hyper-Threading, we perform macroscopic analysis as well as microarchitectural measurements to understand the origins of the performance bottlenecks for SMT processors in these environments. The results of our evaluation suggest that the current SMT support in the Xeon is application and workload sensitive, and may not yield significant benefits for network servers.

In general, we find that enabling SMT on real hardware usually produces only slight performance gains, and can sometimes lead to performance loss. In the uniprocessor case, previous studies appear to have neglected the OS overhead in switching from a uniprocessor kernel to an SMT-enabled kernel. The performance loss associated with such support is comparable to the gains provided by SMT. In the 2-way multiprocessor case, the higher number of memory references from SMT often causes the memory system to become the bottleneck, offsetting any processor utilization gains. This effect is compounded by the growing gap between processor speeds and memory latency. In trying to understand the large gains shown by simulation studies, we find that while the general trends for microarchitectural behavior agree with real hardware, differences in sizing assumptions and performance models yield much more optimistic benefits for SMT than we observe.