Evaluating the impact of simultaneous multithreading on network servers using real hardware
Source: Joint International Conference on Measurement and Modeling of Computer Systems
Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
Banff, Alberta, Canada
Session: Network & server performance evaluation
Pages: 315-326
Year of Publication: 2005
ISBN: 1-59593-022-1
Authors
Yaoping Ruan, Princeton University, Princeton, NJ
Vivek S. Pai, Princeton University, Princeton, NJ
Erich Nahum, IBM T.J. Watson Research Center, Yorktown Heights, NY
John M. Tracey, IBM T.J. Watson Research Center, Yorktown Heights, NY
Sponsors
ACM: Association for Computing Machinery
SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1064212.1064254

ABSTRACT

This paper examines the performance of simultaneous multithreading (SMT) for network servers using actual hardware, multiple network server applications, and several workloads. Using three versions of the Intel Xeon processor with Hyper-Threading, we perform macroscopic analysis as well as microarchitectural measurements to understand the origins of the performance bottlenecks for SMT processors in these environments. The results of our evaluation suggest that the current SMT support in the Xeon is application and workload sensitive, and may not yield significant benefits for network servers.

In general, we find that enabling SMT on real hardware usually produces only slight performance gains, and can sometimes lead to performance loss. In the uniprocessor case, previous studies appear to have neglected the OS overhead in switching from a uniprocessor kernel to an SMT-enabled kernel. The performance loss associated with such support is comparable to the gains provided by SMT. In the 2-way multiprocessor case, the higher number of memory references from SMT often causes the memory system to become the bottleneck, offsetting any processor utilization gains. This effect is compounded by the growing gap between processor speeds and memory latency. In trying to understand the large gains shown by simulation studies, we find that while the general trends for microarchitectural behavior agree with real hardware, differences in sizing assumptions and performance models yield much more optimistic benefits for SMT than we observe.
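
The uniprocessor argument in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: the function names and the throughput figures are hypothetical, chosen only to show how a kernel-switch overhead and an SMT gain of similar size can roughly cancel when both are measured against the uniprocessor-kernel baseline.

```python
def relative_change(new: float, base: float) -> float:
    """Fractional change of `new` relative to `base` (0.05 means +5%)."""
    return (new - base) / base


def decompose_smt_gain(up_kernel: float,
                       smt_kernel_smt_off: float,
                       smt_kernel_smt_on: float) -> tuple[float, float, float]:
    """Split the net effect of enabling SMT into two steps.

    All three arguments are throughputs in the same unit (e.g. requests/s):
      up_kernel          - uniprocessor kernel, SMT disabled
      smt_kernel_smt_off - SMT-capable kernel, SMT disabled
      smt_kernel_smt_on  - SMT-capable kernel, SMT enabled
    """
    kernel_overhead = relative_change(smt_kernel_smt_off, up_kernel)
    smt_gain = relative_change(smt_kernel_smt_on, smt_kernel_smt_off)
    net_change = relative_change(smt_kernel_smt_on, up_kernel)
    return kernel_overhead, smt_gain, net_change


# Hypothetical throughputs, NOT measurements from the paper.
overhead, gain, net = decompose_smt_gain(1000.0, 900.0, 990.0)
print(f"kernel overhead: {overhead:+.1%}")
print(f"SMT gain over SMT-capable kernel: {gain:+.1%}")
print(f"net change vs. uniprocessor kernel: {net:+.1%}")
```

Comparing SMT-on only against the SMT-capable kernel with SMT off reports the second number alone, which is why the choice of baseline matters when reading published speedups.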



