Mitigating Amdahl's Law through EPI Throttling
Source: ACM SIGARCH Computer Architecture News
Volume 33, Issue 2 (May 2005)
Pages: 298-309
Year of Publication: 2005
ISSN: 0163-5964
Authors
Murali Annavaram  Intel Corporation
Ed Grochowski  Intel Corporation
John Shen  Intel Corporation
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1080695.1069995

ABSTRACT

This paper is motivated by three recent trends in computer design. First, chip multi-processors (CMPs) with increasing numbers of CPU cores per chip are becoming common. Second, multi-threaded software that can take advantage of CMPs will soon become prevalent. Due to the nature of the algorithms, these multi-threaded programs inherently will have phases of sequential execution; Amdahl's law dictates that the speedup of such parallel programs will be limited by the sequential portion of the computation. Finally, increasing levels of on-chip integration coupled with a slowing rate of reduction in supply voltage make power consumption a first-order design constraint. Given this environment, our goal is to minimize the execution times of multi-threaded programs containing nontrivial parallel and sequential phases, while keeping the CMP's total power consumption within a fixed budget. In order to mitigate the effects of Amdahl's law, in this paper we make a compelling case for varying the amount of energy expended to process instructions according to the amount of available parallelism. Using the equation Power = Energy per instruction (EPI) × Instructions per second (IPS), we propose that during phases of limited parallelism (low IPS) the chip multi-processor will spend more EPI; similarly, during phases of higher parallelism (high IPS) the chip multi-processor will spend less EPI; in both scenarios power is fixed. We evaluate the performance benefits of an EPI throttle on an asymmetric multiprocessor (AMP) prototyped from a physical 4-way Xeon SMP server. Using a wide range of multi-threaded programs, we show a 38% wall clock speedup on an AMP compared to a standard SMP that uses the same power. We also measure the supply current on a 4-way SMP server while running the multi-threaded programs and use the measured data as input to a software simulator that implements a more flexible EPI throttle. The results from the measurement-driven simulation show performance benefits comparable to the AMP prototype. We analyze the results from both techniques, explain why and when an EPI throttle works well, and conclude with a discussion of the challenges in building practical EPI throttles.
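The two relations the abstract relies on can be worked through numerically. The sketch below is only an illustration of the arithmetic, not the authors' throttle or simulator: the function names, the 100 W budget, and the IPS figures are hypothetical values chosen for the example. It computes the standard Amdahl's law speedup bound and the EPI that a fixed power budget allows at a given instruction rate, using Power = EPI × IPS.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: upper bound on speedup when only part of the work scales.

    parallel_fraction is the share of execution time that can run in
    parallel (0.0 to 1.0); the remainder is sequential.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / n_cores)


def epi_for_budget(power_budget_watts: float, ips: float) -> float:
    """Energy per instruction (joules) allowed at a fixed power budget.

    Rearranges Power = EPI * IPS to EPI = Power / IPS.
    """
    if ips <= 0:
        raise ValueError("ips must be positive")
    return power_budget_watts / ips


if __name__ == "__main__":
    budget = 100.0          # hypothetical chip power budget, in watts
    sequential_ips = 1e9    # hypothetical low-IPS (sequential) phase
    parallel_ips = 4e9      # hypothetical high-IPS (parallel) phase

    # With power held fixed, a phase running at one quarter of the
    # instruction rate may spend four times the energy per instruction.
    print("EPI, sequential phase (J):", epi_for_budget(budget, sequential_ips))
    print("EPI, parallel phase (J):  ", epi_for_budget(budget, parallel_ips))

    # A program that is 90% parallel cannot reach a 4x speedup on 4 cores.
    print("Amdahl bound, 90% parallel, 4 cores:", amdahl_speedup(0.9, 4))
```

Under these assumed numbers the sequential phase can spend four times as much energy per instruction as the parallel phase at the same power. That extra energy per instruction is the headroom the abstract proposes spending during phases of limited parallelism.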


