ABSTRACT
Dynamic voltage and frequency scaling (DVFS) is a commonly used power-management scheme that dynamically adjusts power and performance to the time-varying needs of running programs. Unfortunately, conventional DVFS, which relies on off-chip regulators, is limited in temporal granularity and becomes costly when considered for future multi-core systems. To overcome these challenges, this paper presents thread motion (TM), a fine-grained power-management scheme for chip multiprocessors (CMPs). Instead of incurring the high cost of changing the voltage and frequency of different cores, TM rapidly moves threads among a mixture of cores with fixed but different power/performance levels to match the time-varying computing needs of running applications. Results show that, for the same power budget, two voltage/frequency levels are sufficient to provide performance gains commensurate with idealized scenarios using per-core voltage control. Thread motion extends workload-based power management into the nanosecond realm and, for a given power budget, provides up to 20% better performance than coarse-grained DVFS.
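The abstract does not describe the policy that decides which thread runs on which core, so the following is only a minimal sketch of the general idea under stated assumptions: cores come in two fixed types ("fast" and "slow"), and each thread has a per-interval score estimating how much it currently benefits from the higher voltage/frequency level. The function names, the score, and the greedy ranking are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def assign_threads(utility: Dict[str, float], num_fast_cores: int) -> Dict[str, str]:
    """Place the threads that currently benefit most from speed on fast cores.

    `utility` maps a thread name to a hypothetical score (an assumption of
    this sketch) estimating the thread's gain from the higher V/F level.
    """
    # Rank threads by score, highest first; ties break by name for determinism.
    ranked: List[str] = sorted(utility, key=lambda t: (-utility[t], t))
    fast = set(ranked[:num_fast_cores])
    return {t: ("fast" if t in fast else "slow") for t in utility}


def threads_to_move(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return the threads whose core type changes between two intervals."""
    return sorted(t for t in new if old.get(t) != new[t])


# Two consecutive intervals with made-up scores for four threads.
interval_1 = assign_threads({"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.1}, num_fast_cores=2)
interval_2 = assign_threads({"A": 0.1, "B": 0.8, "C": 0.7, "D": 0.2}, num_fast_cores=2)
moved = threads_to_move(interval_1, interval_2)
```

In this example the core voltage/frequency levels never change; only threads A and B swap core types between the two intervals, which is the kind of adaptation the abstract attributes to thread motion.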