|
ABSTRACT
Interconnection networks have been deployed as the communication fabric in a wide range of parallel computer systems. With recent technological trends allowing growing quantities of chip resources and faster clock rates, there have been prevailing concerns of increasing power consumption being a major limiting factor in the design of parallel computer systems, from multiprocessor SoCs to multi-chip embedded systems and parallel servers. To tackle this, power-aware networks must become inherent components of single-chip and multi-chip system.On the hardware design side, while there has been some recent interconnection network power reduction research, especially targeted towards communication links, the techniques presented are ad hoc and are not tailored to the application running on the network. We show that with these ad hoc techniques, power savings and corresponding impact on network latency vary significantly from one application to the next -- in many cases network performance can suffer severely. On the software side, extensive research on compile-time optimization has produced parallelizing compilers that can efficiently map an application onto hardware for high performance. However, research into power-aware parallelizing compilers is in its infancy; none addressed communication power.In this paper, we take the first steps towards tailoring applications' communication needs at run-time for low power. We propose software techniques that extend the flow of a parallelizing compiler in order to direct run-time network power optimization. We target network links, the dominant power consumer in these systems, allowing DVS instructions extracted during static compilation to orchestrate link voltage and frequency transitions for power savings during application runtime. Concurrently, a hardware online mechanism measures network congestion levels and adapts these off-line DVS settings to optimize network performance. Our simulations show that link power consumption can be greatly reduced by up to 76.3%, with a minor increase in network latency in the range of 0.23 to 6.78% across a number of benchmark suites running on three existing parallel architectures, from very fine-grained single-chip to coarse-grained multi-chip architectures.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
InfiniBand Trade Alliance. The InfiniBand architecture. {online} http:// www.infinibandta.org.
|
| |
8
|
|
| |
9
|
|
 |
10
|
E. J. Kim , K. H. Yum , G. M. Link , N. Vijaykrishnan , M. Kandemir , M. J. Irwin , M. Yousif , C. R. Das, Energy optimization techniques in cluster interconnects, Proceedings of the 2003 international symposium on Low power electronics and design, August 25-27, 2003, Seoul, Korea
[doi> 10.1145/871506.871620]
|
| |
11
|
J. Kim and M. Horowitz. Adaptive supply serial links with sub-1V operation and per-pin clock recovery. In Proc. International Solid-State Circuits Conference (ISSCC), pp. 1403--1413, Feb. 2002.
|
 |
12
|
|
| |
13
|
|
| |
14
|
Chunho Lee , Miodrag Potkonjak , William H. Mangione-Smith, MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.330-335, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
 |
15
|
Walter Lee , Rajeev Barua , Matthew Frank , Devabhaktuni Srikrishna , Jonathan Babb , Vivek Sarkar , Saman Amarasinghe, Space-time scheduling of instruction-level parallelism on a raw machine, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.46-57, October 02-07, 1998, San Jose, California, United States
|
| |
16
|
|
| |
17
|
|
| |
18
|
V. S. Pai et. al. RSIM: An execution-driven simulator for ILP-based shared-memory multiprocessors and uniprocessors. In IEEE Technical Committee on Computer Architecture Newsletter (TCCA), 35(11), pp. 37--48, Oct. 1997.
|
| |
19
|
|
| |
20
|
PoPNet. {online} http://www.princeton.edu/edu/~lshang/popnet.html
|
 |
21
|
Karthikeyan Sankaralingam , Ramadass Nagarajan , Haiming Liu , Changkyu Kim , Jaehyuk Huh , Doug Burger , Stephen W. Keckler , Charles R. Moore, Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, Proceedings of the 30th annual international symposium on Computer architecture, June 09-11, 2003, San Diego, California
|
 |
22
|
H. Saputra , M. Kandemir , N. Vijaykrishnan , M. J. Irwin , J. S. Hu , C-H. Hsu , U. Kremer, Energy-conscious compilation based on voltage scaling, Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems, June 19-21, 2002, Berlin, Germany
|
| |
23
|
Semiconductor Industry Association. International Technology Roadmap for Semiconductors, 2001. {online} http://public.itrs.net/Files/2001ITRS/Home.htm
|
| |
24
|
|
| |
25
|
|
| |
26
|
|
 |
27
|
Michael Bedford Taylor , Walter Lee , Jason Miller , David Wentzlaff , Ian Bratt , Ben Greenwald , Henry Hoffmann , Paul Johnson , Jason Kim , James Psota , Arvind Saraf , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, Proceedings of the 31st annual international symposium on Computer architecture, p.2, June 19-23, 2004, München, Germany
|
| |
28
|
The Standard Performance Evaluation Corporation. {online}http://www.spec.org/
|
| |
29
|
|
| |
30
|
G. Wei et. al. A variable-frequency parallel I/O interface with adaptive power-supply regulation. Journal of Solid-State Circuits, 35(11):16001610, Nov. 2000.
|
 |
31
|
Robert P. Wilson , Robert S. French , Christopher S. Wilson , Saman P. Amarasinghe , Jennifer M. Anderson , Steve W. K. Tjiang , Shih-Wei Liao , Chau-Wen Tseng , Mary W. Hall , Monica S. Lam , John L. Hennessy, SUIF: an infrastructure for research on parallelizing and optimizing compilers, ACM SIGPLAN Notices, v.29 n.12, p.31-37, Dec. 1994
[doi> 10.1145/193209.193217]
|
 |
32
|
Steven Cameron Woo , Moriyoshi Ohara , Evan Torrie , Jaswinder Pal Singh , Anoop Gupta, The SPLASH-2 programs: characterization and methodological considerations, Proceedings of the 22nd annual international symposium on Computer architecture, p.24-36, June 22-24, 1995, S. Margherita Ligure, Italy
|
 |
33
|
|
 |
34
|
|
|