ABSTRACT
The mobile computing device market has been growing rapidly, which brings technologies that optimize system energy to the forefront. As circuits continue to scale, it will be important to optimize both leakage and dynamic energy. Effective optimization of leakage and dynamic energy consumption requires a vertical integration of techniques spanning the circuit to software levels. Schedule slacks in code executing on VLIW architectures present an opportunity for such an integration. In this paper, we present three compiler-directed techniques that take advantage of schedule slacks to optimize leakage and dynamic energy consumption. Integer ALU (IALU) components operating with multiple supply voltages are designed to provide different low-energy versions with different operational latencies. The goal of the first technique is to maximize the number of operations mapped to the IALU components with the lowest energy consumption without extending the schedule length. We also consider a variant of this technique that saves more energy at the cost of some performance loss. The second technique uses two leakage-control mechanisms to reduce leakage energy consumption when no operations are scheduled on a component. Our evaluation of these two approaches, using fifteen benchmarks, shows that which of the two, leakage or dynamic energy optimization, provides the larger energy gains depends on the number and duration of slacks, the availability of low-energy functional units, and the relative magnitude of leakage and dynamic energy. Finally, we provide a unified energy-optimization strategy that integrates both dynamic and leakage energy-reduction schemes. The proposed techniques have been incorporated into a cycle-accurate simulator using parameters extracted from circuit-level simulation. Our results show that the unified scheme generates better results than using either the dynamic or the leakage energy-reduction technique independently.
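The idea behind the first technique, moving operations onto slower low-energy IALUs only where slack allows, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a single basic block given as a dependence graph, only two IALU versions (a fast one and a slow low-energy one), unlimited functional units, and a simple greedy pass. The names `critical_path_length` and `assign_low_energy` are hypothetical.

```python
from typing import Dict, List, Tuple


def critical_path_length(latency: Dict[str, int],
                         preds: Dict[str, List[str]],
                         order: List[str]) -> int:
    """Longest dependence chain in cycles; `order` must be topological."""
    finish: Dict[str, int] = {}
    for op in order:
        # An operation starts once all of its predecessors have finished.
        start = max((finish[p] for p in preds[op]), default=0)
        finish[op] = start + latency[op]
    return max(finish.values(), default=0)


def assign_low_energy(order: List[str],
                      preds: Dict[str, List[str]],
                      fast_latency: int,
                      slow_latency: int) -> Tuple[List[str], int]:
    """Greedily move operations to the slow, low-energy unit.

    Assumption: resource conflicts are ignored, so the schedule length
    equals the critical path length. An operation is moved only if the
    schedule length stays at its all-fast baseline.
    """
    latency = {op: fast_latency for op in order}
    baseline = critical_path_length(latency, preds, order)
    slow_ops: List[str] = []
    for op in order:
        latency[op] = slow_latency           # tentatively slow the op down
        if critical_path_length(latency, preds, order) <= baseline:
            slow_ops.append(op)              # slack absorbed the extra latency
        else:
            latency[op] = fast_latency       # no slack: keep the fast unit
    return slow_ops, baseline


# Hypothetical block: a -> b -> d and c -> d; only c has slack.
preds = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"]}
slow_ops, length = assign_low_energy(["a", "b", "c", "d"], preds, 1, 2)
```

In this example only `c` lies off the critical path, so it is the only operation that can run on the slower unit without lengthening the schedule. A real compiler pass would also have to respect the number of IALUs of each type available per cycle.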