ABSTRACT
Power dissipation is increasingly important in CPUs, from those intended for mobile use all the way up to high-performance processors for high-end servers. Although the bulk of the power dissipated is dynamic switching power, leakage power is also becoming a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly. This article examines methods for reducing leakage power within the cache memories of the CPU. Because caches account for much of a CPU chip's area and transistor count, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and "turning off" cache lines when they hold data not likely to be reused. In particular, our approach targets the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of "dead time" before they are evicted. By devising effective, low-power ways of deducing dead time, our results show that in many cases we can reduce L1 cache leakage energy by 4x in SPEC2000 applications without affecting performance. Because our decay-based techniques have notions of competitive online algorithms at their roots, their energy usage can be theoretically bounded to within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies effectively reduce L1 cache leakage energy by 5x for the SPEC2000 applications with only negligible degradations in performance.
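The factor-of-two bound mentioned above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the article's actual energy model. It assumes that a powered line leaks one energy unit per cycle, that a line switched off too early costs one extra miss worth `miss_cost` units when it is accessed again, and that every idle gap ends with a re-access. The function names and parameters are hypothetical.

```python
def decay_cost(gap, decay_interval, miss_cost):
    """Energy spent on one idle gap of `gap` cycles under a fixed decay interval.

    Assumption: a powered line leaks 1 unit per cycle, and switching it off
    before its next access costs one extra miss of `miss_cost` units.
    """
    if gap <= decay_interval:
        return gap  # the line stayed powered for the whole gap
    # The line leaked until the decay interval expired, then was switched off,
    # so the next access pays for an induced miss.
    return decay_interval + miss_cost


def oracle_cost(gap, miss_cost):
    """Energy of an oracle that knows the gap length in advance.

    It either keeps the line powered (cost `gap`) or switches it off at once
    and pays for the miss (cost `miss_cost`), whichever is cheaper.
    """
    return min(gap, miss_cost)


def worst_ratio(gaps, decay_interval, miss_cost):
    """Worst-case ratio of decay energy to oracle energy over the given gaps."""
    return max(
        decay_cost(g, decay_interval, miss_cost) / oracle_cost(g, miss_cost)
        for g in gaps
    )


if __name__ == "__main__":
    # With the decay interval set equal to the miss cost, the ratio
    # stays at or below two in this toy model.
    print(worst_ratio(range(1, 10_000), decay_interval=1000, miss_cost=1000))
```

In this model, setting the decay interval equal to the miss cost is the classic rent-or-buy choice from competitive analysis: the policy never spends more than twice what the oracle would on any single gap.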
CITED BY 11
Philo Juang , Kevin Skadron , Margaret Martonosi , Zhigang Hu , Douglas W. Clark , Philip W. Diodato , Stefanos Kaxiras, Implementing branch-predictor decay using quasi-static memory cells, ACM Transactions on Architecture and Code Optimization (TACO), v.1 n.2, p.180-219, June 2004