ABSTRACT
Power dissipation is increasingly important in CPUs ranging from those intended for mobile use, all the way up to high-performance processors for high-end servers. While the bulk of the power dissipated is dynamic switching power, leakage power is also beginning to be a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly.
This paper examines methods for reducing leakage power within the cache memories of the CPU. Because caches comprise much of a CPU chip's area and transistor count, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused. In particular, our approach is targeted at the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of “dead time” before they are evicted. By devising effective, low-power ways of deducing dead time, our results show that in many cases we can reduce L1 cache leakage energy by 4x in SPEC2000 applications without impacting performance. Because our decay-based techniques have notions of competitive on-line algorithms at their roots, their energy usage can be theoretically bounded at within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies effectively reduce L1 cache leakage energy by 5x for the SPEC2000 applications with only negligible degradation in performance.
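The decay idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `simulate_line_decay`, the `DecayResult` type, and all cycle counts are hypothetical, and the model assumes a single cache line whose accesses are given as a sorted list of cycle times. The line stays powered after each access until either the next access arrives or a fixed decay interval of idle cycles elapses, whichever comes first.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecayResult:
    on_cycles: int       # cycles the line spent powered on (and leaking)
    induced_misses: int  # accesses that arrived after the line had decayed


def simulate_line_decay(access_times: Sequence[int],
                        evict_time: int,
                        decay_interval: int) -> DecayResult:
    """Replay one line's accesses under a fixed decay interval.

    access_times: sorted cycle times of accesses (the first one is the fill).
    evict_time:   cycle at which the line would be evicted anyway.
    (Hypothetical model for illustration only.)
    """
    on_cycles = 0
    induced_misses = 0
    for i, t in enumerate(access_times):
        is_last = (i + 1 == len(access_times))
        next_t = evict_time if is_last else access_times[i + 1]
        gap = next_t - t
        if gap > decay_interval:
            # The line idles past the decay interval and is turned off.
            on_cycles += decay_interval
            if not is_last:
                # The next access finds the line off: an extra miss.
                induced_misses += 1
        else:
            # The next access arrives in time; the line stays on throughout.
            on_cycles += gap
    return DecayResult(on_cycles, induced_misses)
```

With made-up numbers, a line accessed at cycles 0, 10, 20, and 30 and evicted at cycle 1000 would leak for 1000 cycles without decay. Under this model a decay interval of 100 cycles cuts that to 130 cycles with no extra misses, while an overly short interval of 5 cycles cuts it to 20 cycles at the cost of 3 induced misses, which is the trade-off that choosing the decay interval has to balance.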