ABSTRACT
Set-associative caches achieve low miss rates for typical applications but dissipate significant energy. They minimize access time by probing all the data ways in parallel with the tag lookup, although only the output of the matching way is used; the energy spent accessing the other ways is wasted. Eliminating this waste by performing the data lookup sequentially, after the tag lookup, substantially increases cache access time and is unacceptable for high-performance L1 caches. In this paper, we apply two previously proposed techniques, way-prediction and selective direct-mapping, to reduce L1 cache dynamic energy while maintaining high performance. The techniques predict the matching way and probe only the predicted way rather than all the ways, which saves energy. While these techniques were originally proposed to improve set-associative cache access times, this is the first paper to apply them to reducing cache energy.

We evaluate the effectiveness of these techniques in reducing L1 d-cache, L1 i-cache, and overall processor energy. Using these techniques, our caches achieve the energy-delay of sequential access while maintaining the performance of parallel access. Relative to parallel-access L1 i- and d-caches, the techniques reduce overall processor energy-delay by 8%, while perfect way-prediction with no performance degradation achieves a 10% reduction. The performance degradation of the techniques is less than 3% compared to an aggressive, 1-cycle, 4-way, parallel-access cache.
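The probing idea described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's design: the class name, the per-set "most recently used way" predictor, the LRU replacement policy, and the assumption that a misprediction costs exactly one extra data-way probe are all hypothetical choices made for illustration. It counts data-way probes as a rough stand-in for dynamic energy and compares a parallel-access baseline (every way probed on every access) with a way-predicted cache (one way probed first).

```python
class WayPredictedCache:
    """Toy set-associative cache that counts data-way probes as an energy proxy.

    Assumptions (not from the paper): MRU-based per-set way prediction,
    LRU replacement, and one extra probe of the matching way on a misprediction.
    """

    def __init__(self, num_sets=4, num_ways=4, block_bytes=32):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.block_bytes = block_bytes
        # tags[s][w] holds the tag stored in way w of set s (None = empty).
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # lru[s] lists the ways of set s, least recently used first.
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]
        # predicted[s] is the way probed first for set s.
        self.predicted = [0] * num_sets
        self.accesses = 0
        self.mispredictions = 0
        self.probes_parallel = 0   # baseline: all ways probed on every access
        self.probes_predicted = 0  # way-predicted: only the predicted way first

    def access(self, addr):
        """Simulate one access; return True on a cache hit."""
        block = addr // self.block_bytes
        s = block % self.num_sets
        tag = block // self.num_sets

        self.accesses += 1
        self.probes_parallel += self.num_ways
        self.probes_predicted += 1  # probe the predicted way only

        hit = tag in self.tags[s]
        if hit:
            way = self.tags[s].index(tag)
            if way != self.predicted[s]:
                # Wrong way probed: the tag lookup identifies the matching
                # way, which is then probed in a second step (assumption).
                self.mispredictions += 1
                self.probes_predicted += 1
        else:
            # Miss: fill the least recently used way (fill cost not counted).
            way = self.lru[s][0]
            self.tags[s][way] = tag

        # Mark the way as most recently used and predict it next time.
        self.lru[s].remove(way)
        self.lru[s].append(way)
        self.predicted[s] = way
        return hit


if __name__ == "__main__":
    cache = WayPredictedCache()
    for addr in (0, 0, 0, 128, 0):
        cache.access(addr)
    print("parallel probes: ", cache.probes_parallel)
    print("predicted probes:", cache.probes_predicted)
    print("mispredictions:  ", cache.mispredictions)
```

In this simplified model the way-predicted cache issues far fewer data-way probes whenever the prediction is usually right, at the cost of an extra probe (and, in real hardware, extra latency) on each misprediction. The probe counts are only a proxy; they say nothing about the actual energy or energy-delay figures reported in the abstract.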