ABSTRACT
Caches account for much of a microprocessor system's power and energy consumption. Numerous new cache architectures, such as phased, pseudo-set-associative, way-predicting, reactive-associative, way-shutdown, way-concatenating, and highly associative caches, are intended to reduce power and/or energy, but they all impose some performance overhead. We have developed a new cache architecture, called a way-halting cache, that reduces energy further than the architectures mentioned above while imposing no performance overhead. Our way-halting cache is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags in a fully associative memory, which we call the halt tag array. The lookup in the halt tag array is done in parallel with, and is no slower than, set-index decoding. The halt tag array predetermines which tags cannot match because their four low-order bits differ from those of the requested address's tag. Further accesses to ways with known mismatching tags are then halted, thus saving power. An additional feature of our halt tag array is that it uses only static logic, rather than the dynamic logic used in highly associative caches, which makes our cache simpler to design with existing tools. We provide data from experiments on 29 benchmarks drawn from Powerstone, MediaBench, and SPEC 2000, based on our layouts in 0.18-micron CMOS technology. On average, we obtained 55% savings in memory-access-related energy over a conventional four-way set-associative cache. We show that these savings are greater than those of previous methods, and nearly twice those of highly associative caches, while imposing no performance overhead and only 2% cache area overhead.
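The halting mechanism can be illustrated with a small functional model. This is a minimal sketch under stated assumptions, not the authors' design: the cache geometry (32-byte lines, 128 sets), the round-robin replacement policy, and the names `WayHaltingCache` and `split_address` are hypothetical; only the four ways and the four low-order halt bits come from the abstract. Because only the halt entries of the indexed set affect the outcome, the model compares those directly rather than modeling the fully associative array, and it counts way activations as a crude energy proxy while ignoring timing and circuit details.

```python
"""Functional sketch of a four-way way-halting cache (hypothetical parameters)."""

# Hypothetical geometry, chosen for illustration only (not from the paper).
OFFSET_BITS = 5                  # 32-byte lines
INDEX_BITS = 7                   # 128 sets
NUM_SETS = 1 << INDEX_BITS
WAYS = 4                         # four-way set-associative, as in the abstract
HALT_BITS = 4                    # low-order tag bits held in the halt tag array
HALT_MASK = (1 << HALT_BITS) - 1


def split_address(addr: int) -> tuple[int, int]:
    """Split a byte address into (tag, set_index)."""
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


class WayHaltingCache:
    def __init__(self, halting: bool = True) -> None:
        self.halting = halting              # False models a conventional cache
        # tags[s][w] is the full tag stored in way w of set s (None = invalid).
        self.tags = [[None] * WAYS for _ in range(NUM_SETS)]
        # halt[s][w] mirrors the HALT_BITS low-order bits of tags[s][w].
        self.halt = [[None] * WAYS for _ in range(NUM_SETS)]
        self.next_victim = [0] * NUM_SETS   # round-robin replacement (assumed)
        self.accesses = 0
        self.ways_accessed = 0              # way activations, a crude energy proxy

    def access(self, addr: int) -> bool:
        """Look up addr; return True on a hit, fill a way on a miss."""
        tag, index = split_address(addr)
        low = tag & HALT_MASK
        self.accesses += 1
        if self.halting:
            # Ways whose stored low-order bits differ from the address's
            # (or that are invalid) cannot hit, so the model halts them
            # and does not count them as activated.
            enabled = [w for w in range(WAYS) if self.halt[index][w] == low]
        else:
            # A conventional cache reads all ways of the set in parallel.
            enabled = list(range(WAYS))
        self.ways_accessed += len(enabled)
        hit = any(self.tags[index][w] == tag for w in enabled)
        if not hit:
            victim = self.next_victim[index]
            self.next_victim[index] = (victim + 1) % WAYS
            self.tags[index][victim] = tag
            self.halt[index][victim] = low
        return hit


if __name__ == "__main__":
    cache = WayHaltingCache()
    # Tags 0..3 map to set 0, then tag 2 is reused.
    for a in (0x0000, 0x1000, 0x2000, 0x3000, 0x2000):
        cache.access(a)
    print(cache.accesses, cache.ways_accessed)  # prints: 5 1
```

Running both modes on the same address trace gives identical hit/miss results, because a way whose full tag matches necessarily matches in its four low-order bits; only the number of activated ways drops. In simplified form, this mirrors the abstract's claim of energy savings without performance overhead.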