ABSTRACT
High-performance caches statically pull up the bitlines in all cache subarrays to optimize cache access latency. Unfortunately, in nanoscale CMOS implementations this architecture wastes significant energy because of high leakage and bitline discharge in the unaccessed subarrays. Recent research advocates bitline isolation, which uses the bitline precharge devices to control precharging of individual subarrays. In this paper, we carefully evaluate the energy and performance trade-offs of bitline isolation and propose a technique that exploits nearly its full potential to eliminate discharge and reduce overall energy in level-one caches. Cycle-accurate and circuit simulation results of a wide-issue superscalar processor indicate that: (1) in future CMOS technologies (e.g., 70nm and beyond), cache architectures that exploit bitline isolation can eliminate up to 90% of the bitline discharge; (2) on-demand precharging (i.e., decoding the address and subsequently precharging the accessed subarrays) is not viable in level-one caches because it increases the cache access latency; and (3) our proposal, gated precharging, which exploits subarray reference locality and precharges only the recently accessed subarrays, eliminates nearly all of the bitline discharge in nanoscale CMOS caches with only 1% performance degradation.
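The trade-off among the three precharging policies named above (static pull-up, on-demand, and gated) can be illustrated with a toy trace-driven model. This is a minimal sketch under stated assumptions, not the paper's simulator: it assumes at most one subarray access per cycle, counts precharged subarray-cycles as a crude stand-in for leakage and bitline-discharge energy, and counts accesses that must wait for a precharge as the performance cost. The gated policy here uses a hypothetical fixed `threshold` on cycles since a subarray's last access as a stand-in for the paper's subarray-locality mechanism; the names `simulate`, `PolicyStats`, and `threshold` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PolicyStats:
    precharged_subarray_cycles: int  # crude proxy for leakage/bitline-discharge energy
    delayed_accesses: int            # accesses that had to wait for a precharge


def simulate(trace: Sequence[Optional[int]], num_subarrays: int,
             policy: str, threshold: int = 4) -> PolicyStats:
    """Replay a per-cycle trace of accessed subarray indices (None = idle cycle).

    `threshold` is a hypothetical parameter used only by the "gated" policy.
    """
    last_access: List[Optional[int]] = [None] * num_subarrays
    precharged = 0
    delayed = 0
    for cycle, sub in enumerate(trace):
        if policy == "static":
            # Every subarray's bitlines are statically pulled up every cycle.
            precharged += num_subarrays
        elif policy == "on_demand":
            # Only the decoded subarray is precharged, so every access waits.
            if sub is not None:
                precharged += 1
                delayed += 1
        elif policy == "gated":
            # Subarrays touched within the last `threshold` cycles stay precharged.
            hot = {s for s, t in enumerate(last_access)
                   if t is not None and cycle - t <= threshold}
            precharged += len(hot)
            if sub is not None and sub not in hot:
                # Cold subarray: precharge it on demand and pay the delay.
                precharged += 1
                delayed += 1
        else:
            raise ValueError(f"unknown policy: {policy}")
        if sub is not None:
            last_access[sub] = cycle
    return PolicyStats(precharged, delayed)


if __name__ == "__main__":
    # Hypothetical trace with strong reuse of subarray 0.
    trace = [0, 0, 1, 0, None, 0, 3, 0]
    for name in ("static", "on_demand", "gated"):
        print(name, simulate(trace, num_subarrays=4, policy=name, threshold=2))
```

On this toy trace the gated policy keeps far fewer subarray-cycles precharged than static pull-up while delaying only the accesses to cold subarrays, whereas on-demand precharging delays every access. Raising `threshold` keeps more subarrays precharged (more energy, fewer delayed accesses); lowering it does the opposite.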