Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches
Source International Symposium on Microarchitecture archive
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture table of contents
Page: 67  
Year of Publication: 2003
ISBN:0-7695-2043-X
Authors
Se-Hyun Yang  Computer Architecture Laboratory (CALCM), Carnegie Mellon University
Babak Falsafi  Computer Architecture Laboratory (CALCM), Carnegie Mellon University
Sponsor
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
IEEE Computer Society  Washington, DC, USA
ABSTRACT

High-performance caches statically pull up the bitlines in all cache subarrays to optimize cache access latency. Unfortunately, such an architecture results in a significant waste of energy in nanoscale CMOS implementations due to high leakage and bitline discharge in the unaccessed subarrays. Recent research advocates bitline isolation to control precharging of individual subarrays using bitline precharge devices. In this paper, we carefully evaluate the energy and performance trade-offs of bitline isolation, and propose a technique to exploit nearly its full potential to eliminate discharge and reduce overall energy in level-one caches. Cycle-accurate and circuit simulation results of a wide-issue superscalar processor indicate that: (1) in future CMOS technologies (e.g., 70nm and beyond), cache architectures that exploit bitline isolation can eliminate up to 90% of the bitline discharge, (2) on-demand precharging (i.e., decoding the address and subsequently precharging the accessed subarrays) is not viable in level-one caches because precharging increases the cache access latency, and (3) our proposal for gated precharging, which exploits subarray reference locality and precharges only the recently accessed subarrays, eliminates nearly all of the bitline discharge in nanoscale CMOS caches with only a 1% performance degradation.
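
The idea of precharging only recently accessed subarrays can be illustrated with a toy model. The sketch below is not the authors' implementation: the per-subarray idle counter, the `idle_threshold` parameter, the function name, and the trace format are all assumptions made for illustration. It counts how many subarray-cycles are spent precharged and how many accesses hit an isolated subarray (and would therefore pay a precharge delay).

```python
def simulate_gated_precharge(trace, num_subarrays, idle_threshold):
    """Toy model of gated precharging (illustrative assumptions only).

    trace          -- one entry per cycle: accessed subarray index, or None
    idle_threshold -- hypothetical number of idle cycles after which a
                      subarray's precharge is gated off (bitlines isolated)
    """
    # Every subarray starts isolated (idle counter already at the threshold).
    idle = [idle_threshold] * num_subarrays
    precharged_cycles = 0
    delayed = 0
    accesses = 0

    for accessed in trace:
        if accessed is not None:
            accesses += 1
            # An access to an isolated subarray must wait for precharge.
            if idle[accessed] >= idle_threshold:
                delayed += 1
            idle[accessed] = 0

        # Count subarrays that are held precharged during this cycle.
        precharged_cycles += sum(1 for c in idle if c < idle_threshold)

        # Age every precharged subarray that was not accessed this cycle.
        for i in range(num_subarrays):
            if i != accessed and idle[i] < idle_threshold:
                idle[i] += 1

    total = len(trace) * num_subarrays
    return {
        "precharged_fraction": precharged_cycles / total if total else 0.0,
        "delayed_accesses": delayed,
        "accesses": accesses,
    }


if __name__ == "__main__":
    # A trace with strong subarray locality: mostly subarray 0, one visit to 1.
    stats = simulate_gated_precharge([0, 0, 0, 0, 1, 0, 0, 0],
                                     num_subarrays=4, idle_threshold=2)
    print(stats)
```

In this model a statically precharged cache corresponds to a precharged fraction of 1.0, so the reported fraction gives a rough sense of how much precharged time locality can remove. The model says nothing about actual energy or latency, which the paper obtains from circuit and cycle-accurate simulation.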


