Implementing branch-predictor decay using quasi-static memory cells
Source: ACM Transactions on Architecture and Code Optimization (TACO)
Volume 1, Issue 2 (June 2004)
Pages: 180–219
Year of Publication: 2004
ISSN: 1544-3566
Authors
Philo Juang, Princeton University, Princeton, NJ
Kevin Skadron, University of Virginia
Margaret Martonosi, Princeton University, Princeton, NJ
Zhigang Hu, IBM T.J. Watson Research Center
Douglas W. Clark, Princeton University, Princeton, NJ
Philip W. Diodato, Agere Systems
Stefanos Kaxiras, University of Patras
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1011528.1011531

ABSTRACT

With semiconductor technology advancing toward deep submicron, leakage energy is of increasing concern, especially for large on-chip array structures such as caches and branch predictors. Recent work has suggested that larger, more aggressive branch predictors can and should be used to improve microprocessor performance. A further consideration is that more aggressive branch predictors, especially multiported predictors for multiple branch prediction, may be thermal hot spots, which further increases leakage. Moreover, because the branch predictor holds state that is transient and predictive, its elements can be discarded without affecting correctness. For these reasons, it is natural to consider applying decay techniques, already shown to reduce leakage energy for caches, to branch-prediction structures.

Because of the structural differences between caches and branch predictors, applying decay techniques to branch predictors is not straightforward. This paper explores strategies for exploiting spatial and temporal locality to make decay effective for bimodal, gshare, and hybrid predictors, as well as for the branch target buffer (BTB). Furthermore, the predictive behavior of branch predictors steers them toward decay based not on state-preserving, static storage cells, but on quasi-static, dynamic storage cells. This paper examines the results of implementing decaying branch-predictor structures with dynamic (appropriately, decaying) cells rather than the standard static SRAM cell.

Overall, this paper demonstrates that decay techniques apply to more than just caches, using the branch predictor and BTB as an example. We show that decay can be implemented either at the architectural level or by wholesale replacement of static storage cells with quasi-static storage cells, which naturally implement decay. More importantly, decay techniques can and should be applied to other such transient and/or predictive structures.
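To make architectural-level decay concrete, here is a minimal Python sketch of a bimodal predictor whose 2-bit counters are grouped into rows that decay independently. It loosely follows the global-tick/local-counter style often described for cache decay: a global tick fires every `decay_interval` cycles, each powered row's local decay counter advances on every tick, any access resets it, and a row that goes unaccessed long enough is gated off and loses its contents. The class name `DecayingBimodal`, the row size, the decay interval, the 2-bit local counter, and the weakly-not-taken reset value are all illustrative assumptions, not parameters or designs taken from the paper.

```python
WEAKLY_NOT_TAKEN = 1  # assumed reset value for counters in a re-powered row


class DecayingBimodal:
    """Hypothetical bimodal predictor with row-granularity decay (a sketch)."""

    def __init__(self, entries=1024, row_size=16, decay_interval=8192):
        assert entries % row_size == 0
        self.entries = entries
        self.row_size = row_size
        self.decay_interval = decay_interval   # cycles per global tick (assumed)
        self.counters = [WEAKLY_NOT_TAKEN] * entries  # 2-bit saturating counters
        n_rows = entries // row_size
        self.decay = [0] * n_rows        # local 2-bit decay counter per row
        self.powered = [True] * n_rows   # False means the row is gated off
        self.cycle = 0

    def _index(self, pc):
        # Simple bimodal indexing: drop byte-offset bits, take PC modulo size.
        return (pc >> 2) % self.entries

    def tick(self, cycles=1):
        """Advance time; each crossed multiple of decay_interval is one tick."""
        before = self.cycle // self.decay_interval
        self.cycle += cycles
        for _ in range(self.cycle // self.decay_interval - before):
            for r in range(len(self.decay)):
                if self.powered[r]:
                    self.decay[r] += 1
                    if self.decay[r] == 3:     # local counter saturated
                        self.powered[r] = False  # gate row off; state discarded

    def _access(self, pc):
        i = self._index(pc)
        r = i // self.row_size
        if not self.powered[r]:
            # Re-power a decayed row. Its old counters were lost, so reset them;
            # this may cost accuracy but never correctness.
            start = r * self.row_size
            for j in range(start, start + self.row_size):
                self.counters[j] = WEAKLY_NOT_TAKEN
            self.powered[r] = True
        self.decay[r] = 0  # any access (read or update) resets the row's decay
        return i

    def predict(self, pc):
        return self.counters[self._access(pc)] >= 2

    def update(self, pc, taken):
        i = self._access(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)

    def powered_fraction(self):
        return sum(self.powered) / len(self.powered)


if __name__ == "__main__":
    bp = DecayingBimodal(entries=64, row_size=16, decay_interval=100)
    pc = 0x400
    for _ in range(3):
        bp.update(pc, True)       # train toward strongly taken
    print(bp.predict(pc))         # True
    bp.tick(300)                  # three idle global ticks: every row decays
    print(bp.powered_fraction())  # 0.0
    print(bp.predict(pc))         # False: row re-powered with reset counters
```

Decaying whole rows rather than single counters is one way to trade decay granularity against the overhead of per-unit decay counters; the quasi-static-cell alternative described in the abstract would instead let the storage itself lose its contents, with no explicit counters at all.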

