ABSTRACT
With semiconductor technology advancing toward deep submicron, leakage energy is of increasing concern, especially for large on-chip array structures such as caches and branch predictors. Recent work has suggested that larger, aggressive branch predictors can and should be used in order to improve microprocessor performance. A further consideration is that more aggressive branch predictors, especially multiported predictors for multiple branch prediction, may be thermal hot spots, thus further increasing leakage. Moreover, as the branch predictor holds state that is transient and predictive, elements can be discarded without adverse effect. For these reasons, it is natural to consider applying decay techniques, already shown to reduce leakage energy for caches, to branch-prediction structures.

Due to the structural difference between caches and branch predictors, applying decay techniques to branch predictors is not straightforward. This paper explores strategies for exploiting spatial and temporal locality to make decay effective for bimodal, gshare, and hybrid predictors, as well as the branch target buffer (BTB). Furthermore, the predictive behavior of branch predictors steers them towards decay based not on state-preserving, static storage cells, but rather on quasi-static, dynamic storage cells. This paper examines the results of implementing decaying branch-predictor structures with dynamic (appropriately, decaying) cells rather than the standard static SRAM cell.

Overall, this paper demonstrates that decay techniques can apply to more than just caches, with the branch predictor and BTB as an example. We show that decay can be implemented either at the architectural level or through a wholesale replacement of static storage cells with quasi-static storage cells, which naturally implement decay. More importantly, decay techniques can and should be applied to other such transient and/or predictive structures.
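
To make the idea of architectural-level decay concrete, the following is a minimal sketch of a bimodal predictor whose rows are deactivated when they go unused for a full decay interval. It is an illustration under stated assumptions, not the design evaluated in the paper: the class name `DecayingBimodal`, the row granularity, the interval length, the PC-to-index hash, and the choice to reset a reactivated row to "weakly taken" are all hypothetical.

```python
class DecayingBimodal:
    """Toy bimodal predictor with per-row decay (illustrative only)."""

    WEAK_TAKEN = 2  # assumed reset value for a reactivated 2-bit counter

    def __init__(self, entries=1024, row_size=32, decay_interval=4096):
        assert entries % row_size == 0
        self.entries = entries
        self.row_size = row_size              # counters sharing one decay bit
        self.decay_interval = decay_interval  # cycles between decay sweeps
        self.counters = [self.WEAK_TAKEN] * entries
        rows = entries // row_size
        self.active = [True] * rows           # row is powered and holds state
        self.touched = [False] * rows         # row accessed since last sweep
        self.cycle = 0

    def _index(self, pc):
        # Assumed hash: drop the low 2 bits of the PC, then wrap.
        return (pc >> 2) % self.entries

    def _wake(self, idx):
        row = idx // self.row_size
        if not self.active[row]:
            # A decayed row lost its state, so it restarts from the default.
            start = row * self.row_size
            self.counters[start:start + self.row_size] = (
                [self.WEAK_TAKEN] * self.row_size
            )
            self.active[row] = True
        self.touched[row] = True

    def predict(self, pc):
        idx = self._index(pc)
        self._wake(idx)
        return self.counters[idx] >= 2

    def update(self, pc, taken):
        idx = self._index(pc)
        self._wake(idx)
        c = self.counters[idx]
        self.counters[idx] = min(3, c + 1) if taken else max(0, c - 1)

    def tick(self, cycles=1):
        for _ in range(cycles):
            self.cycle += 1
            if self.cycle % self.decay_interval == 0:
                # Sweep: rows untouched for a whole interval are turned off.
                for row in range(len(self.active)):
                    if not self.touched[row]:
                        self.active[row] = False
                    self.touched[row] = False

    def active_rows(self):
        # Number of powered rows, a rough proxy for leakage.
        return sum(self.active)
```

In this sketch a mispredicted reset costs only accuracy, never correctness, which mirrors the abstract's point that predictor state is transient and can be discarded. Grouping counters into rows is one simple way to exploit spatial locality, since a single access keeps its neighbors alive.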