|
ABSTRACT
Modern processors use branch target buffers (BTBs) to predict the target address of branches such that they can fetch ahead in the instruction stream increasing concurrency and performance. Ideally, BTBs would be sufficiently large to capture the entire working set of the application and sufficiently small for fast access and practical on-chip dedicated storage. Depending on the application, these requirements are at odds. This work introduces a BTB design that accommodates large instruction footprints without dedicating expensive onchip resources. In the proposed Phantom-BTB (PBTB) design, a conventional BTB is augmented with a virtual table that collects branch target information as the application runs. The virtual table does not have fixed dedicated storage. Instead, it is transparently allocated, on demand, in the on-chip caches, at cache line granularity. The entries in the virtual table are proactively prefetched and installed in the dedicated conventional BTB, thus, increasing its perceived capacity. Experimental results with commercial workloads under full-system simulation demonstrate that PBTB improves IPC performance over a 1K-entry BTB by 6.9% on average and up to 12.7%, with a storage overhead of only 8%. Overall, the virtualized design performs within 1% of a conventional 4K-entry, single-cycle access BTB, while the dedicated storage is 3.6 times smaller.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
First the tick, now the tock: Next generation Intel microarchitecture (Nehalem). White Paper, Intel Co., 2008.
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
Ioana Burcea and Andreas Moshovos. Virtualizing branch target buffers. Technical report, University of Toronto, 2008. www.eecg.toronto.edu/~ioana/papers/pbtb tech rep.pdf.
|
 |
7
|
Ioana Burcea , Stephen Somogyi , Andreas Moshovos , Babak Falsafi, Predictor virtualization, Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, March 01-05, 2008, Seattle, WA, USA
|
 |
8
|
Po-Yung Chang , Eric Hao , Yale N. Patt, Target prediction for indirect jumps, Proceedings of the 24th annual international symposium on Computer architecture, p.274-283, June 01-04, 1997, Denver, Colorado, United States
|
| |
9
|
|
| |
10
|
Philip G. Emma, Allan M. Hartstein, Brian R. Prasky, Thomas R. Puzak, Moinuddin K. A. Qureshi, and Vijayalakshmi Srinivasan. Context lookahead storage structures. IBM, U.S. Patent 7337271 B2, 2008.
|
| |
11
|
|
 |
12
|
Nikolaos Hardavellas , Stephen Somogyi , Thomas F. Wenisch , Roland E. Wunderlich , Shelley Chen , Jangwoo Kim , Babak Falsafi , James C. Hoe , Andreas G. Nowatzyk, SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture, ACM SIGMETRICS Performance Evaluation Review, v.31 n.4, p.31-34, March 2004
[doi> 10.1145/1054907.1054914]
|
| |
13
|
R. B. Hilgendorf, G. J. Heim, and W. Rosenstiel. Evaluation of branch-prediction methods on traces from commercial applications. IBM Journal of Research and Development, 43(4), 1999.
|
| |
14
|
|
 |
15
|
|
 |
16
|
Jose A. Joao , Onur Mutlu , Hyesoon Kim , Rishi Agarwal , Yale N. Patt, Improving the performance of object-oriented languages with dynamic predication of indirect jumps, Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, March 01-05, 2008, Seattle, WA, USA
|
 |
17
|
|
 |
18
|
Kimberly Keeton , David A. Patterson , Yong Qiang He , Roger C. Raphael , Walter E. Baker, Performance characterization of a Quad Pentium Pro SMP using OLTP workloads, Proceedings of the 25th annual international symposium on Computer architecture, p.15-26, June 27-July 02, 1998, Barcelona, Spain
|
| |
19
|
|
| |
20
|
Ryotaro Kobayashi, Yuji Yamada, Hideki Ando, and Toshio Shimada. A cost-effective branch target buffer with a two-level table organization. In Proceedings of the 2nd International Symposium of Low-Power and High-Speed Chips (COOL Chips II), 1999.
|
| |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
Ed H. Sussenguth. Instruction sequence control. IBM, U.S. Patent 3559183, 1971.
|
| |
29
|
|
 |
30
|
|
|