|
Warning: The download time has expired please click on the item to try again.
ABSTRACT
Accurate instruction fetch and branch prediction is increasingly important on today's wide-issue architectures. Fetch prediction is the process of determining the next instruction to request from the memory subsystem. Branch prediction is the process of predicting the likely out-come of branch instructions. Several researchers have proposed very effective fetch and branch prediction mechanisms including branch target buffers (BTB) that store the target addresses of taken branches. An alternative approach fetches the instruction following a branch by using an index into the cache instead of a branch target address. We call such an index a next cache line and set (NLS) predictor. A NLS predictor is a pointer into the instruction cache, indicating the target instruction of a branch.In this paper we examine the use of NLS predictors for efficient and accurate fetch and branch prediction. Previous studies associated each NLS predictor with a cache line and provided only one-bit conditional branch predictors. Our study examines the use of NLS predictors with highly accurate two-level correlated conditional branch architectures. We examine the performance of decoupling the NLS predictors from the cache line and storing them in a separate tag-less memory buffer. Our results show that the decoupled architecture performs better than associating the NLS predictors with the cache line, that the NLS architecture benefits from reduced cache miss rates, and it is particularly effective for programs containing many branches. We also provide an in-depth comparison between the NLS and BTB architectures, showing that the NLS architecture is a competitive alternative to the BTB design.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
Mike Johnson. Superscalar Microprocessor Design. Innovative Technology. Prentice-Hall. Inc., Englewood Cliffs, NJ, 1991.
|
 |
6
|
|
| |
7
|
Johnny K. F. Lee and Alan Jay Smith. Branch prediction strategies and branch target buffer design. IEEE Computer, pages 6-22, January 1984.
|
 |
8
|
|
| |
9
|
Scott McFading. Combining branch predictors. TN 36, DEC- WRL, June 1993.
|
 |
10
|
|
| |
11
|
Johannes M. Mulder, Nhon T. Quach, and Michael J. Flynn. An area model for on-chip memories and its application. IEEE Journal of Solid-State Circuits, 26(2):98-105, February 1991.
|
 |
12
|
Shien-Tai Pan , Kimming So , Joseph T. Rahmeh, Improving the accuracy of dynamic branch prediction using branch correlation, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.76-84, October 12-15, 1992, Boston, Massachusetts, United States
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
Simon C. Steely and David J. Sager. Next line prediction apparatus for a pipelined computer system. US. Patent #5,283,873, Feb. 1994.
|
| |
19
|
Steven J. E. Wilton and Norman P. Jouppi. An enhanced access and cycle time model for on-chip caches. WRL Report 93/5, DEC Western Research Lab, 1993.
|
 |
20
|
|
 |
21
|
|
 |
22
|
|
CITED BY 26
|
|
Byung-Kwon Chung , Jinsuo Zhang , Jih-Kwon Peir , Shih-Chang Lai , Konrad Lai, Direct load: dependence-linked dataflow resolution of load address and cache coordinate, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Brad Calder , Dirk Grunwald , Joel Emer, A system level perspective on branch architecture performance, Proceedings of the 28th annual international symposium on Microarchitecture, p.199-206, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Michael D. Powell , Amit Agarwal , T. N. Vijaykumar , Babak Falsafi , Kaushik Roy, Reducing set-associative cache energy via way-prediction and selective direct-mapping, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
|
|
|
|
|
|
Osman S. Unsal , Raksit Ashok , Israel Koren , C. Mani Krishna , Csaba Andras Moritz, Cool-cache for hot multimedia, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lingxiang Xiang , Tianzhou Chen , Qingsong Shi , Wei Hu, Less reused filter: improving l2 cache performance via filtering less reused lines, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA
|
|