ABSTRACT
Increasing cache latency in future processors has a profound impact on performance, even with advanced out-of-order execution techniques. In this paper, we describe an early address resolution mechanism that accurately resolves both regular and irregular load addresses. The basic idea is to build dynamic dependence links from the instruction that updates a base register to the load instructions that consume it. Once a new base address is available, it triggers calculation of the new addresses for the dependent loads. Furthermore, the exact cache location of the requested data is predicted from the newly resolved load address. As a result, this direct load can access the data cache directly and achieve a zero-cycle load latency. Performance evaluation using SPEC integer programs shows that the dynamic dependence links can be established accurately. Combined with a stride-based predictor, the proposed early address resolution achieves about 97% average accuracy with less than 1% misprediction. Based on a modified SimpleScalar model, the proposed method can potentially improve IPC by about 18%.
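The dependence-link idea can be illustrated with a minimal software sketch. This is not the paper's hardware design: the names `LoadEntry`, `DependenceLinkTable`, `link`, `on_base_update`, and `cache_set_index`, as well as the cache parameters (32-byte lines, 256 sets), are assumptions introduced only for illustration. The sketch records which loads consume a given base register and recomputes their addresses as soon as a new base value is produced.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadEntry:
    pc: int      # address of the load instruction (hypothetical identifier)
    offset: int  # displacement the load adds to its base register


class DependenceLinkTable:
    """Maps a base register to the loads that consume it (illustrative only)."""

    def __init__(self):
        self.links = defaultdict(list)

    def link(self, base_reg, load):
        # Establish a link from the base-register producer to a consumer load.
        if load not in self.links[base_reg]:
            self.links[base_reg].append(load)

    def on_base_update(self, base_reg, new_base):
        # A new base value triggers address calculation for every linked load.
        return {
            load.pc: new_base + load.offset
            for load in self.links.get(base_reg, [])
        }


def cache_set_index(address, line_size=32, num_sets=256):
    # Assumed cache geometry; maps a resolved address to a cache set index.
    return (address // line_size) % num_sets
```

In this sketch, linking two loads to a register and then calling `on_base_update` with a new base value returns the early-resolved address of each load, which `cache_set_index` could then map to a cache set. How the actual mechanism predicts the exact cache location is not detailed in the abstract, so the set-index computation here is only a stand-in.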