Direct load: dependence-linked dataflow resolution of load address and cache coordinate
Source: International Symposium on Microarchitecture
Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture
Austin, Texas
Session: Memory hierarchies
Pages: 76-87
Year of Publication: 2001
ISSN: 1072-4451; ISBN: 0-7695-1369-7
Authors
Byung-Kwon Chung, Sun Microsystems
Jinsuo Zhang, University of Florida
Jih-Kwon Peir, University of Florida
Shih-Chang Lai, Oregon State University
Konrad Lai, Intel Corp.
Sponsors
IEEE TC-MARCH
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
IEEE Computer Society, Washington, DC, USA
Bibliometrics
Downloads (6 weeks): 2; Downloads (12 months): 12; Citation count: 3

ABSTRACT

Increasing cache latency in future processors has a profound impact on performance, despite advanced out-of-order execution techniques. In this paper, we describe an early address resolution mechanism that accurately resolves both regular and irregular load addresses. The basic idea is to build dynamic dependence links from the instruction that updates a base register to the load instructions that consume it. Once a new base address is available, it triggers the calculation of new load addresses for the dependent loads. Furthermore, the exact cache location of the requested data is predicted from the newly resolved load address. As a result, such a direct load can access the data cache directly and achieve a zero-cycle load latency. Performance evaluation using SPEC integer programs shows that the dynamic dependence links can be established accurately. Combined with a stride-based predictor, the proposed early address resolution achieves about 97% average accuracy with less than 1% misprediction. Based on a modified SimpleScalar model, the proposed method can potentially improve IPC by about 18%.
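The mechanism described in the abstract can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the authors' hardware design: the class and method names, the 64-byte-line / 256-set cache geometry, the per-load-PC "last way" predictor used for the cache coordinate, and the simple stride table are all hypothetical choices made for illustration. It shows the core idea: a load records a link to the last writer of its base register, and when that producer later writes a new base value, the addresses (base + displacement) and predicted (set, way) coordinates of all linked loads are computed immediately.

```python
from dataclasses import dataclass, field

# Hypothetical cache geometry (an assumption, not taken from the paper).
LINE_BITS = 6    # 64-byte cache lines
NUM_SETS = 256   # number of sets


@dataclass
class DirectLoadSketch:
    """Toy model of dependence-linked early load-address resolution.

    All table and method names are hypothetical illustrations.
    """
    last_writer: dict = field(default_factory=dict)   # reg -> PC of its last producer
    links: dict = field(default_factory=dict)         # producer PC -> {(load PC, disp)}
    way_history: dict = field(default_factory=dict)   # load PC -> way it last hit in
    stride_table: dict = field(default_factory=dict)  # load PC -> (last addr, stride)

    def on_load_decode(self, load_pc, base_reg, disp):
        """Link the current producer of base_reg to this load; return that producer."""
        producer = self.last_writer.get(base_reg)
        if producer is not None:
            self.links.setdefault(producer, set()).add((load_pc, disp))
        return producer

    def on_write(self, pc, dest_reg, value):
        """An instruction writes dest_reg: resolve every linked load early."""
        self.last_writer[dest_reg] = pc
        resolved = {}
        for load_pc, disp in self.links.get(pc, ()):
            addr = value + disp
            resolved[load_pc] = (addr, self.cache_coordinate(load_pc, addr))
        return resolved

    def cache_coordinate(self, load_pc, addr):
        """Set index comes from address bits; the way is predicted (default 0)."""
        set_index = (addr >> LINE_BITS) % NUM_SETS
        return set_index, self.way_history.get(load_pc, 0)

    def stride_predict(self, load_pc):
        """Hypothetical stride fallback for loads with no usable dependence link."""
        entry = self.stride_table.get(load_pc)
        return None if entry is None else entry[0] + entry[1]

    def train(self, load_pc, actual_addr, actual_way):
        """Update stride and way history once the load's real outcome is known."""
        last = self.stride_table.get(load_pc)
        stride = 0 if last is None else actual_addr - last[0]
        self.stride_table[load_pc] = (actual_addr, stride)
        self.way_history[load_pc] = actual_way


if __name__ == "__main__":
    ADD_PC, LD_PC, R1 = 0x100, 0x104, 1
    dl = DirectLoadSketch()
    dl.on_write(ADD_PC, R1, 0x1000)          # add r1, r1, 8 (no links yet)
    dl.on_load_decode(LD_PC, R1, 16)         # ld r2, 16(r1): link add -> ld
    dl.train(LD_PC, 0x1010, actual_way=3)    # load executes normally, trains way
    print(dl.on_write(ADD_PC, R1, 0x1008))   # {260: (4120, (64, 3))}
```

In this sketch the link is built from one dynamic instance and exploited on the next (for example, the next loop iteration), which is why the first `on_write` resolves nothing and the second resolves the load to address 0x1018 in set 64, way 3. Consulting the stride table only when no link exists is one plausible way to combine the two predictors; the paper's actual combination policy may differ.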

