Load squared: adding logic close to memory to reduce the latency of indirect loads with high miss ratios
Source: ACM SIGARCH Computer Architecture News
Volume 33, Issue 3 (June 2005)
Special issue: MEDEA 2004 workshop
Pages: 17-24
Year of Publication: 2005
ISSN: 0163-5964
Authors
Sami Yehia (ARM Ltd, Cambridge, UK)
Jean-Francois Collard (Hewlett-Packard Labs, Palo Alto, CA)
Olivier Temam (University of Paris-Sud, France)
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1101868.1101873

ABSTRACT

Indirect memory accesses, where a load is fed by another load, are ubiquitous because of rich data structures and sophisticated software conventions, such as the use of linkage tables and position independent code. Unfortunately, they can be costly: if both loads miss, two round trips to memory are required even though the role of the first load is often limited to fetching the address of the second load. To reduce the total latency of such indirect accesses, a new instruction called load squared is introduced. A load squared does two fetches, the first fetch reading the target address of the second. (An offset is optionally added to the result of the first fetch.) The load squared operation is performed by memory-side logic (typically, the memory controller if it isn't located on the main processor chip). In this study, load squared is not an architecturally visible instruction: the micro-architecture transparently decides which loads should be replaced by loads squared. We show that performance is sometimes improved significantly, and never degraded.
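
The latency argument in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' simulator or evaluation: the latency constants, the word-addressed dictionary memory, and the function names `indirect_load` and `load_squared` are all hypothetical, and both loads are assumed to miss in every cache.

```python
"""Toy latency model: ordinary indirect load vs. load squared (illustrative only)."""

# Hypothetical latencies in cycles; the abstract gives no numbers.
LINK_LATENCY = 100  # one-way trip between the processor and memory-side logic
DRAM_LATENCY = 50   # one access to the memory array


def indirect_load(memory, addr, offset=0):
    """Two ordinary loads that both miss: each pays a full round trip."""
    pointer = memory[addr]            # first load fetches the target address
    value = memory[pointer + offset]  # second load fetches the data
    cycles = 2 * (2 * LINK_LATENCY + DRAM_LATENCY)
    return value, cycles


def load_squared(memory, addr, offset=0):
    """Both fetches done by memory-side logic: one round trip, two array accesses."""
    pointer = memory[addr]            # first fetch, performed next to memory
    value = memory[pointer + offset]  # second fetch, optional offset added
    cycles = 2 * LINK_LATENCY + 2 * DRAM_LATENCY
    return value, cycles


# Address 8 holds a pointer to 40; the data lives at 40 + 2.
memory = {8: 40, 42: 1234}

value_a, cycles_a = indirect_load(memory, 8, offset=2)
value_b, cycles_b = load_squared(memory, 8, offset=2)
print(value_a, cycles_a)
print(value_b, cycles_b)
```

Both functions return the same value; only the modeled cost differs, because the load squared crosses the processor-memory link once instead of twice. With different assumed constants the gap changes, but the saved round trip is the point of the comparison.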


