Load squared: adding logic close to memory to reduce the latency of indirect loads with high miss ratios
Source
ACM SIGARCH Computer Architecture News
Volume 33, Issue 3 (June 2005)
Special issue: MEDEA 2004 workshop
Pages: 17-24
Year of Publication: 2005
ISSN: 0163-5964
ABSTRACT
Indirect memory accesses, where a load is fed by another load, are ubiquitous because of rich data structures and sophisticated software conventions, such as the use of linkage tables and position independent code. Unfortunately, they can be costly: if both loads miss, two round trips to memory are required even though the role of the first load is often limited to fetching the address of the second load. To reduce the total latency of such indirect accesses, a new instruction called load squared is introduced. A load squared does two fetches, the first fetch reading the target address of the second. (An offset is optionally added to the result of the first fetch.) The load squared operation is performed by memory-side logic (typically, the memory controller if it isn't located on the main processor chip). In this study, load squared is not an architecturally visible instruction: the micro-architecture transparently decides which loads should be replaced by loads squared. We show that performance is sometimes improved significantly, and never degraded.
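The semantics described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation: `memory_t`, `mem_fetch`, `load`, and `load_squared` are hypothetical names, memory is word-addressed (an address is an array index), every processor-issued load is assumed to miss all caches, and latency is reduced to a simple count of processor-to-memory round trips. In the study itself load squared is not architecturally visible, so the function below models what the memory-side logic does rather than an instruction a programmer would write.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16

/* Toy memory model (assumption): word-addressed, so an address is an index. */
typedef struct {
    uint64_t words[MEM_WORDS];
    unsigned round_trips; /* processor <-> memory round trips so far */
} memory_t;

/* A single fetch performed locally on the memory side.
 * By itself it adds no processor <-> memory round trip. */
uint64_t mem_fetch(const memory_t *m, uint64_t addr) {
    assert(addr < MEM_WORDS);
    return m->words[addr];
}

/* Ordinary load that misses in the caches: one full round trip. */
uint64_t load(memory_t *m, uint64_t addr) {
    m->round_trips++;
    return mem_fetch(m, addr);
}

/* Load squared: the processor sends one request; the memory-side logic
 * performs the first fetch, optionally adds an offset to the result,
 * and uses that as the address of the second fetch. */
uint64_t load_squared(memory_t *m, uint64_t addr, uint64_t offset) {
    m->round_trips++;                              /* single request/reply */
    uint64_t target = mem_fetch(m, addr) + offset; /* first fetch + offset */
    return mem_fetch(m, target);                   /* second fetch */
}
```

In this model, a conventional indirect access such as `load(&m, load(&m, p) + off)` increments `round_trips` twice, while `load_squared(&m, p, off)` returns the same value after a single increment. That difference is the latency saving the abstract describes for the case where both loads miss.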