Dynamic memory instruction bypassing
Source: International Conference on Supercomputing
Proceedings of the 17th annual international conference on Supercomputing
San Francisco, CA, USA
SESSION: Speculative execution
Pages: 316 - 325
Year of Publication: 2003
ISBN:1-58113-733-8
Authors
Daniel Ortega  Universidad Politécnica de Cataluña, Barcelona, Spain
Eduard Ayguadé  Universidad Politécnica de Cataluña, Barcelona, Spain
Mateo Valero  Universidad Politécnica de Cataluña, Barcelona, Spain
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/782814.782858

ABSTRACT

Reducing the latency of load instructions is among the most crucial factors in achieving performance for current and future microarchitectures. Deep pipelining makes L1 caches appear more than one cycle away, thus impacting load-to-use latency even when these instructions hit in the cache. In this paper we present a novel dynamic mechanism aimed at overcoming load-to-use latency. Our mechanism dynamically detects relations between address-producing instructions and the loads that consume these addresses, and uses this information to access data before the load is even fetched from the I-Cache. We modify the renaming stage so that when these loads are fetched, they are detected and consequently squashed, since their work has already taken place. By fetching data ahead of time, our mechanism allows the microarchitecture to see further into the future, a concept akin to having a bigger reorder buffer. This mechanism is not intended to prefetch from outside the chip (main memory or L3 cache if present). Its main aim is to move data from L1 and L2 silently and ahead of time into the register file so that the load instruction can be subsequently bypassed (hence the name). The benefits of this mechanism increase in the presence of memory prefetching or good memory behaviour, since these scenarios allow more loads to be bypassed. In addition, better use of renaming registers allows our mechanism to outperform the baseline even when the latter has more renaming registers. An average performance improvement of 24.5% is achieved in the SPECint95 benchmarks.
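
The idea in the abstract can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the authors' hardware design: the class `BypassModel`, its tables (`links`, `early`), the address check in `load`, and the store invalidation are all hypothetical names and simplifications introduced here for illustration. It shows a producer-to-load relation being learned the first time a load executes, the data being read early the next time the producer executes, and the load then being bypassed.

```python
from dataclasses import dataclass, field


@dataclass
class BypassModel:
    """Toy model (hypothetical) of producer/load linking and load bypassing."""
    memory: dict                                      # address -> value
    regs: dict = field(default_factory=dict)          # register -> value
    last_writer: dict = field(default_factory=dict)   # register -> producer PC
    links: dict = field(default_factory=dict)         # producer PC -> (load PC, offset)
    early: dict = field(default_factory=dict)         # load PC -> (address, value)
    bypassed: int = 0

    def produce(self, pc, dst, value):
        """An address-producing instruction writes `value` into `dst`."""
        self.regs[dst] = value
        self.last_writer[dst] = pc
        link = self.links.get(pc)
        if link is not None:
            load_pc, offset = link
            addr = value + offset
            # Access the data before the linked load is even fetched.
            self.early[load_pc] = (addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        """Assumption: a store drops any early value it makes stale."""
        self.memory[addr] = value
        self.early = {pc: (a, v) for pc, (a, v) in self.early.items() if a != addr}

    def load(self, pc, dst, base, offset):
        """Execute a load; return True if it was bypassed."""
        addr = self.regs[base] + offset
        hit = self.early.pop(pc, None)
        if hit is not None and hit[0] == addr:
            # The work already took place: reuse the early value.
            self.regs[dst] = hit[1]
            self.bypassed += 1
            was_bypassed = True
        else:
            self.regs[dst] = self.memory.get(addr, 0)
            was_bypassed = False
        # Learn the relation between the base register's producer and this load.
        producer = self.last_writer.get(base)
        if producer is not None:
            self.links[producer] = (pc, offset)
        # Loads are not treated as address producers in this sketch.
        self.last_writer.pop(dst, None)
        return was_bypassed


m = BypassModel(memory={108: 7, 208: 9})
m.produce(pc=0x10, dst="r1", value=100)
first = m.load(pc=0x14, dst="r2", base="r1", offset=8)    # relation learned
m.produce(pc=0x10, dst="r1", value=200)                   # data read early
second = m.load(pc=0x14, dst="r2", base="r1", offset=8)   # load bypassed
print(first, second, m.regs["r2"])
```

In this model the first execution of the load only records the relation, so bypassing can start from the second occurrence of the producer onward. The address comparison and the invalidation on stores are conservative choices made for the sketch; the abstract does not say how the real mechanism handles these cases.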

