Architectural and compiler support for effective instruction prefetching: a cooperative approach
Source: ACM Transactions on Computer Systems (TOCS)
Volume 19, Issue 1 (February 2001)
Pages: 71-109
Year of Publication: 2001
ISSN: 0734-2071
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/367742.367786

ABSTRACT

Instruction cache miss latency is becoming an increasingly important performance bottleneck, especially for commercial applications. Although instruction prefetching is an attractive technique for tolerating this latency, we find that existing prefetching schemes are insufficient for modern superscalar processors, since they fail to issue prefetches early enough (particularly for nonsequential accesses). To overcome these limitations, we propose a new instruction prefetching technique whereby the hardware and software cooperate to hide the latency as follows. The hardware performs aggressive sequential prefetching combined with a novel prefetch filtering mechanism to allow it to get far ahead without polluting the cache. To hide the latency of nonsequential accesses, we propose and implement a novel compiler algorithm which automatically inserts instruction-prefetch instructions to prefetch the targets of control transfers far enough in advance. Our experimental results demonstrate that this new approach hides 50% or more of the latency remaining with the best previous techniques, while at the same time reducing the number of useless prefetches by a factor of six. We find that both the prefetch filtering and compiler-inserted prefetching components of our design are essential and complementary, and that the compiler can limit the code expansion to only 9% on average. In addition, we show that the performance of our technique can be further increased by using profiling information to help reduce cache conflicts and unnecessary prefetches. From an architectural perspective, these performance advantages are sustained over a range of common miss latencies and bandwidths. Finally, our technique is cost effective as well, since it delivers performance comparable to (or even better than) that of larger caches, but requires a much smaller hardware budget.
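
The interaction between aggressive sequential prefetching and a prefetch filter can be illustrated with a toy simulation. The abstract does not say how the paper's filter is built, so everything below is an assumption made for illustration: a small fully associative LRU cache, a next-N-line prefetcher, and a hypothetical filter that simply remembers lines whose last prefetch was evicted without being used and refuses to prefetch them again until they are demanded. The names `simulate`, `cache_lines`, `depth`, and `use_filter` are invented for this sketch and do not come from the paper.

```python
from collections import OrderedDict


def simulate(trace, cache_lines=8, depth=4, use_filter=True):
    """Toy model of next-N-line instruction prefetching with an optional filter.

    trace: sequence of instruction-cache line addresses (ints) in fetch order.
    The filter is a hypothetical stand-in, not the mechanism from the paper.
    """
    cache = OrderedDict()  # line -> True while it is a prefetch not yet used
    useless = set()        # assumed filter state: lines evicted unused after a prefetch
    stats = {"misses": 0, "prefetches": 0, "useless_prefetches": 0}

    def insert(line, prefetched):
        # Evict the least recently used line when the cache is full.
        if len(cache) >= cache_lines:
            victim, unused = cache.popitem(last=False)
            if unused:
                # The victim was prefetched but never fetched: count it
                # and remember it so the filter can suppress it later.
                stats["useless_prefetches"] += 1
                useless.add(victim)
        cache[line] = prefetched

    for line in trace:
        if line in cache:
            cache[line] = False       # a prefetched line has now been used
            cache.move_to_end(line)   # refresh LRU position
        else:
            stats["misses"] += 1
            useless.discard(line)     # a demand miss shows the line is needed
            insert(line, prefetched=False)

        # Sequential prefetch of the next `depth` lines.
        for nxt in range(line + 1, line + 1 + depth):
            if nxt in cache:
                continue
            if use_filter and nxt in useless:
                continue              # filtered: predicted to pollute the cache
            stats["prefetches"] += 1
            insert(nxt, prefetched=True)

    return stats


if __name__ == "__main__":
    # Two single-line blocks far apart, executed alternately (nonsequential).
    jumpy = [0, 100] * 10
    print("filtered  :", simulate(jumpy, cache_lines=4, depth=2, use_filter=True))
    print("unfiltered:", simulate(jumpy, cache_lines=4, depth=2, use_filter=False))
```

In this toy setting the unfiltered prefetcher keeps evicting the two lines that are actually executed, so every fetch misses, while the filtered run stops issuing the polluting prefetches after warm-up. The nonsequential misses that remain are the part the abstract assigns to compiler-inserted prefetches, which this sketch does not model.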


REVIEW

Reviewer: Olivier Louis Marie Lecarme

Although this somewhat long paper was published in ACM Transactions on Computer Systems, it could have been published in ACM Transactions on Programming Languages and Systems as well. In fact, a joint publication would have been bett...

