Data prefetching by dependence graph precomputation
Source: International Symposium on Computer Architecture
Proceedings of the 28th Annual International Symposium on Computer Architecture
Göteborg, Sweden
Pages: 52-61
Year of Publication: 2001
ISBN: 0-7695-1162-7
Authors
Murali Annavaram, Electrical Engineering and Computer Science Department, The University of Michigan, Ann Arbor
Jignesh M. Patel, Electrical Engineering and Computer Science Department, The University of Michigan, Ann Arbor
Edward S. Davidson, Electrical Engineering and Computer Science Department, The University of Michigan, Ann Arbor
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS TCCA: Technical Committee on Computer Architecture
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/379240.379251

ABSTRACT

Data cache misses reduce the performance of wide-issue processors by stalling the data supply to the processor. Prefetching data by predicting the miss address is one way to tolerate cache miss latencies. However, current applications with irregular access patterns make it difficult to predict the address accurately and early enough to mask large cache miss latencies. This paper explores an alternative to predicting prefetch addresses, namely precomputing them. The Dependence Graph Precomputation (DGP) scheme introduced in this paper is a novel approach for dynamically identifying and precomputing the instructions that determine the addresses accessed by those load/store instructions marked as being responsible for most data cache misses. DGP's dependence graph generator efficiently generates the required dependence graphs at run time. A separate precomputation engine executes these graphs to generate the data addresses of the marked load/store instructions early enough for accurate prefetching. Our results show that 94% of the prefetches issued by DGP are useful, reducing the D-cache miss stall time by 47%. Thus DGP takes us about halfway from an already highly tuned baseline system toward perfect D-cache performance. DGP improves the overall performance of a wide range of applications by 7% over tagged next-line prefetching and by 13% over a baseline processor with no prefetching, and is within 15% of perfect D-cache performance.
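
The abstract does not spell out how the dependence graph generator works, so the following is only a minimal software sketch of the general idea: walk backward from a marked load through a window of earlier instructions and keep just those that feed its address registers. The `Instr` record, the `address_slice` function, the register names, and the example window are all hypothetical; the sketch also assumes that `srcs` of the marked instruction lists only its address registers and that dependences through memory are ignored. It is not the paper's hardware mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction; not the paper's hardware format."""
    op: str                  # mnemonic, used here only for readability
    dest: Optional[str]      # destination register, or None
    srcs: Tuple[str, ...]    # source registers (address registers for a load)


def address_slice(window: List[Instr], marked: int) -> List[int]:
    """Return indices (oldest first) of the instructions in `window` that the
    address of window[marked] depends on through registers.

    Assumption: dependences through memory (store-to-load) are ignored.
    """
    needed = set(window[marked].srcs)   # registers whose producers we still want
    picked: List[int] = []
    # Scan from the instruction just before the marked one back to the oldest.
    for i in range(marked - 1, -1, -1):
        ins = window[i]
        if ins.dest is not None and ins.dest in needed:
            # Nearest producer found: it joins the slice, and its own
            # sources become the registers we now need producers for.
            needed.discard(ins.dest)
            needed.update(ins.srcs)
            picked.append(i)
    picked.reverse()
    return picked


# Hypothetical pointer-chasing window; index 4 is the marked (missing) load.
example_window = [
    Instr("load", "r1", ("r0",)),        # r1 = mem[r0]
    Instr("add", "r5", ("r5", "r6")),    # unrelated to the marked address
    Instr("add", "r2", ("r1", "r3")),    # r2 = r1 + r3
    Instr("mul", "r7", ("r5", "r5")),    # unrelated to the marked address
    Instr("load", "r4", ("r2",)),        # marked load: address in r2
]
print(address_slice(example_window, 4))
```

In this sketch, a separate engine would run only the selected instructions to produce the marked load's address ahead of the main pipeline, which is the sense in which the address is precomputed rather than predicted.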


