Execution history guided instruction prefetching
Source: Proceedings of the 16th International Conference on Supercomputing
New York, New York, USA
Session: Memory wall
Pages: 199-208
Year of Publication: 2002
ISBN: 1-58113-483-5
Authors
Yi Zhang, University of Maryland, College Park, MD
Steve Haga, University of Maryland, College Park, MD
Rajeev Barua, University of Maryland, College Park, MD
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/514191.514220

ABSTRACT

The increasing gap in performance between processors and main memory has made effective instruction prefetching techniques more important than ever. A major deficiency of most existing prefetching methods is that they require an extra port to the I-cache. A recent study [19] shows that this factor alone explains why most modern microprocessors do not use such hardware-based I-cache prefetch schemes. The contribution of this paper is two-fold. First, we present a method that does not require an extra port to the I-cache. Second, our method improves performance more than the best competing method [23], even without counting the benefit of not having an extra port.

The three key features of our method are as follows. First, too-late prefetching is prevented by correlating misses with dynamically preceding instructions. For example, if the I-cache miss latency is 12 cycles, then the instruction that was fetched 12 cycles before the miss is used as the prefetch trigger. Second, the miss history table is kept to a reasonable size by grouping contiguous cache misses together and associating them with one preceding instruction, and therefore with one table entry. Third, the extra I-cache port is avoided through efficient prefetch filtering methods. Experiments show that for our benchmarks, chosen for their poor I-cache performance, our method achieves an average runtime improvement of 9.2% over the BHGP method [23], while also reducing hardware cost. The improvement would be greater still if the runtime impact of avoiding an extra port were considered.
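The three mechanisms in the abstract (latency-based trigger selection, grouping of contiguous misses into one table entry, and prefetch filtering) can be illustrated with a small software model. This is a hypothetical simplification, not the authors' hardware design: the class and method names, the cycle-stamped fetch history, the line-granular miss table, and the use of a small buffer of recently issued prefetch lines as the filter are all assumptions made for illustration. The abstract does not describe the paper's actual filtering methods, so the filter here only stands in for "a check that does not probe the I-cache tags."

```python
from collections import deque


class HistoryGuidedPrefetcher:
    """Hypothetical model of execution-history-guided I-cache prefetching.

    Simplifying assumptions (not from the paper):
    - fetches are recorded as (cycle, pc) pairs in a bounded history buffer;
    - addresses in the miss table are cache-line numbers;
    - filtering uses a small buffer of recently issued prefetch lines.
    """

    def __init__(self, miss_latency, history_len=64, filter_len=16):
        self.miss_latency = miss_latency
        self.history = deque(maxlen=history_len)  # recent (cycle, pc) fetches
        self.table = {}                           # trigger pc -> [first_line, n_lines]
        self.open_entry = None                    # entry that contiguous misses extend
        self.recent = deque(maxlen=filter_len)    # recently issued prefetch lines

    def on_fetch(self, cycle, pc):
        """Record a fetched instruction in the history buffer."""
        self.history.append((cycle, pc))

    def _trigger_for(self, miss_cycle):
        """Return the pc of the latest fetch at or before miss_cycle - miss_latency."""
        target = miss_cycle - self.miss_latency
        for cycle, pc in reversed(self.history):
            if cycle <= target:
                return pc
        return None  # history does not reach back far enough

    def on_miss(self, miss_cycle, line):
        """Train the miss table on an I-cache miss to cache line `line`."""
        e = self.open_entry
        if e is not None and line == e[0] + e[1]:
            e[1] += 1  # next consecutive line: extend the group, no new entry
            return
        trigger = self._trigger_for(miss_cycle)
        if trigger is None:
            self.open_entry = None
            return
        self.open_entry = [line, 1]
        self.table[trigger] = self.open_entry

    def on_trigger(self, pc):
        """When `pc` is fetched again, return the lines to prefetch."""
        e = self.table.get(pc)
        if e is None:
            return []
        issued = []
        for line in range(e[0], e[0] + e[1]):
            if line not in self.recent:  # drop duplicate prefetch requests
                self.recent.append(line)
                issued.append(line)
        return issued
```

For example, with `miss_latency=12` and one fetch per cycle, a miss at cycle 30 is attributed to the instruction fetched at cycle 18, and misses to the next two consecutive lines extend that same entry instead of creating new ones. A single table entry therefore covers a whole run of sequential misses, which is how grouping keeps the table small.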


REFERENCES


 
[1]
[2] Alpha Architecture Handbook. Digital Equipment Corporation, Maynard, MA, 1994.
[3] D. Burger and T. Austin. The SimpleScalar Tool Set, Version 2.0. Technical Report TR 1342, University of Wisconsin, Madison, WI, June 1997.
[4] Standard Performance Evaluation Corporation. The SPEC benchmark suites. http://www.spec.org
[5]
[6]
[7]
[8]
[9] Intel IA-64 Architecture Software Developer's Manual, Volumes I-IV. Intel Corporation, January 2000. Also available at http://developer.intel.com
[10] Intel(R) Itanium(TM) Processor Hardware Developer's Manual. Intel Corporation, August 2001.
[11]
[12]
[13]
[14]
[15]
[16] IBM Regains Performance Lead with Power2. Microprocessor Report, October 1993.
[17] PowerPC 740/PowerPC 750 RISC Microprocessor User's Manual. IBM Corporation, 1999.
[18]
[19]
[20] International Technology Roadmap for Semiconductors, 1998 Update. Semiconductor Industry Association, page 4, 1998.
[21]
[22]
[23]
[24]
[25] K. Yeager et al. Superscalar Microprocessor. Hot Chips VII, 1995.

