ABSTRACT
The increasing gap in performance between processors and main memory has made effective instruction prefetching techniques more important than ever. A major deficiency of existing prefetching methods is that most of them require an extra port to the I-cache. A recent study [19] shows that this factor alone explains why most modern microprocessors do not use such hardware-based I-cache prefetch schemes. The contribution of this paper is two-fold. First, we present a method that does not require an extra port to the I-cache. Second, the performance improvement of our method is greater than that of the best competing method [23], even disregarding the improvement from not having an extra port.

Three key features of our method address the above deficiencies. First, too-late prefetching is prevented by correlating misses to dynamically preceding instructions. For example, if the I-cache miss latency is 12 cycles, then the instruction that was fetched 12 cycles prior to the miss is used as the prefetch trigger. Second, the miss history table is kept to a reasonable size by grouping contiguous cache misses together and associating them with one preceding instruction and, therefore, one table entry. Third, the extra I-cache port is avoided through efficient prefetch filtering methods. Experiments show that for our benchmarks, chosen for their poor I-cache performance, an average runtime improvement of 9.2% is achieved over the BHGP method [23], while the hardware cost is also reduced. The improvement will be greater if the runtime impact of avoiding an extra port is considered.
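The first two features can be illustrated with a small sketch. It is not the paper's hardware design: the trace format, the function name `build_miss_table`, and the `[first_block, run_length]` entry layout are assumptions made for illustration, and prefetch filtering is not modeled. The sketch picks as trigger the most recent instruction fetched at least `latency` cycles before a miss, and folds a run of contiguous missed blocks into that trigger's single table entry.

```python
def build_miss_table(trace, latency=12):
    """Build a toy miss history table from a fetch trace.

    trace: iterable of (cycle, pc, missed_block) in fetch order, where
           missed_block is a cache-block number or None (hypothetical format).
    Returns {trigger_pc: [first_missed_block, run_length]}.
    """
    table = {}
    fetched = []            # (cycle, pc) of every fetch seen so far
    run_trigger = None      # trigger that owns the current run of misses
    run_last_block = None   # last missed block in the current run

    for cycle, pc, missed_block in trace:
        fetched.append((cycle, pc))
        if missed_block is None:
            continue

        # A miss on the next sequential block extends the current run,
        # so it shares the existing table entry instead of adding one.
        if run_trigger is not None and missed_block == run_last_block + 1:
            table[run_trigger][1] += 1
            run_last_block = missed_block
            continue

        # Otherwise start a new run: the trigger is the most recent
        # instruction fetched at least `latency` cycles before the miss.
        trigger = None
        for past_cycle, past_pc in reversed(fetched):
            if past_cycle <= cycle - latency:
                trigger = past_pc
                break

        run_trigger = trigger
        run_last_block = missed_block
        if trigger is not None:
            # A later run with the same trigger overwrites the older entry.
            table[trigger] = [missed_block, 1]

    return table


# Toy trace: one instruction fetched per cycle, misses at cycles 20, 21, 40.
misses = {20: 7, 21: 8, 40: 30}
trace = [(c, 0x100 + 4 * c, misses.get(c)) for c in range(50)]
table = build_miss_table(trace)
```

In this toy trace the misses on blocks 7 and 8 share one entry keyed by the instruction fetched at cycle 8, while the isolated miss on block 30 gets its own entry keyed by the instruction fetched at cycle 28.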
REFERENCES
[1]
[2] Alpha Architecture Handbook. Digital Equipment Corporation, Maynard, MA, 1994.
[3] D. Burger and T. Austin. The SimpleScalar Tool Set, Version 2.0. Technical Report TR 1342, University of Wisconsin, Madison, WI, June 1997.
[4] Standard Performance Evaluation Corporation. The SPEC benchmark suites. http://www.spec.org
[5] Ann Marie Grizzaffi Maynard, Colette M. Donnelly, Bret R. Olszewski. Contrasting characteristics and cache performance of technical and multi-user commercial workloads. Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.145-156, October 05-07, 1994, San Jose, California, United States.
[6]
[7] Dana S. Henry, Bradley C. Kuszmaul, Gabriel H. Loh, Rahul Sami. Circuits for wide-window superscalar processors. Proceedings of the 27th annual international symposium on Computer architecture, p.236-247, June 2000, Vancouver, British Columbia, Canada.
[8]
[9] Intel IA-64 Architecture Software Developer's Manual, Volumes I-IV. Intel Corporation, January 2000. Also available at http://developer.intel.com
[10] Intel(R) Itanium(TM) Processor Hardware Developer's Manual. Intel Corporation, August 2001.
[11]
[12]
[13]
[14] Yale N. Patt, Sanjay J. Patel, Marius Evers, Daniel H. Friendly, Jared Stark. One Billion Transistors, One Uniprocessor, One Chip. Computer, v.30 n.9, p.51-57, September 1997. doi: 10.1109/2.612249
[15]
[16] IBM Regains Performance Lead with Power2. Microprocessor Report, October 1993.
[17] PowerPC 740/PowerPC 750 RISC Microprocessor User's Manual. IBM Corporation, 1999.
[18]
[19] Jude A. Rivers, Gary S. Tyson, Edward S. Davidson, Todd M. Austin. On high-bandwidth data cache design for multi-issue processors. Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.46-56, December 01-03, 1997, Research Triangle Park, North Carolina, United States.
[20] International Technology Roadmap for Semiconductors, 1998 Update. Semiconductor Industry Association, page 4, 1998.
[21] Kevin Skadron, Pritpal S. Ahuja, Margaret Martonosi, Douglas W. Clark. Improving prediction for procedure returns with return-address-stack repair mechanisms. Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, p.259-271, November 1998, Dallas, Texas, United States.
[22]
[23]
[24]
[25] K. Yeager et al. Superscalar Microprocessor. Hot Chips VII, 1995.
CITED BY 2
Chanik Park, Jaeyu Seo, Sunghwan Bae, Hyojun Kim, Shinhan Kim, Bumsoo Kim. A low-cost memory architecture with NAND XIP for mobile embedded systems. Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, October 01-03, 2003, Newport Beach, CA, USA.