|
ABSTRACT
A processor must know a load instruction's latency to schedule the load's dependent instructions at the correct time. Unfortunately, modern processors do not know this latency until well after the dependent instructions should have been scheduled to avoid pipeline bubbles between themselves and the load. One solution to this problem is to predict the load's latency, by predicting whether the load will hit or miss in the data cache. Existing cache hit/miss predictors, however, can only correctly predict about 50% of cache misses.This paper introduces a new hit/miss predictor that uses a Bloom Filter to identify cache misses early in the pipeline. This early identification of cache misses allows the processor to more accurately schedule instructions that are dependent on loads and to more precisely prefetch data into the cache. Simulations using a modified SimpleScalar model show that the proposed Bloom Filter is nearly perfect, with a prediction accuracy greater than 99% for the SPECint2000 benchmarks. IPC (Instructions Per Cycle) performance improved by 19% over a processor that delayed the scheduling of instructions dependent on a load until the load latency was known, and by 6% and 7% over a processor that always predicted a load would hit the cache and with a counter-based hit/miss predictor respectively. This IPC reaches 99.7% of the IPC of a processor with perfect scheduling.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Compaq Computer Corporation. Alpha 21264 Microprocessor Hardware Reference Manual, 1999.
|
 |
2
|
|
| |
3
|
D. Burger and T. Austin. The SimpleScalar Tool Set, Version 2.0. Technical Report #1342, CS Department, Univ. of Wisconsin-Madison, June 1997.
|
| |
4
|
|
 |
5
|
|
| |
6
|
P. Glaskowsky. Pentium 4 (Partially) Previewed. Microprocessor Report, Aug. 2000.
|
| |
7
|
G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The Microarchitecture of the Pentium 4 Processor. Intel Technical Journal, Q1 2001.
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
 |
15
|
Adi Yoaz , Mattan Erez , Ronny Ronen , Stephan Jourdan, Speculation techniques for improving load related instruction scheduling, Proceedings of the 26th annual international symposium on Computer architecture, p.42-53, May 01-04, 1999, Atlanta, Georgia, United States
|
CITED BY 12
|
|
|
|
|
|
|
|
|
|
|
Pin Zhou , Wei Liu , Long Fei , Shan Lu , Feng Qin , Yuanyuan Zhou , Samuel Midkiff , Josep Torrellas, AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.269-280, December 04-08, 2004, Portland, Oregon
|
|
|
Elham Safi , Andreas Moshovos , Andreas Veneris, L-CBF: a low-power, fast counting bloom filter architecture, Proceedings of the 2006 international symposium on Low power electronics and design, October 04-06, 2006, Tegernsee, Bavaria, Germany
|
|
|
Dong Hyuk Woo , Mrinmoy Ghosh , Emre Özer , Stuart Biles , Hsien-Hsin S. Lee, Reducing energy of virtual cache synonym lookup using bloom filters, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|