Bloom filtering cache misses for accurate data speculation and prefetching
Source: International Conference on Supercomputing
Proceedings of the 16th International Conference on Supercomputing
New York, New York, USA
Session: Memory-wall
Pages: 189-198
Year of Publication: 2002
ISBN:1-58113-483-5
Authors
Jih-Kwon Peir  University of Florida
Shih-Chang Lai  Oregon State University
Shih-Lien Lu  Microprocessor Research, Intel Labs
Jared Stark  Microprocessor Research, Intel Labs
Konrad Lai  Microprocessor Research, Intel Labs
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/514191.514219

ABSTRACT

A processor must know a load instruction's latency to schedule the load's dependent instructions at the correct time. Unfortunately, modern processors do not know this latency until well after the dependent instructions should have been scheduled to avoid pipeline bubbles between themselves and the load. One solution to this problem is to predict the load's latency by predicting whether the load will hit or miss in the data cache. Existing cache hit/miss predictors, however, can correctly predict only about 50% of cache misses.

This paper introduces a new hit/miss predictor that uses a Bloom Filter to identify cache misses early in the pipeline. This early identification of cache misses allows the processor to more accurately schedule instructions that are dependent on loads and to more precisely prefetch data into the cache. Simulations using a modified SimpleScalar model show that the proposed Bloom Filter is nearly perfect, with a prediction accuracy greater than 99% for the SPECint2000 benchmarks. IPC (Instructions Per Cycle) performance improved by 19% over a processor that delayed the scheduling of instructions dependent on a load until the load latency was known, and by 6% and 7% over a processor that always predicted a load would hit the cache and over a processor with a counter-based hit/miss predictor, respectively. This IPC reaches 99.7% of the IPC of a processor with perfect scheduling.
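
The abstract does not say how the filter is organized, so the following is only a minimal sketch of the general idea: a counting Bloom filter that is updated whenever a line enters or leaves the cache and is consulted before the cache itself. The class name `MissFilter`, the counter count, the 64-byte line size, and the hash function are all assumptions made for illustration, not details taken from the paper. The property it demonstrates is that a zero counter proves the line is absent, so a predicted miss is never wrong, while a predicted hit may occasionally be a false positive.

```python
class MissFilter:
    """Counting Bloom filter over cache-line addresses (illustrative only)."""

    def __init__(self, num_counters=1024, num_hashes=2, line_bytes=64):
        # All sizes here are hypothetical parameters, not values from the paper.
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.line_bytes = line_bytes

    def _indexes(self, address):
        # Map a byte address to its line number, then to one counter per hash.
        # The multiplicative hash below is an assumed stand-in for whatever
        # cheap index function real hardware would use.
        line = address // self.line_bytes
        n = len(self.counters)
        return [((line * (2 * i + 1) * 0x9E3779B1) >> 7) % n
                for i in range(self.num_hashes)]

    def on_fill(self, address):
        # Called when the line containing `address` is brought into the cache.
        for idx in self._indexes(address):
            self.counters[idx] += 1

    def on_evict(self, address):
        # Called when the line containing `address` is replaced.
        for idx in self._indexes(address):
            self.counters[idx] -= 1

    def predicts_miss(self, address):
        # Any zero counter means no resident line maps there: a certain miss.
        # All non-zero counters mean "probably a hit" (false positives possible).
        return any(self.counters[idx] == 0 for idx in self._indexes(address))


if __name__ == "__main__":
    f = MissFilter()
    print(f.predicts_miss(0x1000))  # empty filter: the load must miss
    f.on_fill(0x1000)
    print(f.predicts_miss(0x1000))  # line is resident: not flagged as a miss
    f.on_evict(0x1000)
    print(f.predicts_miss(0x1000))  # line evicted: flagged as a miss again
```

In a pipeline, a scheduler could query `predicts_miss` with the load address as soon as it is available and delay the load's dependents only when the answer is a definite miss; how early that query can happen depends on the microarchitecture and is outside this sketch.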


