|
ABSTRACT
L1 caches must be fast and have a good hit rate at the same time. To be fast, they must remain small. To have a good hit rate, they must be set-associative. With wider associativity the replacement algorithm becomes critical. The wide performance gap between OPT, the optimum off-line algorithm, and LRU suggests that LRU still makes too many mistakes. One way to improve L1 cache behavior is to manage actively the replacement policy to correct these mistakes on the fly.We introduce Self-correcting LRU (SCLRU) which is based on LRU augmented with a feedback loop to constantly monitor and correct replacement mistakes. It relies on several mechanisms to detect, predict, and correct bad replacement decisions. We identify three types of mistakes made by LRU and associate them with memory-access instructions. Our goal is to prevent a mistake to occur more than once. Based on evaluations using a set of seven SPEC95 benchmarks, we show that our approach achieves significant and reliable miss rate improvements, sometimes close to that of OPT, for 2-way and 4-way L1 caches and can do this at a low implementation cost and without affecting the hit cycle time.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
L. Belady. A Study of Replacement Algorithms for a Virtual-Storage Computer. In IBM Systems Journal, v. 5, no. 2, pp. 78--101, 1966.
|
| |
3
|
D. Burger, Doug and T. M. Austin,. The SimpleScalar Tool Set, Version 2. University of Wisconsin-Madison Computer Science Department, TN-1342, 1997.
|
 |
4
|
|
| |
5
|
J. Jalminger and P. Stenstrom. A Novel Approach to Cache Reuse Prediction. In Proceedings of 32nd International Parallel Processing Conference (ICPP-2003). Oct. 2003.
|
| |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
M. Karlsson and E. Hagersten, "Timestamp-based Selective Cache Allocation", High Performance Memory Systems, edited by H. Hadimiouglu, D. Kaeli, J. Kuskin, A. Nanda, and J. Torrellas, Springer-Verlag, 2003.
|
| |
10
|
|
 |
11
|
|
| |
12
|
R. L. Mattson, J. Gecsei, D. R. Slutz and I. L. Traiger. Evaluation Techniques for Storage Hierarchies. In IBM Systems Journal, vol. 9, pp. 77--117, 1970.
|
| |
13
|
V. Milutinovic, B. Markovic, M. Tomasevic, and M. Tremblay. The Split Temporal/Spatial Cache: Initial Performance analysis. In Proceedings of SCIzzL-5, pp. 63--70, March 1996.
|
| |
14
|
M. Prvulovic, D. Marinov, Z. Dimitrijevic, and V. Milutinovic. Split Temporal/Spatial Cache: A Survey and Reevaluation of Performance, IEEE TCCA Newsletter, pp. 1--10, 1999.
|
 |
15
|
|
| |
16
|
Standard Performance Evaluation Corporation. http://www.spec.org. SPEC 95. August 1995.
|
| |
17
|
H. S. Stone. High Performance Computer Architecture. Addison-Wesley 1993 (3rd ed.).
|
| |
18
|
|
| |
19
|
Gary Tyson , Matthew Farrens , John Matthews , Andrew R. Pleszkun, A modified approach to data cache management, Proceedings of the 28th annual international symposium on Microarchitecture, p.93-103, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
| |
20
|
Wayne A. Wong and Jean-Loup Baer. Modified LRU Policies for Improving Second-level Cache Behavior. In Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, pp. 49--60, January 2000.
|
CITED BY 3
|
|
Haakon Dybdahl , Per Stenström , Lasse Natvig, An LRU-based replacement algorithm augmented with frequency of access in shared chip-multiprocessor caches, Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures, p.45-52, September 16-20, 2006, Seattle, Washington
|
|
|
|
|
|
|
|