Less reused filter: improving L2 cache performance via filtering less reused lines
Source
International Conference on Supercomputing
Proceedings of the 23rd international conference on Supercomputing
Yorktown Heights, NY, USA
SESSION: Cache enhancement techniques
Pages 68-79
Year of Publication: 2009
ISBN: 978-1-60558-498-0
ABSTRACT
The L2 cache is commonly managed using the LRU policy. For workloads whose working set is larger than the L2 cache, LRU behaves poorly, producing a large number of less reused lines, that is, lines that are never reused or are reused only a few times. In this case, cache performance can be improved by retaining a portion of the working set in the cache long enough to be reused. Previous schemes approach this by bypassing never reused lines. However, they are severely constrained by the number of never reused lines and sometimes deliver no benefit when such lines are scarce. This paper proposes a new filtering mechanism that filters out less reused lines rather than only never reused lines. The extended scope of bypassing provides more opportunities to fit the working set into the cache. This paper also proposes the Less Reused Filter (LRF), a separate structure that precedes the L2 cache, to implement this mechanism. LRF employs a reuse frequency predictor to accurately identify less reused lines among incoming lines. In addition, based on our observation that most less reused lines have a short life span, LRF places the filtered lines into a small filter buffer so that they can still be fully utilized, avoiding extra misses. Our evaluation on 24 SPEC 2000 benchmarks shows that augmenting a 512KB LRU-managed L2 cache with an LRF that has a 32KB filter buffer reduces the average MPKI by 27.5%, narrowing the gap between LRU and OPT by 74.4%.
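The filtering idea in the abstract can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the design evaluated in the paper: the abstract does not describe the predictor's internals, so the per-address reuse table, the `threshold` parameter, the fully associative stores, and all class and function names here are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU store of line addresses (a simplification)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def hit(self, addr):
        """Return True and refresh recency if addr is resident."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def insert(self, addr):
        """Insert addr; return the evicted address, or None."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None


class LessReusedFilter:
    """Toy filter in front of an LRU L2 (assumed structure, see lead-in)."""

    def __init__(self, l2_lines, buffer_lines, threshold=2):
        self.l2 = LRUCache(l2_lines)
        self.buffer = LRUCache(buffer_lines)  # small filter buffer
        self.threshold = threshold            # hypothetical cutoff
        self.reuse_seen = {}  # assumed predictor: hits in last residency
        self.live_hits = {}   # hits during the current residency
        self.misses = 0

    def _place(self, store, addr):
        victim = store.insert(addr)
        if victim is not None:
            # Train the predictor with the victim's observed reuse count.
            self.reuse_seen[victim] = self.live_hits.pop(victim, 0)

    def access(self, addr):
        """Simulate one access; return True on a hit."""
        if self.l2.hit(addr) or self.buffer.hit(addr):
            self.live_hits[addr] = self.live_hits.get(addr, 0) + 1
            return True
        self.misses += 1
        self.live_hits[addr] = 0
        predicted = self.reuse_seen.get(addr)
        if predicted is not None and predicted < self.threshold:
            self._place(self.buffer, addr)  # predicted less reused: filter
        else:
            self._place(self.l2, addr)      # no history or well reused
        return False


def lru_misses(trace, capacity):
    """Miss count of a plain LRU cache, for comparison."""
    cache = LRUCache(capacity)
    misses = 0
    for addr in trace:
        if not cache.hit(addr):
            misses += 1
            cache.insert(addr)
    return misses


# A cyclic working set of 6 lines, larger than either configuration.
trace = list(range(6)) * 10
lrf = LessReusedFilter(l2_lines=4, buffer_lines=1)
for a in trace:
    lrf.access(a)
```

On this cyclic trace, a plain five-line LRU cache misses on every access, while the toy filter keeps four lines resident in L2 and settles at two misses per pass once the predictor has seen each line evicted without reuse. The numbers are specific to this toy model and say nothing about the MPKI results reported in the abstract.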