|
ABSTRACT
Current memory hierarchies exploit locality of references to reduce load latency and thereby improve processor performance. Locality based schemes aim at reducing the number of cache misses and tend to ignore the nature of misses. This leads to a potential mis-match between load latency requirements and latencies realized using a traditional memory system. To bridge this gap, we partition loads as critical and non-critical. A load that needs to complete early to prevent processor stalls is classified as critical, while a load that can tolerate a long latency is considered non-critical.
In this paper, we investigate if it is worth violating locality to exploit information on criticality to improve processor performance. We present a dynamic critical load classification scheme and show that 40% performance improvements are possible on average, if all critical loads are guaranteed to hit in the Ll cache. We then compare the two properties, locality and criticality, in the context of several cache organization and prefetching schemes. We find that the working set of critical loads is large, and hence practical cache organization schemes based on criticality are unable to reduce the critical load miss ratios enough to produce performance gains. Although criticality-based prefetching can help for some resource constrained programs, its benefit over locality-based prefetching is small and may not be worth the added complexity.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
D. C. Burger, T. M. Austin, and S. Bennett. Evaluating Future Microprocessors-the SimpleScalar Tool Set. Technical Report 1308, Computer Sciences Department, University of Wisconsin-Madison, July 1996.
|
 |
2
|
Brad Calder , Glenn Reinman , Dean M. Tullsen, Selective value prediction, Proceedings of the 26th annual international symposium on Computer architecture, p.64-74, May 01-04, 1999, Atlanta, Georgia, United States
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
 |
6
|
|
| |
7
|
|
 |
8
|
|
| |
9
|
Teresa L. Johnson , Matthew C. Merten , Wen-Mei W. Hwu, Run-time spatial locality detection and optimization, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.57-64, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
 |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
 |
14
|
|
| |
15
|
J. A. Rivers and E. S. Davidson. Reducing Conflicts in Direct-Mapped Caches with a Temporality-Based Design. In Proceedings of the 1996 International Conference on Parallel Processing, volume 1, pages 154-163, August 1996.
|
 |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
Gary Tyson , Matthew Farrens , John Matthews , Andrew R. Pleszkun, A modified approach to data cache management, Proceedings of the 28th annual international symposium on Microarchitecture, p.93-103, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
 |
21
|
|
CITED BY 18
|
|
|
|
|
Youfeng Wu , Ryan Rakvic , Li-Ling Chen , Chyi-Chang Miao , George Chrysos , Jesse Fang, Compiler managed micro-cache bypassing for high performance EPIC processors, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, November 18-22, 2002, Istanbul, Turkey
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anahita Shayesteh , Glenn Reinman , Norm Jouppi , Tim Sherwood , Suleyman Sair, Improving the performance and power efficiency of shared helpers in CMPs, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
|
|
|
Kartik K. Agaram , Stephen W. Keckler , Calvin Lin , Kathryn S. McKinley, Decomposing memory performance: data structures and phases, Proceedings of the 2006 international symposium on Memory management, June 10-11, 2006, Ottawa, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|