| A case for a working-set-based memory hierarchy |
| Full text |
Pdf
(161 KB)
|
| Source
|
Conference On Computing Frontiers
archive
Proceedings of the 2nd conference on Computing frontiers
table of contents
Ischia, Italy
SESSION: Track 5: supercomputing (part 2)
table of contents
Pages: 252 - 261
Year of Publication: 2005
ISBN:1-59593-019-1
|
|
Authors
|
|
Steve Carr
|
Michigan Technological University, Houghton, MI
|
|
Soner Önder
|
Michigan Technological University, Houghton, MI
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 6, Downloads (12 Months): 17, Citation Count: 0
|
|
|
ABSTRACT
Modern microprocessor designs continue to obtain impressive performance gains through increasing clock rates and advances in the parallelism obtained via micro-architecture design. Unfortunately, corresponding improvements in memory design technology have not been realized, resulting in latencies of over 100 cycles between processors and main memory. This ever-increasing gap in speed has pushed the current memory-hierarchy approach to its limit.Traditional approaches to memory-hierarchy management have not yielded satisfactory results. Hardware solutions require more power and energy than desired and do not scale well. Compiler solutions tend to miss too many optimization opportunities because of limited compile-time knowledge of run-time behavior. This paper explores a different approach that combines both approaches by making use of the static knowledge obtained by the compiler in the dynamic decision making of the micro-architecture. We propose a memory-hierarchy design based on working sets that uses compile-time annotations regarding the working set of memory operations to guide cache placement decisionsOur experiments show that a working-set-based memory hierarchy can significantly reduce the miss rate for memory-intensive tiled kernels by limiting cross interference. The working-set-based memory hierarchy allows the compiler to tile many loops without concern for cross interference in the cache, making tile size choice easier. In addition, the compiler can more easily tailor tile choices to the separate needs of different working sets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
|
 |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
|
 |
8
|
|
 |
9
|
Somnath Ghosh , Margaret Martonosi , Sharad Malik, Precise miss analysis for program transformations with caches of arbitrary associativity, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.228-239, October 02-07, 1998, San Jose, California, United States
|
| |
10
|
M. Kandemir , A. Choudhary , J. Ramanujam , P. Banerjee, Improving locality using loop and data transformations in an integrated framework, Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, p.285-297, November 1998, Dallas, Texas, United States
|
 |
11
|
|
 |
12
|
|
| |
13
|
|
| |
14
|
|
 |
15
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
 |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
| |
22
|
Nigel Topham , Antonio González , José González, The design and performance of a conflict-avoiding cache, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.71-80, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
23
|
|
 |
24
|
|
 |
25
|
|
|