| Skewed associativity enhances performance predictability |
| Source
|
International Symposium on Computer Architecture
Proceedings of the 22nd annual international symposium on Computer architecture
S. Margherita Ligure, Italy
Pages: 265 - 274
Year of Publication: 1995
ISBN:0-89791-698-0
|
|
Authors
|
|
François Bodin
|
IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
|
|
André Seznec
|
IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
|
|
ABSTRACT
Performance tuning becomes harder as computer technology advances. One factor is the increasing complexity of memory hierarchies. Most modern machines now use at least one level of cache memory. To reduce execution stalls, cache miss rates must be very low. Software techniques that improve locality, such as loop blocking and copying, have been developed for numerical codes. Unfortunately, the behavior of direct-mapped and set-associative caches is still erratic when large numerical data sets are accessed. Execution time can vary drastically for the same loop kernel depending on uncontrolled factors such as the leading dimension of an array. The only software method available to improve execution time stability is copying frequently used data, which is itself costly in execution time. Users are usually not cache organization experts: they are unaware of such phenomena and have no control over them.
In this paper, we show that the recently proposed four-way skewed-associative cache yields very stable execution times and good average miss ratios on blocked algorithms. As a result, execution is faster and much more predictable than with conventional caches. Because of this better behavior, larger block sizes can be used with blocked algorithms, which further reduces blocking overhead.
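The sensitivity to array leading dimension described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the cache geometry (8 KB, direct-mapped, 32-byte lines), the 8-byte element size, the column-major layout, and the function names are all assumptions chosen for illustration. The sketch counts how many distinct cache sets are touched when one row of a two-dimensional array is traversed.

```python
# Illustrative parameters (assumptions, not values from the paper).
CACHE_BYTES = 8 * 1024   # total capacity of a direct-mapped cache
LINE_BYTES = 32          # cache line size
ELEM_BYTES = 8           # one double-precision element
NUM_SETS = CACHE_BYTES // LINE_BYTES  # 256 sets, one line per set


def cache_set(byte_address):
    """Set index of a byte address in a direct-mapped cache."""
    return (byte_address // LINE_BYTES) % NUM_SETS


def distinct_sets_for_row(leading_dim, row=0, n_cols=64):
    """Count the distinct sets touched when reading A(row, 0..n_cols-1)
    from a column-major array whose leading dimension is leading_dim."""
    touched = set()
    for col in range(n_cols):
        # Column-major layout: consecutive columns are leading_dim elements apart.
        address = (row + col * leading_dim) * ELEM_BYTES
        touched.add(cache_set(address))
    return len(touched)


if __name__ == "__main__":
    for leading_dim in (1024, 1025, 1028):
        print(leading_dim, distinct_sets_for_row(leading_dim))
```

Under these assumed parameters, a leading dimension of 1024 makes the stride equal to the cache size, so all 64 elements compete for a single set, whereas a leading dimension of 1028 spreads them over 64 distinct sets. A small change in array size thus changes the conflict pattern completely, which is the kind of erratic behavior the abstract refers to.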