Increasing temporal locality with skewing and recursive blocking
Source
Conference on High Performance Networking and Computing
Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM)
Denver, Colorado
Pages: 43 - 43
Year of Publication: 2001
ISBN: 1-58113-293-X
ABSTRACT
We present a strategy, called recursive prismatic time skewing, that increases temporal reuse at all memory hierarchy levels, thus improving the performance of scientific codes that use iterative methods. Prismatic time skewing partitions the iteration space of multiple loops into skewed prisms with both spatial and temporal (or convergence) dimensions. Novel aspects of this work include: multi-dimensional loop skewing; handling carried data dependences in the skewed loops without additional storage; bi-directional skewing to accommodate periodic boundary conditions; and an analysis and transformation strategy that works inter-procedurally. We combine prismatic skewing with a recursive blocking strategy to boost reuse at all levels in a memory hierarchy. A preliminary evaluation of these techniques shows significant performance improvements compared both to original codes and to methods described previously in the literature. With an inter-procedural application of our techniques, we were able to reduce total primary cache misses of a large application code by 27% and secondary cache misses by 119%.
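To make the idea of time skewing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional three-point Jacobi stencil with fixed boundary values, uses two ping-pong buffers (so it does not reproduce the storage-free handling of carried dependences described in the abstract), and omits multi-dimensional skewing, bi-directional skewing, and recursive blocking. The function names `jacobi_naive` and `jacobi_skewed` and the parameter `block` are hypothetical. The skewed version walks the iteration space in the coordinate j = i + t, so that each block of j is carried through every time step before the next block is touched.

```python
def jacobi_naive(a, steps):
    """Reference version: sweep the whole array once per time step."""
    n = len(a)
    cur, nxt = list(a), list(a)          # both hold the fixed boundary values
    for _ in range(steps):
        for i in range(1, n - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur, nxt = nxt, cur
    return cur


def jacobi_skewed(a, steps, block):
    """Time-skewed version: tile the skewed coordinate j = i + t.

    After skewing, every dependence distance in (t, j) is non-negative,
    so a block of j can be advanced through all time steps while its
    data is still likely to be cached.
    """
    n = len(a)
    buf = [list(a), list(a)]             # assumption: two ping-pong buffers
    # j runs from 1 (t = 0, i = 1) to n - 2 + steps - 1 (last t, i = n - 2).
    for jj in range(1, n + steps - 2, block):
        for t in range(steps):
            src, dst = buf[t % 2], buf[(t + 1) % 2]
            lo = max(jj, 1 + t)              # keep i = j - t >= 1
            hi = min(jj + block, n - 1 + t)  # keep i = j - t <= n - 2
            for j in range(lo, hi):
                i = j - t
                dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return buf[steps % 2]
```

Both functions perform the same arithmetic on the same operands for each point (t, i), only in a different order, so their results should agree exactly. The choice of `block` would in practice be tuned to a cache level; applying the same partitioning recursively, as the abstract describes, is beyond this sketch.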