| Automatic benchmark generation for cache optimization of matrix operations |
Source
ACM Southeast Regional Conference
Proceedings of the 33rd Annual Southeast Regional Conference
Clemson, South Carolina
SESSION: Algorithms
Pages: 195-204
Year of Publication: 1995
ISBN: 0-89791-747-2
ABSTRACT
Computationally intensive algorithms must usually be restructured to make the best use of cache memory in current high-performance, hierarchical-memory computers. Unfortunately, cache-conscious algorithms are sensitive to object sizes and addresses as well as to the details of the cache and translation lookaside buffer (TLB) geometries, and this sensitivity makes both automatic restructuring and hand-tuning difficult. This paper presents an optimization approach that automatically generates and executes a benchmark program from a concise specification of the algorithm's structure. The technique provides the performance data needed to verify code-generation heuristics or to search among the various restructuring options. Matrix transpose and matrix multiplication are examined using this approach on several workstations, with the restructuring options of loop order, tiling (blocking), and unrolling.
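To make the idea of searching among restructuring options concrete, the sketch below enumerates loop orders and tile sizes for matrix multiplication, checks each variant against a plain reference, and times it. It is an illustration only, not the paper's generator: the names `make_matmul`, `reference`, and `run_benchmark` are hypothetical, unrolling is left out, and timings from interpreted Python say little about cache or TLB behavior, which would need generated compiled code to observe.

```python
import itertools
import time


def make_matmul(order, tile):
    """Build a matrix-multiply kernel from a tiny spec (hypothetical helper).

    order: a permutation of "ijk" giving the loop nesting inside a tile.
    tile:  edge length of the square tile (block), at least 1.
    """
    if sorted(order) != ["i", "j", "k"]:
        raise ValueError("order must be a permutation of 'ijk'")
    if tile < 1:
        raise ValueError("tile must be at least 1")
    # Position of each index variable within the chosen loop nest.
    pos = {axis: order.index(axis) for axis in "ijk"}

    def matmul(a, b, n):
        c = [[0] * n for _ in range(n)]
        # Outer loops step from tile to tile; inner loops stay inside one tile.
        for i0, j0, k0 in itertools.product(range(0, n, tile), repeat=3):
            spans = {
                "i": range(i0, min(i0 + tile, n)),
                "j": range(j0, min(j0 + tile, n)),
                "k": range(k0, min(k0 + tile, n)),
            }
            for x in spans[order[0]]:
                for y in spans[order[1]]:
                    for z in spans[order[2]]:
                        idx = (x, y, z)
                        i, j, k = idx[pos["i"]], idx[pos["j"]], idx[pos["k"]]
                        c[i][j] += a[i][k] * b[k][j]
        return c

    return matmul


def reference(a, b, n):
    """Untiled triple loop used to check every generated variant."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def run_benchmark(n, orders, tiles):
    """Time every (order, tile) variant and return (order, tile, seconds)."""
    # Integer inputs keep results exact regardless of summation order.
    a = [[(i + 2 * j) % 7 for j in range(n)] for i in range(n)]
    b = [[(3 * i + j) % 5 for j in range(n)] for i in range(n)]
    expected = reference(a, b, n)
    results = []
    for order, tile in itertools.product(orders, tiles):
        kernel = make_matmul(order, tile)
        start = time.perf_counter()
        c = kernel(a, b, n)
        elapsed = time.perf_counter() - start
        if c != expected:
            raise ValueError(f"variant {order}/{tile} gave a wrong result")
        results.append((order, tile, elapsed))
    return results


if __name__ == "__main__":
    for order, tile, seconds in run_benchmark(32, ["ijk", "ikj", "kij"], [4, 8, 32]):
        print(f"order={order} tile={tile:2d} time={seconds:.4f}s")
```

Sorting the returned tuples by elapsed time gives the simplest form of search; a heuristic that predicts the best variant could be checked against the same measurements.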
CITED BY 2
Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, Jim Demmel, Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Proceedings of the 11th international conference on Supercomputing, p.340-347, July 07-11, 1997, Vienna, Austria