Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues
ACM Transactions on Mathematical Software (TOMS)
Volume 24, Issue 3 (September 1998)
Pages: 303-316
Year of Publication: 1998
ISSN: 0098-3500
ABSTRACT
This companion article discusses portability and optimization issues of the GEMM-based level 3 BLAS model implementations and the performance evaluation benchmark. The software is available in all four data types (single- and double-precision, real and complex) and is designed to be easy to implement and use on different platforms. Each of the GEMM-based routines has a few machine-dependent parameters that specify internal block sizes, cache characteristics, and branch points for alternative code sections. These parameters provide a means of adjusting the routines to the characteristics of a given memory hierarchy.
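As a rough illustration of the idea in the abstract, the sketch below computes a lower-triangular matrix product in row blocks, routing the rectangular part of each block through a general matrix multiply and handling only the small diagonal block with plain loops. It is a minimal sketch under stated assumptions, not the article's code: the names `gemm`, `trmm_lower_blocked`, and the block-size parameter `nb` are hypothetical and merely stand in for the kind of machine-dependent parameter the abstract mentions.

```python
def gemm(A, B, C, r0, r1, k0, k1):
    """Update C[r0:r1, :] += A[r0:r1, k0:k1] * B[k0:k1, :] with a plain triple loop."""
    n = len(B[0])
    for i in range(r0, r1):
        for k in range(k0, k1):
            a = A[i][k]
            for j in range(n):
                C[i][j] += a * B[k][j]


def trmm_lower_blocked(L, B, nb):
    """Return L * B for lower-triangular L, processing nb rows at a time.

    nb is a hypothetical tuning parameter (an assumption for this sketch).
    """
    m, n = len(L), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for r0 in range(0, m, nb):
        r1 = min(r0 + nb, m)
        # Rectangular part to the left of the diagonal block: one GEMM update.
        if r0 > 0:
            gemm(L, B, C, r0, r1, 0, r0)
        # Small triangular diagonal block: simple loops over the lower part only.
        for i in range(r0, r1):
            for k in range(r0, i + 1):
                for j in range(n):
                    C[i][j] += L[i][k] * B[k][j]
    return C
```

Changing `nb` changes only how the work is partitioned, not the result, which is why a parameter of this kind can be tuned per machine without touching the calling code.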
REVIEW
Reviewer: Timothy R. Hopkins
The basic linear algebra subroutines (BLAS) consist of three libraries (known as Levels 1, 2, and 3) and form an integral part of much of the important numerical software developed over the last two decades. Efficient implementatio...