GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark
Source: ACM Transactions on Mathematical Software (TOMS)
Volume 24, Issue 3 (September 1998)
Pages: 268-302
Year of Publication: 1998
ISSN: 0098-3500
Authors
Bo Kågström  Umeå Univ., Umeå, Sweden
Per Ling  Umeå Univ., Umeå, Sweden
Charles van Loan  Cornell Univ., Ithaca, NY
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/292395.292412

ABSTRACT

The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures, the development of optimal level 3 BLAS code is costly and time-consuming. However, it is possible to develop a portable and high-performance level 3 BLAS library that relies mainly on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning, all the other level 3 BLAS can be defined in terms of GEMM and a small amount of level 1 and level 2 computations. Our contribution is twofold. First, the model implementations in Fortran 77 of the GEMM-based level 3 BLAS are structured to effectively reduce data traffic in a memory hierarchy. Second, the GEMM-based level 3 BLAS performance evaluation benchmark is a tool for evaluating and comparing different implementations of the level 3 BLAS with the GEMM-based model implementations.
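
The partitioning idea in the abstract can be illustrated with a small sketch. The Python code below is not the authors' Fortran 77 model implementation; it is a minimal, unoptimized illustration of how one level 3 operation, the symmetric multiply C := alpha*A*B + beta*C (SYMM, with only the lower triangle of A referenced), might be expressed through calls to a general multiply-and-add routine. The names `gemm`, `transpose`, `symm_lower`, and the block size parameter `nb` are assumptions made for this example. As a further simplification, each diagonal block is expanded into a full temporary and also passed to `gemm`, where an actual implementation may handle it with lower-level operations.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for dense matrices stored as lists of rows."""
    k = len(b)
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(len(c[0]))]
        for i in range(len(c))
    ]


def transpose(a):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def symm_lower(alpha, a, b, beta, c, nb=2):
    """Return alpha * A * B + beta * C, where A is symmetric and only its
    lower triangle (including the diagonal) is read.

    A is partitioned into nb-by-nb blocks (hypothetical block size). Each
    block below the diagonal is used twice: once as stored, and once
    transposed to stand in for the unreferenced block above the diagonal.
    """
    n = len(a)
    out = [[beta * x for x in row] for row in c]  # apply beta once up front
    for i0 in range(0, n, nb):
        i1 = min(i0 + nb, n)
        for j0 in range(0, i1, nb):
            j1 = min(j0 + nb, n)
            if j0 == i0:
                # Diagonal block: build a full symmetric temporary from the
                # lower triangle, then use the general multiply (simplification).
                blk = [[a[max(r, s)][min(r, s)] for s in range(i0, i1)]
                       for r in range(i0, i1)]
                out[i0:i1] = gemm(alpha, blk, b[i0:i1], 1, out[i0:i1])
            else:
                # Off-diagonal block A[i, j] with j < i.
                blk = [row[j0:j1] for row in a[i0:i1]]
                out[i0:i1] = gemm(alpha, blk, b[j0:j1], 1, out[i0:i1])
                out[j0:j1] = gemm(alpha, transpose(blk), b[i0:i1], 1, out[j0:j1])
    return out
```

In this sketch almost all arithmetic happens inside `gemm`, which is the point of the approach: if that one routine is highly tuned for a given machine, the routine built on top of it benefits without being rewritten. Entries above the diagonal of `a` are never read, so they may hold arbitrary values.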



REVIEW

Reviewer: Timothy R. Hopkins

The basic linear algebra subroutines (BLAS) consist of three libraries (known as Levels 1, 2, and 3) and form an integral part of much of the important numerical software developed over the last two decades. Efficient implementatio...
