ABSTRACT
The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures, the development of optimal level 3 BLAS code is costly and time consuming. However, it is possible to develop a portable and high-performance level 3 BLAS library mainly relying on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning, all the other level 3 BLAS can be defined in terms of GEMM and a small amount of level 1 and level 2 computations. Our contribution is twofold. First, the model implementations in Fortran 77 of the GEMM-based level 3 BLAS are structured to effectively reduce data traffic in a memory hierarchy. Second, the GEMM-based level 3 BLAS performance evaluation benchmark is a tool for evaluating and comparing different implementations of the level 3 BLAS with the GEMM-based model implementations.
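To make the partitioning idea concrete, here is a minimal sketch, not the authors' Fortran 77 model implementation, of how one level 3 operation can be built on GEMM. It uses the symmetric rank-k update (SYRK), C := alpha*A*A^T + beta*C on the lower triangle, as an illustrative case. The function names (`gemm`, `transpose`, `syrk_lower_gemm_based`) and the `block` parameter are hypothetical and chosen only for this example. Off-diagonal blocks are handled by GEMM calls, while the small diagonal blocks are handled by dot products, standing in for the level 1 and level 2 computations.

```python
def gemm(alpha, A, B, beta, C):
    """General matrix multiply and add: return alpha*A*B + beta*C as a new matrix."""
    m, k, n = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(n)]
            for i in range(m)]


def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def syrk_lower_gemm_based(alpha, A, beta, C, block=2):
    """Lower triangle of alpha*A*A^T + beta*C, computed block by block.

    Illustrative assumption: entries strictly above the diagonal of C are
    copied through unchanged, mimicking a routine that only references the
    lower triangle.
    """
    n, k = len(A), len(A[0])
    out = [row[:] for row in C]
    for i0 in range(0, n, block):
        i1 = min(i0 + block, n)
        Ai = A[i0:i1]
        # Blocks strictly below the diagonal: one GEMM call per block.
        for j0 in range(0, i0, block):
            j1 = j0 + block  # j1 <= i0 because j0 and i0 are multiples of block
            Cij = [row[j0:j1] for row in C[i0:i1]]
            blk = gemm(alpha, Ai, transpose(A[j0:j1]), beta, Cij)
            for r, row in enumerate(blk):
                out[i0 + r][j0:j1] = row
        # Diagonal block: update only its lower triangle with dot products.
        for i in range(i0, i1):
            for j in range(i0, i + 1):
                dot = sum(A[i][p] * A[j][p] for p in range(k))
                out[i][j] = alpha * dot + beta * C[i][j]
    return out
```

In this sketch, larger values of `block` shift more of the arithmetic into the GEMM calls, which is the part a platform-tuned GEMM would accelerate.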
REVIEW
Reviewer: Timothy R. Hopkins
The basic linear algebra subprograms (BLAS) consist of
three libraries (known as Levels 1, 2, and 3) and form an integral part
of much of the important numerical software developed over the last two
decades. Efficient implementatio