|
ABSTRACT
The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matrix multiplications and the solution of triangular systems with multiple right-hand sides. They are intended to provide efficient and portable building blocks for linear algebra algorithms on high-performance computers. We describe algorithms for the BLAS3 operations that are asymptotically faster than the conventional ones. These algorithms are based on Strassen's method for fast matrix multiplication, which is now recognized to be a practically useful technique once matrix dimensions exceed about 100. We pay particular attention to the numerical stability of these “fast BLAS3.” Error bounds are given and their significance is explained and illustrated with the aid of numerical experiments. Our conclusion is that the fast BLAS3, although not as strongly stable as conventional implementations, are stable enough to merit careful consideration in many applications.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
BINI, D., AND LOTTI, D. Stability of fast algorithms for matrix multiplication. Numer. Math. 36 (1980), 63-72.
|
| |
4
|
|
| |
5
|
|
| |
6
|
BRENT, R.P. Error analysis of algorithms for matrix multiplication and triangular decomposition using Winograd's identity. Numer. Math. 16 (1970), 145-156.
|
 |
7
|
|
| |
8
|
DAYD~, M0' J', AND DUFF, I.S. Use of level 3 BLAS in LU factorization on the Cray-2, the ETA-10P, and the IBM 3090-200/VF. Tech. Rep. CSS 229, Computer Science and Systems Div., Harwell Lab., 1988.
|
| |
9
|
DEMMEL, J. W., DONGARRA, J. J., Du CROZ, J. J., GREENBAUM, A., HAMMARLING, S. J., AND SORENSEN, D. C. Prospectus for the development of a linear algebra library for highperformance computers. Tech. Memor. 97. Mathematics and Computer Science Div., Argonne National Lab., Argonne, Ill., 1987.
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
GALLIVAN, K., JALBY, W., MEIER, U., AND SAMEH, A. I-I. Impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomput. Appl. 2 (1988), 12-48.
|
| |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
IBM. Engineering and Scientific Subroutine Library, Guide and Reference, Release 3. 4th ed., Program 5668-863, 1988.
|
| |
19
|
MILLER, W. Computational complexity and numerical stability. SIAM J. Comput. 4 (1975), 97-107.
|
| |
20
|
MOLER, C. B., LITTLE, J. N., AND BANGERT, S. Pro-Matlab User's Guide. The MathWorks, Inc., South Natick, Mass., 1987.
|
| |
21
|
|
| |
22
|
SCHREIBER, R.S. Block algorithms for parallel machines. In Numerical Algorithms for Modern Parallel Computer Architectures, M. H. Schultz, Ed., IMA Volumes In Mathematics and Its Applications 13, Springer-Verlag, Berlin, 1988, 197-207.
|
| |
23
|
|
| |
24
|
STRASSEN, V. Gaussian elimination is not optimal. Numer. Math. I3 (1969), 354-356.
|
CITED BY 14
|
|
|
|
|
|
|
|
|
|
|
Steven Huss-Lederman , Elaine M. Jacobson , Anna Tsao , Thomas Turnbull , Jeremy R. Johnson, Implementation of Strassen's algorithm for matrix multiplication, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.32-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
B. Ujfalussy , Xindong Wang , Xiaoguang Zhang , D. M. C. Nicholson , W. A. Shelton , G. M. Stocks , A. Canning , Yang Wang , B. L. Gyorffy, High performance first principles method for complex magnetic properties, Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), p.1-15, November 07-13, 1998, San Jose, CA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
REVIEW
"Peter Bruce Worland : Reviewer"
The BLAS3 (Level 3 Basic Linear Algebra Subprograms) is a set of
specifications, based on matrix-matrix operations, for efficient and
portable routines to be used on high-performance computers. This
interesting and clearly written paper explor
more...
|