|
ABSTRACT
This article describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subroutines): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions allows us to implement some algorithms that are simpler, more accurate, and sometimes faster than possible without these features. The new BLAS are challenging to implement and test because there are many more subroutines than in the existing Standard, and because we must be able to assess whether a higher precision is used for internal computations than is used for either input or output variables. We have therefore developed an automated process of generating and systematically testing these routines. Our methodology is applicable to languages besides C. In particular, our algorithms used in the testing code will be valuable to all other BLAS implementors. Our extra precision routines achieve excellent performance---close to half of the machine peak Megaflop rate even for the Level 2 BLAS, when the data access is stride one.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Basic Linear Algebra Subprograms (BLAS) Standard. 2000. http://www.netlib.org/utk/papers/blast-forum.html.
|
| |
2
|
IEEE. 2001. 754 Revision. http://grouper.ieee.org/groups/754/.
|
| |
3
|
E. Anderson , Z. Bai , C. Bischof , L. S. Blackford , J. Demmel , Jack J. Dongarra , J. Du Croz , S. Hammarling , A. Greenbaum , A. McKenney , D. Sorensen, LAPACK Users' guide (third ed.), Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999
|
| |
4
|
ANSI/IEEE Std. 754-1985. IEEE Standard for Binary Floating Point Arithmetic.
|
| |
5
|
ANSI/IEEE Std. 854-1987. IEEE Standard for Radix Independent Floating Point Arithmetic.
|
| |
6
|
ISO/IEC 9899:1999. Standard for the C programming language (C99). Jan 99 draft available at http://anubis.dkuug.dk/JTC1/SC22/WG14/www/docs/n869. Final version to be available from http://www.iso.ch.
|
| |
7
|
Bailey, D. 2000. A Fortran-90 double-double precision library. http://www.nersc.gov/∼dhbailey/mpdist/mpdist.html.
|
 |
8
|
|
| |
9
|
Bilmes, J., Asanović, K., Demmel, J., Lam, D., and Chin, C. 1996. The PHiPAC WWW home page. http://www.icsi.berkeley.edu/∼bilmes/phipac.
|
| |
10
|
Jack J. Dongarra , L. S. Blackford , J. Choi , A. Cleary , E. D'Azeuedo , J. Demmel , I. Dhillon , S. Hammarling , G. Henry , A. Petitet , K. Stanley , D. Walker , R. C. Whaley, ScaLAPACK user's guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997
|
| |
11
|
Bleher, J., Roeder, A., and Rump, S. 1985. ACRITH: High-Accuracy Arithmetic---An advanced tool for numerical computation. In Proceedings of the 7th Symposium on Computer Arithmetic. IEEE Computer Society Press, Urbana, Il.
|
 |
12
|
|
| |
13
|
Briggs, K. 1998. Doubledouble floating point arithmetic. http://www-epidem.plantsci.cam.ac.uk/∼kbriggs/doubledouble.html.
|
| |
14
|
Dekker, T. 1971. A floating-point technique for extending the available precision. Numer. Math. 18, 224--242.
|
| |
15
|
Demmel, J. 1984. Underflow and the reliability of numerical software. SIAM J. Sci. Stat. Comput. 5, 4 (Dec.), 887--919.
|
| |
16
|
|
| |
17
|
|
| |
18
|
Dhillon, I., Fann, G., and Parlett, B. 1997. Application of a new algorithm for the symmetric eigenproblem to computational quantum chemistry. In Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, Pa.
|
| |
19
|
|
| |
20
|
Dongarra, J., Bunch, J., Moler, C., and Stewart, G. W. 1979. LINPACK User's Guide. SIAM, Philadelphia, Pa.
|
 |
21
|
|
 |
22
|
|
| |
23
|
Eisenstat, S. C., Elman, H. C., and Schultz, M. H. 1983. Variational iterative methods for nonsymmetric systems of linear equations. SIAM J. Numer. Anal. 20, 345--357.
|
| |
24
|
Forsythe, G. E. and Moler, C. B. 1967. Computer Solution of Linear Algebraic Systems. Prentice-Hall, Englewood Cliffs, N.J.
|
| |
25
|
Gupta, A., Joshi, M., Karypis, G., and Kumar, V. 1999. PSPASES: A scalable parallel direct solver for sparse symmetric positive definite systems. http://www.cs.umn.edu/mjoshi/pspases.
|
| |
26
|
Henry, G. 1998. UNIX extended precision library for the pentium. http://www.cs.utk.edu/∼ghenry/distrib/archive.htm.
|
| |
27
|
|
| |
28
|
|
| |
29
|
Kahan, W. 1998. Matlab's loss is nobody's gain. http://www.cs.berkeley.edu/∼wkahan/MxMulEps.pdf.
|
| |
30
|
Kahan, W. and Darcy, J. D. 1998. How Java's floating-point hurts everyone everywhere. http://www.cs.berkeley.edu/∼wkahan/JAVAhurt.pdf.
|
| |
31
|
Kahan, W. and LeBlanc, E. 1985. Anomalies in the IBM ACRITH package. In Proceedings of the 7th Symposium on Computer Arithmetic. IEEE Computer Society Press, Urbana, Ill.
|
| |
32
|
|
| |
33
|
Kulisch, U. and Miranker, W., Eds. 1983. A New Approach to Scientific Computing. Academic Press, New York.
|
 |
34
|
|
| |
35
|
|
| |
36
|
Li, X. S. and Demmel, J. W. 2001. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems. Tech. rep., Lawrence Berkeley National Laboratory. December. in preparation.
|
| |
37
|
Li, X. S., Demmel, J. W., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., Kahan, W., Kapur, A., Martin, M. C., Tung, T., and Yoo, D. J. 2000. Design, implementation and testing of extended and mixed precision BLAS. Tech. Rep. LBNL-45991, Lawrence Berkeley National Laboratory. June. http://www.nersc.gov/∼xiaoye/XBLAS/.
|
| |
38
|
M4 macro processor. http://www.cs.utah.edu/csinfo/texinfo/m4/m4.html.
|
| |
39
|
Macsyma, Inc. 1996. Macsyma Mathematics and System Reference Manual, 16th ed, 589 pages.
|
| |
40
|
Møller, O. 1965. Quasi double precision in floating-point arithmetic. BIT 5, 37--50.
|
| |
41
|
M. B. Monagan , K. O. Geddes , K. M. Heal , G. Labahn , S. Vorkoetter, Maple V: programming guide, Springer-Verlag New York, Inc., Secaucus, NJ, 1996
|
| |
42
|
Oettli, W. and Prager, W. 1964. Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides. Numer. Math. 6, 405--409.
|
| |
43
|
Parlett, B. and Dhillon, I. 1997. Fernando's solution to Wilkinson's problem: An application of double factorization. Lin. Alg. Appl. 267, 247--279.
|
| |
44
|
Parlett, B. and Marques, O. 2000. An implementation of the DQDS algorithm (Positive case). Lin. Alg. Appl. 309, 217--259.
|
| |
45
|
Pichat, M. 1972. Correction d'une somme en arithmetique à virgule flottante. Numer. Math. 19, 400--406.
|
| |
46
|
Priest, D. 1991. Algorithms for arbitrary precision floating point arithmetic. In Proceedings of the 10th Symposium on Computer Arithmetic, P. Kornerup and D. Matula, Eds. IEEE Computer Society Press, Grenoble, France, 132--145.
|
| |
47
|
|
| |
48
|
Shewchuk, J. R. 1997. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Disc. Comput. Geom. 18, 305--363.
|
| |
49
|
Smith, B. T., Boyle, J. M., Dongarra, J. J., Garbow, B. S., Ikebe, Y., Klema, V. C., and Moler, C. B. 1976. Matrix Eigensystem Routines---EISPACK Guide. In Lecture Notes in Computer Science, vol. 6. Springer-Verlag, Berlin, Germany.
|
| |
50
|
|
| |
51
|
Whaley, R. C. and Dongarra, J. 1998. The ATLAS home page. http://www.netlib.org/atlas/.
|
| |
52
|
|
CITED BY 19
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Douglas Gregor , Jaakko Järvi , Mayuresh Kulkarni , Andrew Lumsdaine , David Musser , Sibylle Schupp, Generic programming and high-performance libraries, International Journal of Parallel Programming, v.33 n.2, p.145-164, June 2005
|
|
|
|
|
|
Victor Y. Pan , Dmitriy Ivolgin , Brian Murphy , Rhys Eric Rosholt , Islam Taj-Eddin , Yuqing Tang , Xiaodong Yan, Additive preconditioning and aggregation in matrix computations, Computers & Mathematics with Applications, v.55 n.8, p.1870-1886, April, 2008
|
|
|
|
|
|
James Demmel , Yozo Hida , William Kahan , Xiaoye S. Li , Sonil Mukherjee , E. Jason Riedy, Error bounds from extra-precise iterative refinement, ACM Transactions on Mathematical Software (TOMS), v.32 n.2, p.325-351, June 2006
|
|
|
|
|
|
|
|
|
V. Y. Pan , B. Murphy , R. E. Rosholt , M. Tabanjeh, The schur aggregation for solving linear systems of equations, Proceedings of the 2007 international workshop on Symbolic-numeric computation, July 25-27, 2007, London, Ontario, Canada
|
|
|
|
|
|
Dominik Goddeke , Robert Strzodka , Jamaludin Mohd-Yusof , Patrick McCormick , Hilmar Wobker , Christian Becker , Stefan Turek, Using GPUs to improve multigrid solver performance on a cluster, International Journal of Computational Science and Engineering, v.4 n.1, p.36-55, November 2008
|
|
|
V. Y. Pan , D. Grady , B. Murphy , G. Qian , R. E. Rosholt , A. D. Ruslanov, Schur aggregation for linear systems and determinants, Theoretical Computer Science, v.409 n.2, p.255-268, December, 2008
|
|
|
|
|
|
|
REVIEW
"Timothy R. Hopkins : Reviewer"
This paper describes a proposed extension to the current basic linear algebra subprograms (BLAS) that will provide efficiency gains from using mixed precision arguments (for example, multiplication of a REAL matrix and a COMPLEX matrix), and allow
more...
|