ABSTRACT
By combining 32-bit and 64-bit floating-point arithmetic, the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and to sparse iterative techniques such as Krylov subspace methods. The approach presented here applies not only to conventional processors but also to exotic technologies such as Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), and the Cell BE processor.
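One standard way to combine the two precisions is iterative refinement: do the expensive solve in 32-bit arithmetic, then compute residuals and accumulate corrections in 64-bit arithmetic. The abstract does not say which scheme is used, so the following is only an illustrative sketch under stated assumptions. It uses a tridiagonal system as a stand-in for a general sparse matrix, emulates 32-bit rounding with the standard `struct` module, and all function names (`f32`, `solve_tridiag_f32`, `residual_f64`, `refine`) are hypothetical.

```python
import struct


def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float (emulated)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_tridiag_f32(lower, diag, upper, rhs):
    """Thomas algorithm with every intermediate rounded to 32 bits.

    lower[i] / upper[i] are the sub-/super-diagonal entries of row i
    (lower[0] and upper[-1] are unused). Assumes no pivoting is needed,
    e.g. a diagonally dominant matrix.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = f32(f32(upper[0]) / f32(diag[0]))
    d[0] = f32(f32(rhs[0]) / f32(diag[0]))
    for i in range(1, n):
        li = f32(lower[i])
        denom = f32(f32(diag[i]) - f32(li * c[i - 1]))
        if i < n - 1:
            c[i] = f32(f32(upper[i]) / denom)
        d[i] = f32(f32(f32(rhs[i]) - f32(li * d[i - 1])) / denom)
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = f32(d[i] - f32(c[i] * x[i + 1]))
    return x


def residual_f64(lower, diag, upper, x, b):
    """Compute r = b - A x in full 64-bit arithmetic."""
    n = len(diag)
    r = []
    for i in range(n):
        s = diag[i] * x[i]
        if i > 0:
            s += lower[i] * x[i - 1]
        if i < n - 1:
            s += upper[i] * x[i + 1]
        r.append(b[i] - s)
    return r


def refine(lower, diag, upper, b, max_iter=20, tol=1e-14):
    """Mixed-precision iterative refinement: 32-bit solves, 64-bit updates."""
    x = solve_tridiag_f32(lower, diag, upper, b)
    b_norm = max(abs(v) for v in b)
    for _ in range(max_iter):
        r = residual_f64(lower, diag, upper, x, b)
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        dx = solve_tridiag_f32(lower, diag, upper, r)  # cheap 32-bit correction
        x = [xi + di for xi, di in zip(x, dx)]         # accumulated in 64 bits
    return x


# Small demonstration on a diagonally dominant system with a known solution.
n = 50
lower, diag, upper = [-1.0] * n, [4.0] * n, [-1.0] * n
x_true = [1.0 + 0.01 * i for i in range(n)]
b = [-r for r in residual_f64(lower, diag, upper, x_true, [0.0] * n)]  # b = A x_true

x_single = solve_tridiag_f32(lower, diag, upper, b)
x_mixed = refine(lower, diag, upper, b)
print("32-bit only error:", max(abs(u - v) for u, v in zip(x_single, x_true)))
print("refined error:    ", max(abs(u - v) for u, v in zip(x_mixed, x_true)))
```

The refinement loop converges quickly here because the test matrix is well conditioned; for ill-conditioned systems the 32-bit solve may not be accurate enough for the corrections to converge, and a 64-bit fallback would be needed.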