ABSTRACT
This article discusses the high-performance parallel implementation of the computation and updating of QR factorizations of dense matrices, including problems large enough to require out-of-core computation, where the matrix is stored on disk. The algorithms presented here scale both with problem size and with the number of processors. Their implementation using the Parallel Linear Algebra Package (PLAPACK) and the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) is discussed. The methods are shown to attain excellent performance, in some cases reaching roughly 80% of the “realizable” peak of the architectures on which the experiments were performed.