Vector and parallel algorithms for Cholesky factorization on IBM 3090
Source: Conference on High Performance Networking and Computing
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing
Reno, Nevada, United States
Pages: 225 - 233  
Year of Publication: 1989
ISBN:0-89791-341-8
Authors
R. C. Agarwal  I.B.M. Research Division, Thomas J. Watson Research Center, Yorktown Hts., New York
F. G. Gustavson  I.B.M. Research Division, Thomas J. Watson Research Center, Yorktown Hts., New York
Sponsors
Argonne Natl Lab: Argonne National Laboratory
IEEE-CS: IEEE Computer Society
NASA: National Aeronautics and Space Administration
SIGARCH: ACM Special Interest Group on Computer Architecture
Los Alamos National Labs: Los Alamos National Laboratory
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/76263.76287

ABSTRACT

In many engineering applications, one must solve $Fx = b$, where $F$ is a symmetric positive definite matrix. This is usually done by Cholesky factorization, $F = RR^{T}$, where $R$ is the lower triangular Cholesky factor. The factorization is compute-intensive, and achieving the best possible performance on the IBM 3090 Vector Facility (VF) requires blocking the problem at several levels to match the 3090 memory hierarchy. A large problem that does not fit in a given level of memory is divided into blocks that each fit in that level, which minimizes data transfers between levels of the hierarchy. This paper describes various blocking schemes for vector and parallel implementations on the 3090 VF. Some of these algorithms have been included in the Engineering and Scientific Subroutine Library (ESSL). Performance results are also included; the algorithms achieve close to peak performance on both the 3090 uniprocessor and multiprocessors.
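
To make the blocking idea concrete, here is a minimal sketch in pure Python, assuming a right-looking blocked variant of Cholesky: factor a diagonal block, solve for the panel below it, then apply a rank-`nb` update to the trailing submatrix. This is one common blocked formulation, not necessarily the scheme the authors implemented on the 3090 or in ESSL. The function names `blocked_cholesky` and `cholesky_solve` and the block size `nb` are hypothetical; on real hardware `nb` would be chosen so a block fits in a particular memory level, whereas plain Python lists only illustrate the data flow, not the performance.

```python
import math
from typing import List

Matrix = List[List[float]]


def blocked_cholesky(F: Matrix, nb: int = 2) -> Matrix:
    """Return lower-triangular R with F = R R^T (right-looking blocked sketch).

    nb is a hypothetical block size; only the lower triangle of F is read.
    """
    if nb < 1:
        raise ValueError("block size must be positive")
    n = len(F)
    A = [row[:] for row in F]  # work on a copy; lower triangle becomes R
    for k in range(0, n, nb):
        kend = min(k + nb, n)
        # 1. Unblocked Cholesky of the diagonal block A[k:kend, k:kend].
        for j in range(k, kend):
            s = A[j][j] - sum(A[j][p] ** 2 for p in range(k, j))
            if s <= 0.0:
                raise ValueError("matrix is not positive definite")
            A[j][j] = math.sqrt(s)
            for i in range(j + 1, kend):
                A[i][j] = (A[i][j] - sum(A[i][p] * A[j][p] for p in range(k, j))) / A[j][j]
        # 2. Panel solve: L21 = A21 * L11^{-T}, row by row (forward substitution).
        for i in range(kend, n):
            for j in range(k, kend):
                A[i][j] = (A[i][j] - sum(A[i][p] * A[j][p] for p in range(k, j))) / A[j][j]
        # 3. Trailing update A22 -= L21 * L21^T (lower triangle only).
        for i in range(kend, n):
            for j in range(kend, i + 1):
                A[i][j] -= sum(A[i][p] * A[j][p] for p in range(k, kend))
    # Zero the strict upper triangle so the result is exactly R.
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = 0.0
    return A


def cholesky_solve(R: Matrix, b: List[float]) -> List[float]:
    """Solve F x = b given F = R R^T: first R y = b, then R^T x = y."""
    n = len(R)
    y = [0.0] * n
    for i in range(n):  # forward substitution with lower-triangular R
        y[i] = (b[i] - sum(R[i][p] * y[p] for p in range(i))) / R[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution with R^T, where R^T[i][p] = R[p][i]
        x[i] = (y[i] - sum(R[p][i] * x[p] for p in range(i + 1, n))) / R[i][i]
    return x


if __name__ == "__main__":
    F = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    R = blocked_cholesky(F, nb=2)
    print(R)  # [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
    print(cholesky_solve(R, [8.0, 10.0, 11.0]))  # [1.0, 1.0, 1.0]
```

Changing `nb` changes only the order in which the arithmetic is grouped, not the resulting factor (up to rounding). The three steps per block are where a tuned implementation would concentrate work on data already resident in a fast memory level.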

