ACM Home Page
Please provide us with feedback. Feedback
Communication avoiding Gaussian elimination
Full text PdfPdf (597 KB)
Source Conference on High Performance Networking and Computing archive
Proceedings of the 2008 ACM/IEEE conference on Supercomputing - Volume 00 table of contents
Austin, Texas
SECTION: Papers table of contents
Article No. 29  
Year of Publication: 2008
ISBN:978-1-4244-2835-9
Authors
Laura Grigori  Universite Paris-Sud, Orsay France
James W. Demmel  UC Berkeley, CA
Hua Xiang  Universite Paris-Sud, Orsay France
Publisher
IEEE Press  Piscataway, NJ, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 126,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We present CALU, a Communication Avoiding algorithm for the LU factorization of dense matrices distributed in a two-dimensional cyclic layout. The algorithm is based on a new pivoting strategy, which is stable in practice. The new algorithm is optimal (up to polylogarithmic factors) in the amount of communication it performs.

Our experiments show that CALU leads to a reduction in the parallel time, in particular when the latency time is an important factor of the overall time. The factorization of a block-column, a subroutime of CALU, outperforms the corresponding routine PDGETF2 from ScaLAPACK up to a factor of 4.37 on an IBM POWER 5 system and up to a factor of 5.58 on a Cray XT4 system. On square matrices of order 104, CALU outperforms the corresponding routine PDGETRF from ScaLAPACK by a factor of 1.24 on IBM POWER 5 and by a factor of 1.31 on Cray XT4.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Top 500 supercomputer sites. available at www.top500.org.
 
2
M. Baboulin, J. Dongarra, and S. Tomov. Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures. Technical Report UT-CS-08-615, University of Tennessee, 2008. LAPACK Working Note 200.
 
3
A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. A class of parallel tiled linear algebra algorithms for multicore architectures. Technical Report UT-CS-07-600, University of Tennessee, 2007. LAPACK Working Note 191.
 
4
 
5
J. Demmel, L. Grigori, M. Hoemmen, and J. Langou. Communication-optimal parallel and sequential QR and LU factorizations. Technical Report UCB/EECS-2008-89, UC Berkeley, 2008. LAPACK Working Note 204.
 
6
J. J. Dongarra, P. Luszczek, and A. Petitet. The LIN-PACK Benchmark: Past, Present and Future. Concurrency: Practice and Experience, 15:803--820, 2003.
 
7
 
8
S. L. Graham, M. Snir, and C. A. Patterson, editors. Getting Up To Speed: The Future Of Supercomputing. National Academies Press, Washington, D.C., USA, 2005.
 
9
 
10
G. Quintana-Orti, E. S. Quiutana-Orti, E. Chan, F. G. Van Zee, and R. van de Geijn. Programming algorithms-by-blocks for matrix computations on multithreaded architectures. Technical Report TR-08-04, University of Texas at Austin, 2008. FLAME Working Note 29.
 
11
D. Skinner. IBM SP Parallel Scaling Overview. http://www.nersc.gov/news/reports/technical/seaborg_scaling.
 
12
A. Tate. Personal communication.
 
13
 
14
 
15


Collaborative Colleagues:
Laura Grigori: colleagues
James W. Demmel: colleagues
Hua Xiang: colleagues