ACM Home Page
Please provide us with feedback. Feedback
Cache efficient bidiagonalization using BLAS 2.5 operators
Full text PdfPdf (228 KB)
Source
ACM Transactions on Mathematical Software (TOMS) archive
Volume 34 ,  Issue 3  (May 2008) table of contents
Article No. 14  
Year of Publication: 2008
ISSN:0098-3500
Authors
Gary W. Howell  North Carolina State University, Raleigh, NC
James W. Demmel  University of California, Berkeley, CA
Charles T. Fulton  Florida Institute of Technology, Melbourne, FL
Sven Hammarling  University of Manchester, UK
Karen Marmol  Harris Corporation, Melbourne, FL
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 70,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1356052.1356055
What is a DOI?

ABSTRACT

On cache based computer architectures using current standard algorithms, Householder bidiagonalization requires a significant portion of the execution time for computing matrix singular values and vectors. In this paper we reorganize the sequence of operations for Householder bidiagonalization of a general m × n matrix, so that two (_GEMV) vector-matrix multiplications can be done with one pass of the unreduced trailing part of the matrix through cache. Two new BLAS operations approximately cut in half the transfer of data from main memory to cache, reducing execution times by up to 25 per cent. We give detailed algorithm descriptions and compare timings with the current LAPACK bidiagonalization algorithm.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Barlow, J. L., Bosner, N., and Drmač, Z. 2000. A new stable bidiagonal reduction algorithm. Lin. Alg. Appl. 397, 35--84.
 
3
Berry, M. 1992. Large scale singular value computations. Internat. J. Supercomput. Appl. 6, 13--49.
 
4
 
5
 
6
 
7
Blackford, S. and Dongarra, J. 1999. Installation guide for LAPACK, LAPACK Working Note 41.
 
8
Blackford, L. S., Corliss, G., Demmel, J., Dongarra, J., Duff, I., Hammarling, S., Henry, G., Heroux, M., Hu, C., Kahan, W., Kaufmann, L., Kearfott, B., Frogh, F., Li, X., Maany, Z., Petitet, A., Pozo, R., Remington, K., Walster, W., Whaley, C., Wolff, V., Gudenberg, J., and Lumsdaine, A. 2002. Basic linear algebra subprograms technical (BLAST) forum standard, Int. J. High Perform. Comput. 12, 1--2 (www.netlib.org/blas/blast-forum).
9
 
10
Bosner, N. and Barlow, J. L. 2005. Block and parallel versions of one-sided bidiagonalization. preprint http://www.cse.psu.edu/barlow/block_bidiag.pdf.
 
11
Choi, J., Dongarra, J., and Walker, D. 1995. The design of a parallel dense linear algebra software library: reduction to Hessenberg, tridiagonal, and bidiagonal form. (LAPACK Working Note # 92) Num. Alg., 10, 379--399.
 
12
 
13
Dongarra, J., Hammarling, S., and Sorensen, D. 1989. Block reduction of matrices to condensed forms for eigenvalue computations. J. Comput. Appl. Math. 27, 215--227.
 
14
 
15
Douglas, C. C., Haase, G., Hu, J., Kowarschik, M., Rüde, U., and Weiss, C. 2000. Portable memory heirarchy techniques for PDE solvers: Part I. SIAM News, 33, 5.
 
16
Fernando, V., Parlett, B., and Dhillon, I. 1995. A way to find the most redundant equation in a tridiagonal system. Mathematics Department, University of California, Berkeley.
 
17
 
18
Golub, G. and Kahan, W. 1965. Calculating the singular Values and pseudo-inverse of a matrix, SIAM J. Num. Anal., 2, 205--224.
 
19
Golub, G. and Reinsch, C. 1970. Singular value decomposition and least squares solution. Numer. Math. 14, 403--420.
 
20
 
21
Grösser, B. and Lang, B. 1988. Efficient parallel reduction to bidiagonal form, Preprint BUGHW-SC 98/2. http://www.math.uni-wuppertal/org/SciComp/Preprint/SC9802ips.gz.
 
22
Howell, G. W. 2001. Sparse Householder bidiagonalization. CERFACS Sparse Days. http://ncsu. edu/itd/hpc/Documents/Publications/gary_howell/cerfacs01.ps
 
23
 
24
Owens, B. 2003. A Matlab script for 2-blocking to speed Ralha-Barlow one-sided bidiagonalization. Summer intern project at ERDC, MSRC, Vicksburg, MS. http://ncsu.edu/itd/hpc/Documents/Publications/gary_howell/barlow3.m.
25
 
26
Parlett, B. and Dhillon, I. 1997. Fernando's solution to Wilkinson's problem: An application of double factorization. Lin. Alg. Appl. 267, 247--279.
 
27
Ralha, R. M. S. 2003. One-sided reduction to bidiagonal form. Lin. Alg. Appl. 358, 219--238.
 
28
 
29
 
30

Collaborative Colleagues:
Gary W. Howell: colleagues
James W. Demmel: colleagues
Charles T. Fulton: colleagues
Sven Hammarling: colleagues
Karen Marmol: colleagues