Efficient Householder QR factorization for superscalar processors
Source
ACM Transactions on Mathematical Software (TOMS)
Volume 23, Issue 3 (September 1997)
Pages: 362-378
Year of Publication: 1997
ISSN: 0098-3500
ABSTRACT
To extract the potential promised by superscalar processors, algorithm designers must streamline memory references and allow for efficient data reuse throughout the memory hierarchy. Two parameterized Householder QR factorization algorithms are presented that take into account the caches and registers typical of such processors. Guidelines are developed for choosing parameter values that obtain near-optimal cache and register utilization. The new algorithms are implemented and performance-tuned on an Intel Pentium Pro system, a single thin POWER2 node of the IBM Scalable Parallel system 2 (SP2), and a single R8000 processor of a Silicon Graphics POWER Challenge XL.