| Efficient Householder QR factorization for superscalar processors |
| Source
|
ACM Transactions on Mathematical Software (TOMS)
Volume 23, Issue 3 (September 1997)
Pages: 362-378
Year of Publication: 1997
ISSN: 0098-3500
ABSTRACT
To extract the potential promised by superscalar processors, algorithm designers must streamline memory references and allow for efficient data reuse throughout the memory hierarchy. Two parameterized Householder QR factorization algorithms are presented that take into account the caches and registers typical of such processors. Guidelines are developed for choosing parameter values that obtain near-optimal cache and register utilization. The new algorithms are implemented and performance-tuned on an Intel Pentium Pro system, a single thin POWER2 node of the IBM Scalable Parallel system 2 (SP2), and a single R8000 processor of a Silicon Graphics POWER Challenge XL.
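For orientation, the following is a minimal sketch of the textbook, unblocked Householder QR factorization, which is the baseline computation that cache- and register-aware variants reorganize. It is not either of the two parameterized algorithms from the article, and it makes no attempt at data reuse. The function name `householder_qr` and the list-of-rows matrix layout are assumptions chosen for illustration.

```python
import math


def householder_qr(a):
    """Factor a (m rows, n columns, m >= n) as a = q * r.

    Illustrative unblocked version: q is m x m orthogonal and
    r is m x n upper triangular. Not tuned for any memory hierarchy.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for k in range(min(m - 1, n)):
        # Column k from the diagonal down; the reflector zeroes all but x[0].
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already zero below the diagonal

        # v = x + sign(x0) * ||x|| * e1, the sign chosen to avoid cancellation.
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(t * t for t in v)

        # r[k:, j] -= (2 / v'v) * v * (v' r[k:, j]) for each remaining column.
        for j in range(k, n):
            s = 2.0 * sum(v[i] * r[k + i][j] for i in range(len(v))) / vtv
            for i in range(len(v)):
                r[k + i][j] -= s * v[i]

        # Accumulate q = q * H_k, which touches only columns k..m-1 of q.
        for i in range(m):
            s = 2.0 * sum(q[i][k + l] * v[l] for l in range(len(v))) / vtv
            for l in range(len(v)):
                q[i][k + l] -= s * v[l]

    return q, r
```

Each step applies one reflector to the whole trailing matrix, so every column is read and written once per step. This column-at-a-time access pattern is the kind of memory traffic that blocking for caches and registers aims to reduce.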