Efficient Householder QR factorization for superscalar processors
Source: ACM Transactions on Mathematical Software (TOMS)
Volume 23, Issue 3 (September 1997)
Pages: 362-378
Year of Publication: 1997
ISSN:0098-3500
Authors
James J. Carrig, Jr.  Johns Hopkins Univ., Baltimore, MD
Gerard G. L. Meyer  Johns Hopkins Univ., Baltimore, MD
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/275323.275326

ABSTRACT

To extract the potential promised by superscalar processors, algorithm designers must streamline memory references and allow for efficient data reuse throughout the memory hierarchy. Two parameterized Householder QR factorization algorithms are presented that take into account the caches and registers typical of such processors. Guidelines are developed for choosing parameter values that obtain near-optimal cache and register utilization. The new algorithms are implemented and performance-tuned on an Intel Pentium Pro system, a single thin POWER2 node of the IBM Scalable Parallel System 2 (SP2), and a single R8000 processor of a Silicon Graphics POWER Challenge XL.
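For readers unfamiliar with the underlying method, the following is a minimal sketch of the textbook (unblocked) Householder QR factorization in plain Python. It is not either of the two parameterized algorithms the abstract describes, and it makes no attempt at cache or register tuning; the function name `householder_qr` and the list-of-rows matrix layout are assumptions made for illustration only.

```python
import math


def householder_qr(a):
    """Return (q, r) with a = q * r, using textbook Householder reflections.

    `a` is a list of m rows, each a list of n floats, with m >= n.
    Illustrative only: no blocking, no cache or register tuning.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]  # working copy that becomes R
    q = [[float(i == j) for j in range(m)] for i in range(m)]  # starts as I

    for k in range(n):
        # Column k, rows k..m-1: the part to be reduced to (alpha, 0, ..., 0).
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column

        # Pick the sign of alpha that avoids cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)

        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1 of r.
        for j in range(k, n):
            s = 2.0 * sum(v[i] * r[k + i][j] for i in range(len(v))) / vtv
            for i in range(len(v)):
                r[k + i][j] -= s * v[i]

        # Accumulate Q = H_0 H_1 ... by applying H on the right of q.
        for row in q:
            s = 2.0 * sum(row[k + i] * v[i] for i in range(len(v))) / vtv
            for i in range(len(v)):
                row[k + i] -= s * v[i]

    return q, r
```

The update of the trailing columns inside the `k` loop is the memory-intensive step; in this sketch it sweeps the whole trailing matrix once per reflector, which is the kind of access pattern that data-reuse-oriented variants aim to improve.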


