|
ABSTRACT
An elementary, machine-independent, recursive algorithm for matrix multiplication C+=A*B provides implicit blocking at every level of the memory hierarchy and tests out faster than classically optimrd code, tracking hand-coded BLAS3 routines. Proof of concept is demonstrated by racing the in-place algorithm against manufacturer's hand-tuned BLAS3 routines; it can win.The recursive code bifurcates naturally at the top level into independent block-oriented processes, that each writes to a disjoint and contiguous region of memory. Experience has shown that the indexing vastly improves the patterns of memory access at all levels of the memory hierarchy, independently of the sizes of caches or pages and without ad hoc programming. It also exposed a weakness in SGI's C compilers that merrily unroll loops for the super-scalar R8000 processor, but do not analogously unfold the base cases of the most elementary recursions. Such deficiencies might deter future programmers from using this rich class of recursive algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
| |
3
|
F.W. Burton & J. G. Kollias. Comment on 'The explicit quad tree as a structure for computer graphics.' Comput. J. 26, 2 (May 1983), 188.
|
 |
4
|
|
 |
5
|
|
 |
6
|
|
| |
7
|
J. Cohen & M. Roth. On the implementation of Strassen's fast multiplication algorithm. Acta Informat. 6, 4 (August 1976), 341-355.
|
 |
8
|
David E. Culler , Richard M. Karp , David Patterson , Abhijit Sahay , Eunice E. Santos , Klaus Erik Schauser , Ramesh Subramonian , Thorsten von Eicken, LogP: a practical model of parallel computation, Communications of the ACM, v.39 n.11, p.78-85, Nov. 1996
[doi> 10.1145/240455.240477]
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
 |
12
|
|
| |
13
|
J. Frens & D. S. Wise. Matrix inversion using quadtrees implemented in GOFER. Technical Report 433, Computer Science Dept., Indiana University (May 1995).
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
J. Spiess. Untersuchungen des Zeitgewinns durch neue Algorithmen zur Matrix-Multiplikation. Computing 17, 1 (1976), 23-36.
|
| |
21
|
V. Strassen. Gaussian elimination is not optimal. Numer. Math. 13 (1969), 354-356.
|
 |
22
|
|
 |
23
|
|
| |
24
|
D. S. Wise. Undulant block elimination and integerpreserving matrix inversion. Sci. Comput. Programming (to appear). Technical Report 418, Computer Science Department, Indiana University (revised, August 1995).
|
CITED BY 31
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Siddhartha Chatterjee , Alvin R. Lebeck , Praveen K. Patnala , Mithuna Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.222-231, June 27-30, 1999, Saint Malo, France
|
|
|
Siddhartha Chatterjee , Vibhor V. Jain , Alvin R. Lebeck , Shyam Mundhra , Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
H.-W. Loidl , F. Rubio , N. Scaife , K. Hammond , S. Horiguchi , U. Klusik , R. Loogen , G. J. Michaelson , R. Peña , S. Priebe , Á J. Rebón , P. W. Trinder, Comparing Parallel Functional Languages: Programming and Performance, Higher-Order and Symbolic Computation, v.16 n.3, p.203-251, September 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
INDEX TERMS
Primary Classification:
F.
Theory of Computation
F.2
ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
F.2.1
Numerical Algorithms and Problems
Subjects:
Computations on matrices
Additional Classification:
D.
Software
D.4
OPERATING SYSTEMS
E.
Data
E.1
DATA STRUCTURES
Subjects:
Arrays
G.
Mathematics of Computing
G.1
NUMERICAL ANALYSIS
G.1.3
Numerical Linear Algebra
Subjects:
Linear systems (direct and iterative methods)
G.4
MATHEMATICAL SOFTWARE
Subjects:
Algorithm design and analysis
General Terms:
Algorithms,
Design,
Experimentation,
Measurement,
Performance,
Theory
Keywords:
cache misses,
indexing,
paging,
quadtrees,
storage management,
swapping
|