|
ABSTRACT
Positive results from new object-oriented tools for scientific programming are reported. Using template classes, abstractions of matrix representations are available that subsume conventional row-major, column-major, either Z- or И-Morton-order, as well as block-wise combinations of these. Moreover, the design of the Matrix Template Library (MTL) has been independently extended to provide recursators, to support block-recursive algorithms, supplementing MTL's iterators. Data types modeling both concepts enable the programmer to implement both iterative and recursive algorithms (or even both) on all of the aforementioned matrix representations at once for a wide family of important scientific operations. We illustrate the unrestricted applicability of our matrix-recursator on matrix multiplication. The same generic block-recursive function, unaltered, is instantiated on different triplets of matrix types. Within a base block, either a library multiplication or a user-provided, platform-specific code provides excellent performance. We achieve 77% of peak-performance using hand-tuned base cases without explicit prefetching. This excellent performance becomes available over a wide family of matrix representations from a single program. The techniques generalize to other applications in linear algebra.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
| |
3
|
Advanced Micro Devices, Inc., Sunnyvale, CA. AMD Core Math Library (ACML), 2006. http://developer.amd.com/acml.jsp
|
| |
4
|
P. Bientinesi, B. Gunter, and R. A. van de Geijn. Families of algorithms related to the inversion of a symmetric positive definite matrix. ACM Trans. Math. Softw., 2007. Accepted upon revisions. http://www.cs.utexas.edu/users/pauldj/pubs/TOMS_SPD.pdf
|
 |
5
|
Siddhartha Chatterjee , Alvin R. Lebeck , Praveen K. Patnala , Mithuna Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.222-231, June 27-30, 1999, Saint Malo, France
[doi> 10.1145/305619.305645]
|
| |
6
|
|
| |
7
|
J. J. Dongarra, H. W. Meuer, E. Strohmaier, and H. Simon. Top 500 supercomputer sites. http://www.top500.org, 2006.
|
| |
8
|
J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart. LINPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, 1979.
|
 |
9
|
|
| |
10
|
|
| |
11
|
K. Goto and R. A. van de Geijn. Anatomy of high-performance matrix multiplication. Technical report, Univ. of Texas, Austin. Submittted for publication. Visited Sept. 2006. http://www.cs.utexas.edu/users/flame/pubs/GOTO_TOMS.pdf
|
| |
12
|
P. Gottschling. Fundamental algebraic concepts in concept-enabled C++. Technical Report 638, Indiana University, Oct. 2006. http://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR638
|
 |
13
|
|
| |
14
|
A. Lumsdaine, J. Siek, L.-Q. Lee, and P. Gottschling. The Matrix Template Library home page. http://ww.osl.iu.edu/research/mtl, 2006.
|
| |
15
|
G. M. Morton. A computer oriented geodetic data base and a new technique in file sequencing. Technical report, IBM Ltd., Ottawa, Ontario, Mar. 1966.
|
| |
16
|
|
| |
17
|
|
| |
18
|
R. Raman and D. S. Wise. Converting to and from dilated integers. Submitted for publication, Dec. 2006. http://www.cs.indiana.edu/~dswise/Arcee/castingDilated-comb.pdf
|
| |
19
|
|
| |
20
|
|
| |
21
|
J. Spieß. Untersuchungen des Zeitgewinns durch neue Algorithmen zur Matrix-Multiplication. Computing, 17:23--36, 1976.
|
| |
22
|
A. Stepanov. The Standard Template Library --- how do you build an algorithm that is both generic and efficient? Byte Magazine, 20(10), Oct. 1995.
|
| |
23
|
|
| |
24
|
R. C. Whaley, A. Petitet, and J. J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1--2):3--35, Jan. 2001. http://dx.doi.org/10.1016/S0167-8191(00)00087-9
|
 |
25
|
|
|