|
ABSTRACT
The Opie Project aims to develop a compiler to transform C codes written for row-major matrix representation into equivalent codes for Morton-order matrix representation, and to apply its techniques to other languages. Accepting a possible reduction in performance we seek to compile a library of usable code to support future development of new algorithms better suited to Morton-ordered matrices.This paper reports the formalism behind the OPIE compiler for C, its status: now compiling several standard Level-2 and Level-3 linear algebra operations, and a demonstration of a breakthrough reflected in a huge reduction of L1, L2, TLB misses. Overall performance improves on the Intel Xeon architecture.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Adamczyk, S., Spicer, J., and Vandevoorde, D. Edison Design Group: Compiler front ends for the OEM market, 2002. Visited Nov. 2002. http://www.edg.com/
|
| |
2
|
|
| |
3
|
Allen, F. E., Cocke, J., and Kennedy, K. Reduction of operator strength. In Program Flow Analysis: Theory and Applications, S. W. Muchnick and N. D. Jones, Eds. Prentice-Hall, Englewood Cliffs, NJ, 1981, ch. 3.2, pp. 79--101.
|
 |
4
|
|
 |
5
|
Jeff Bilmes , Krste Asanovic , Chee-Whye Chin , Jim Demmel, Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Proceedings of the 11th international conference on Supercomputing, p.340-347, July 07-11, 1997, Vienna, Austria
[doi> 10.1145/263580.263662]
|
 |
6
|
Siddhartha Chatterjee , Vibhor V. Jain , Alvin R. Lebeck , Shyam Mundhra , Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
[doi> 10.1145/305138.305231]
|
| |
7
|
|
| |
8
|
Cragon, H. G. A historical note on binary tree. SIGARCH Comput. Archit. News 18, 4 (Dec. 1990), 3.
|
 |
9
|
David E. Culler , Richard M. Karp , David Patterson , Abhijit Sahay , Eunice E. Santos , Klaus Erik Schauser , Ramesh Subramonian , Thorsten von Eicken, LogP: a practical model of parallel computation, Communications of the ACM, v.39 n.11, p.78-85, Nov. 1996
[doi> 10.1145/240455.240477]
|
| |
10
|
Elmroth, E., and Gustavson, F. Applying recursion to serial and parallel QR factorization leads to better performance. IBM J. Res. Develop. 44, 4 (July 2000), 605--624. http://www.research.ibm.com/journal/rd/444/elmroth.html
|
| |
11
|
Elmroth, E., Gustavson, F., Jonsson, I., and Kgström, B. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Rev. 46, 1 (Mar. 2004), 3--45. http://epubs.siam.org/sam-bin/dbq/article/42869
|
 |
12
|
|
 |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
Goto, K., and Van De Geijn, R. On reducing TLB misses in matrix multiplication. FLAME Working Note 9, Univ. of Texas, Austin, Nov. 2002. http://www.cs.utexas.edu/users/flame/pubs/GOTO.ps.gz
|
| |
17
|
INTEL CORP. Intel Math Kernel Library. Santa Clara, CA, 2003. http://www.intel.com/software/products/mk1/
|
| |
18
|
Morton, G. M. A computer oriented geodetic data base and a new technique in file sequencing. Tech. rep., IBM Ltd., Ottawa, Ontario, Mar. 1966.
|
 |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
Tocher, K. D. The application of automatic computers to sampling experiments. J. Roy. Statist. Soc. Ser. B 16, 1 (1954), 39--61. See pp. 53--55.
|
| |
23
|
|
| |
24
|
|
| |
25
|
Wise, D. S., Citro, C., and Rainey, M. Parallel programming with Morton-ordered matrices. In preparation, 2003.
|
 |
26
|
|
|