| Fast greedy weighted fusion |
| Full text |
Pdf
(832 KB)
|
| Source
|
International Conference on Supercomputing
archive
Proceedings of the 14th international conference on Supercomputing
table of contents
Santa Fe, New Mexico, United States
Pages: 131 - 140
Year of Publication: 2000
ISBN:1-58113-270-0
|
|
Author
|
|
Ken Kennedy
|
Center for High Performance Software, Rice University
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 1, Downloads (12 Months): 21, Citation Count: 15
|
|
|
ABSTRACT
Loop fusion is important to optimizing compilers because it is an important tool in managing the memory hierarchy. By fusing loops that use the same data elements, we can reduce the distance between accesses to the same datum and avoid costly cache misses. Unfortunately the problem of optimal loop fusion for reuse has been shown to be NP-hard, so compilers must resort to heuristics to avoid unreasonably long compile times. Greedy strategies are often excellent heuristics that produce high-quality solutions quickly. We present an algorithm for greedy weighted fusion, in which the heaviest edge (the one with the most reuse) is selected for possible fusion on each step. The algorithm is shown to be fast in the sense that it takes O(V(E+V)) time, which is arguably optimal for this problem.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
| |
5
|
R. Allen and K. Kennedy. Advanced Compilation for High Performance Computers. Morgan Kauffman, to be published October 2000.
|
| |
6
|
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
K. Kennedy and K. McKinley. Typed fusion with applications to parallel and sequential code generation. Technical Report CRPC-TR94646, Center for Research on Parallel Computation, Rice University, 1994.
|
 |
16
|
|
CITED BY 15
|
|
|
|
|
|
Daniel Cociorva , Gerald Baumgartner , Chi-Chung Lam , P. Sadayappan , J. Ramanujam , Marcel Nooijen , David E. Bernholdt , Robert Harrison, Space-time trade-off optimization for a class of electronic structure calculations, ACM SIGPLAN Notices, v.37 n.5, May 2002
|
|
|
|
Qing Yi , Ken Kennedy , Haihang You , Keith Seymour , Jack Dongarra, Automatic blocking of QR and LU factorizations for locality, Proceedings of the 2004 workshop on Memory system performance, June 08-08, 2004, Washington, D.C.
|
|
|
|
|
|
|
|
|
|
|
|
|
D. Cociorva , J. W. Wilkins , C. Lam , G. Baumgartner , J. Ramanujam , P. Sadayappan, Loop optimization for a class of memory-constrained computations, Proceedings of the 15th international conference on Supercomputing, p.103-113, June 2001, Sorrento, Italy
|
|
|
|
|
|
|
|
Youcef Bouchebaba , Bruno Girodias , Gabriela Nicolescu , El Mostapha Aboulhamid , Bruno Lavigueur , Pierre Paulin, MPSoC memory optimization using program transformation, ACM Transactions on Design Automation of Electronic Systems (TODAES), v.12 n.4, p.43-es, September 2007
|
|
|
|
|
|
|