|
ABSTRACT
Supercompilers must reschedule computations defined by nested DO-loops in order to make an efficient use of supercomputer features (vector units, multiple elementary processors, cache memory, etc…). Many rescheduling techniques like loop interchange, loop strip-mining or rectangular partitioning have been described to speedup program execution. We present here a class of partitionings that encompasses previous techniques and provides enough flexibility to adapt code to multiprocessors with two levels of parallelism and two levels of memory.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
J.R. Allen and K. Kennedy, "PFC:: a Program to C(mvert Fortran to a Parallel Form," in Supercomputers, Design and Application, ed. K. Hwang (1982). COMPSAC, Tutorial
|
 |
2
|
|
| |
3
|
C. Ancourt, "Utilisation de syst~mes lin~aires sur Z pour la paralI~lisation de programmes," ENSMP-CAI-87-E87, Ecole des Mines de Paris, Fontainebleau (France) (1987).
|
 |
4
|
|
| |
5
|
|
| |
6
|
J. Davies, C. Huson, T. Macke, M. Wolfe, and B. Leasure, "The KAP/S-I: An Advznced Sourceto-Source Vectorizer-for the S-1 Mark IIa Supercomputer," lnt'l Conference on Parallel Processing, pp.833-835 (Aug. 1986).
|
| |
7
|
V. Dornic, "M~ithodes de partitionnement des r6seaux systoliques appliqudes ~ la triangularisation de matrice," Rapport de DEA, IRISA, Rennes (Juin 1987).
|
| |
8
|
R.J. Duffin, "On Fourier's Analysis of Linear Inequality Systems," Mathematical Programming Study 1, North-Holland (1974).
|
| |
9
|
D. Gannon, "Restructuring Nested Loops on the Alliant Cedar Cluster: A Case Study of Gaussian Elimination of Banded Matrices," CSRD document no. 543, University of llinois at Urbana Champaign (Feb. 1986).
|
| |
10
|
R. Cottlieb, K. Kimball, T. Jaskiewicz, and R. Swift, "A New Way To Speed Up a Supercompater," Electronics 58(30), pp.56-58 (July 1985).
|
| |
11
|
F. Irigoin and R. Triolet, "Supernodes and Alliant FX/8 Minisupercomputer," ENSMP-CAI-86-081, Ecole des Mines de Paris, Fontainebleau (France) (Aug. 1986).
|
| |
12
|
F. Irigoin and R. Triolet, "Computing Dependence Direction Vectors and Dependence Cones With Linear Systems," ENSMP-CAI-87-E94, Eicole des Mines de Paris, Fontainebleau (France) (,987).
|
| |
13
|
F. Irigoin, "Partitionnement des boucles imbriqudes : une technique d'optimisation pour les programmes scientifiques," Th~se de Doctorat d'Universit~, University! PARIS-VI, PARIS (1987).
|
| |
14
|
W. Jalby and U. Meier, "Optimizing Matrix Operation on a Parallel Multiprocessor with a Hierarchical Memory System," 1986 Int't Conference on Parallel Processing, pp.429-432 (Aug. 1986).
|
 |
15
|
|
 |
16
|
D. J. Kuck , R. H. Kuhn , D. A. Padua , B. Leasure , M. Wolfe, Dependence graphs and compiler optimizations, Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.207-218, January 26-28, 1981, Williamsburg, Virginia
[doi> 10.1145/567532.567555]
|
| |
17
|
|
 |
18
|
|
| |
19
|
A. Lichnewsky and F. Thoma,sset, "Techniques de Base sur l'Exploitation Automatique du Paralt~lisme dons tes Programmes," Calcul Parall~le h usage scientifique (Oct. 1985).
|
| |
20
|
|
| |
21
|
C. Mongenet, "Une m~thode de conception d'algorithmes systoliques. R~sultats thdoriques et rdaIisation.," Th~se de Doctorat, INPL, Nancy (t985).
|
| |
22
|
|
| |
23
|
J.-K. Peir and R. Cytron, "Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors," 1987 Int'l Conference on Parallel Processing, pp.217-225 (Aug. 1987).
|
| |
24
|
|
 |
25
|
Rémi Triolet , Francois Irigoin , Paul Feautrier, Direct parallelization of call statements, Proceedings of the 1986 SIGPLAN symposium on Compiler construction, p.176-185, June 25-27, 1986, Palo Alto, California, United States
|
| |
26
|
|
| |
27
|
M. J. Wolfe, "Advanced Loop interchanging," Int'l Conference on Parallel Processing, pp.536- 543 (Aug. lg86).
|
CITED BY 114
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
M. Kandemir , A. Choudhary , J. Ramanujam , P. Banerjee, Improving locality using loop and data transformations in an integrated framework, Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, p.285-297, November 1998, Dallas, Texas, United States
|
|
|
|
|
|
|
|
|
Somnath Ghosh , Margaret Martonosi , Sharad Malik, Automated cache optimizations using CME driven diagnosis, Proceedings of the 14th international conference on Supercomputing, p.316-326, May 08-11, 2000, Santa Fe, New Mexico, United States
|
|
|
Hiroshi Ohta , Yasuhiko Saito , Masahiro Kainaga , Hiroyuki Ono, Optimal tile size adjustment in compiling general DOACROSS loop nests, Proceedings of the 9th international conference on Supercomputing, p.270-279, July 03-07, 1995, Barcelona, Spain
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Maria Athanasaki , Aristidis Sotiropoulos , Georgios Tsoukalas , Nectarios Koziris, Pipelined scheduling of tiled nested loops onto clusters of SMPs using memory mapped network interfaces, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-13, November 16, 2002, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Francisco Almeida , Rumen Andonov , Daniel Gonzalez , Luz M. Moreno , Vincent Poirriez , Casiano Rodriguez, Optimal tiling for the RNA base pairing problem, Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, August 10-13, 2002, Winnipeg, Manitoba, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
J. Ramanujam , Jinpyo Hong , Mahmut Kandemir , A. Narayan, Reducing memory requirements of nested loops for embedded systems, Proceedings of the 38th conference on Design automation, p.359-364, June 2001, Las Vegas, Nevada, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jürgen Teich , Lothar Thiele, Exact partitioning of affine dependence algorithms, Embedded processor design challenges: systems, architectures, modeling, and simulation-SAMOS, Springer-Verlag New York, Inc., New York, NY, 2002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mahmut Kandemir , Alok Choudhary , J. Ramanujam , Meenakshi A. Kandaswamy, A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations, IEEE Transactions on Parallel and Distributed Systems, v.11 n.7, p.648-668, July 2000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
William Thies , Michal Karczmarek , Janis Sermulins , Rodric Rabbah , Saman Amarasinghe, Teleport messaging for distributed stream programs, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 15-17, 2005, Chicago, IL, USA
|
|
|
Arun Kejariwal , Alexandru Nicolau , Utpal Banerjee , Constantine D. Polychronopoulos, A novel approach for partitioning iteration spaces with variable densities, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 15-17, 2005, Chicago, IL, USA
|
|
|
Prasad A. Kulkarni , Stephen R. Hines , David B. Whalley , Jason D. Hiser , Jack W. Davidson , Douglas L. Jones, Fast and efficient searches for effective optimization-phase sequences, ACM Transactions on Architecture and Code Optimization (TACO), v.2 n.2, p.165-198, June 2005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Youcef Bouchebaba , Bruno Girodias , Gabriela Nicolescu , El Mostapha Aboulhamid , Bruno Lavigueur , Pierre Paulin, MPSoC memory optimization using program transformation, ACM Transactions on Design Automation of Electronic Systems (TODAES), v.12 n.4, p.43-es, September 2007
|
|
|
|
|
|
Arun Kejariwal , Alexandru Nicolau , Hideki Saito , Xinmin Tian , Milind Girkar , Utpal Banerjee , Constantine D. Polychronopoulos, A general approach for partitioning N-dimensional parallel nested loops with conditionals, Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, July 30-August 02, 2006, Cambridge, Massachusetts, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Robert Schreiber , Shail Aditya , Scott Mahlke , Vinod Kathail , B. Ramakrishna Rau , Darren Cronquist , Mukund Sivaraman, PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators, Journal of VLSI Signal Processing Systems, v.31 n.2, p.127-142, June 2002
|
|
|
|
|
|
|
|
|
Theodore Andronikos , Florina M. Ciorba , Panayiotis Theodoropoulos , Dimitrios Kamenopoulos , George Papakonstantinou, Cronus: A platform for parallel code generation based on computational geometry methods, Journal of Systems and Software, v.81 n.8, p.1389-1405, August, 2008
|
|
|
|
|
|
Jia Guo , Ganesh Bikshandi , Basilio B. Fraguela , Maria J. Garzaran , David Padua, Programming with tiles, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
|
|
|
DaeGon Kim , Lakshminarayanan Renganarayanan , Dave Rostron , Sanjay Rajopadhye , Michelle Mills Strout, Multi-level tiling: M for the price of one, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, November 10-16, 2007, Reno, Nevada
|
|
|
|
|
|
|
|
|
Albert Hartono , Muthu Manikandan Baskaran , Cédric Bastoul , Albert Cohen , Sriram Krishnamoorthy , Boyana Norris , J. Ramanujam , P. Sadayappan, Parametric multi-level tiling of imperfectly nested loops, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA
|
|
|
|
|