ACM Home Page
Please provide us with feedback. Feedback
Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors
Full text PdfPdf (545 KB)
Source
Principles and Practice of Parallel Programming archive
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming table of contents
Raleigh, NC, USA
SESSION: Parallel compilers and tools table of contents
Pages 219-228  
Year of Publication: 2009
ISBN:978-1-60558-397-6
Also published in ...
Authors
Muthu Manikandan Baskaran  The Ohio state University, Columbus, OH, USA
Nagavijayalakshmi Vydyanathan  The Ohio state University, Columbus, OH, USA
Uday Kumar Reddy Bondhugula  The Ohio state University, Columbus, OH, USA
J. Ramanujam  Louisiana State University, Baton Rouge, LA, USA
Atanas Rountev  The Ohio state University, Columbus, OH, USA
P. Sadayappan  The Ohio state University, Columbus, OH, USA
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 37,   Downloads (12 Months): 263,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1504176.1504209
What is a DOI?

ABSTRACT

Recent advances in polyhedral compilation technology have made it feasible to automatically transform affine sequential loop nests for tiled parallel execution on multi-core processors. However, for multi-statement input programs with statements of different dimensionalities, such as Cholesky or LU decomposition, the parallel tiled code generated by existing automatic parallelization approaches may suffer from significant load imbalance, resulting in poor scalability on multi-core systems. In this paper, we develop a completely automatic parallelization approach for transforming input affine sequential codes into efficient parallel codes that can be executed on a multi-core system in a load-balanced manner. In our approach, we employ a compile-time technique that enables dynamic extraction of inter-tile dependences at run-time, and dynamic scheduling of the parallel tiles on the processor cores for improved scalable execution. Our approach obviates the need for programmer intervention and re-writing of existing algorithms for efficient parallel execution on multi-cores. We demonstrate the usefulness of our approach through comparisons using linear algebra computations: LU and Cholesky decomposition.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
 
5
C. Bastoul, A. Cohen, S. Girbal, S. Sharma, and O. Temam. Putting polyhedral loop transformations to work. In Workshop on Languages and Compilers for Parallel Computing (LCPC'03), pages 23--30, 2003.
 
6
U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences. Technical Report OSU-CISRC-5/07-TR43, Ohio State University, May 2007.
 
7
U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In International Conference on Compiler Construction (ETAPS CC), Apr. 2008.
8
 
9
U. Bondhugula, J. Ramanujam, and P. Sadayappan. Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer. Technical Report OSU-CISRC-10/07-TR70, The Ohio State University, Oct. 2007.
 
10
 
11
A. Buttari, J. Dongarra, P. Husbands, J. Kurzak, and K. Yelick. Multithreading for synchronization tolerance in matrix factorization. In Proceedings of the SciDAC 2007 Conference. Journal of Physics: Conference Series, 2007.
 
12
A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. A class of parallel tiled linear algebra algorithms for multicore architectures. Technical Report UT-CS-07-600, Innovative Computing Laboratory, University of Tennessee Knoxville, September 2007. Submitted to Parallel Computing. LAPACK Working Note 191.
 
13
14
 
15
CLooG: The Chunky Loop Generator. http://www.cloog.org.
 
16
A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379--392, 1997.
 
17
 
18
J. Dongarra. Four important concepts that will effect math software. In 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), 2008.
 
19
P. Feautrier. Dataflow analysis of array and scalar references. IJPP, 20(1):23--53, 1991.
 
20
 
21
 
22
 
23
 
24
 
25
M. Griebl. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. FMI, University of Passau, 2004. Habilitation Thesis.
 
26
27
28
 
29
30
31
 
32
 
33
Parallel linear algebra for scalable multi-core architectures (PLASMA) project. http://icl.cs.utk.edu/plasma.
 
34
PLUTO: A polyhedral automatic parallelizer and locality optimizer for multicores. http://pluto-compiler.sourceforge.net.
35
36
 
37
38
39
 
40
P. Rundberg and P. S. Om. Low-cost thread-level data dependence speculation on multiprocessors. In In Fourth Workshop on Multithreaded Execution, Architecture and Compilation, pages 1--9, 2000.
41
 
42
 
43
 
44
45
 
46
N. Vasilache, C. Bastoul, and A. Cohen. Polyhedral code generation in the real world. In International Conference on Compiler Construction (ETAPS CC'06), pages 185--201, Mar. 2006.
47
 
48

Collaborative Colleagues:
Muthu Manikandan Baskaran: colleagues
Nagavijayalakshmi Vydyanathan: colleagues
Uday Kumar Reddy Bondhugula: colleagues
J. Ramanujam: colleagues
Atanas Rountev: colleagues
P. Sadayappan: colleagues