ACM Home Page
Please provide us with feedback. Feedback
Chunking parallel loops in the presence of synchronization
Full text PdfPdf (571 KB)
Source
International Conference on Supercomputing archive
Proceedings of the 23rd international conference on Supercomputing table of contents
Yorktown Heights, NY, USA
SESSION: Compilers table of contents
Pages 181-192  
Year of Publication: 2009
ISBN:978-1-60558-498-0
Authors
Jun Shirako  Rice University, Houston, TX, USA
Jisheng M. Zhao  Rice University, Houston, TX, USA
V. Krishna Nandivada  IBM India Research Laboratory, Bangalore, India
Vivek N. Sarkar  Rice University, Houston, TX, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 28,   Downloads (12 Months): 79,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1542275.1542304
What is a DOI?

ABSTRACT

Modern languages for shared-memory parallelism are moving from a bulk-synchronous Single Program Multiple Data (SPMD) execution model to lightweight Task Parallel execution models for improved productivity. This shift is intended to encourage programmers to express the ideal parallelism in an application at a fine granularity that is natural for the underlying domain, while delegating to the compiler and runtime system the job of extracting coarser-grained useful parallelism for a given target system. A simple and important example of this separation of concerns between ideal and useful parallelism can be found in chunking of parallel loops, where the programmer expresses ideal parallelism by declaring all iterations of a loop to be parallel and the implementation exploits useful parallelism by executing iterations of the loop in sequential chunks.

Though chunking of parallel loops has been used as a standard transformation for several years, it poses some interesting challenges when the parallel loop may directly or indirectly (via procedure calls) perform synchronization operations such as barrier, signal or wait statements. In such cases, a straightforward transformation that attempts to execute a chunk of loops in sequence in a single thread may violate the semantics of the original parallel program. In this paper, we address the problem of chunking parallel loops that may contain synchronization operations. We present a transformation framework that uses a combination of transformations from past work (e.g., loop strip-mining, interchange, distribution, unswitching) to obtain an equivalent set of parallel loops that chunk together statements from multiple iterations while preserving the semantics of the original parallel program. These transformations result in reduced synchronization and scheduling overheads, thereby improving performance and scalability. Our experimental results for 11 benchmark programs on an UltraSPARC II multicore processor showed a geometric mean speedup of 0.52x for the unchunked case and 9.59x for automatic chunking using the techniques described in this paper. This wide gap underscores the importance of using these techniques in future compiler and runtime systems for programming models with lightweight parallelism.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
E. Allan et al. The Fortress language specification version 0.618. Technical report, Sun Microsystems, April 2005.
 
2
R. Barik et al. Experiences with an SMP implementation for X10 based on the Java concurrency utilities. In Proceedings of the Workshop on Programming Models for Ubiquitous Parallelism, Seattle, Washington, 2006.
3
4
5
 
6
F. Darema et al. A Single-Program-Multiple-Data Computational model for EPEX/FORTRAN. Parallel Computing, 7(1):11--24, 1988.
 
7
S. Deitz. Parallel programming in chapel. http://www.cct.lsu.edu/ estrabd/LACSI2006/Programming 2006.
8
9
 
10
B. Goetz. Java Concurrency In Practice. Addison-Wesley, 2007.
11
 
12
 
13
Cray Inc. The Chapel language specification version 0.4. Technical report, Cray Inc., February 2005.
 
14
 
15
 
16
 
17
OpenMP Application Program Interface, version 3.0, May 2008. http://www.openmp.org/mp-documents/spec30.pdf.
 
18
19
20
21
 
22
R. Vallée-Rai et al. Soot-a Java Optimization Framework. In Proceedings of CASCON 1999, pages 125--135, 1999.
 
23
 
24
X10 release on SourceForge. http://x10.sf.net.
25

Collaborative Colleagues:
Jun Shirako: colleagues
Jisheng M. Zhao: colleagues
V. Krishna Nandivada: colleagues
Vivek N. Sarkar: colleagues