Techniques for efficient placement of synchronization primitives
Source
Principles and Practice of Parallel Programming
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Raleigh, NC, USA
SESSION: Parallel compilers and tools
Pages 199-208  
Year of Publication: 2009
ISBN:978-1-60558-397-6
Authors
Alexandru Nicolau  University of California at Irvine, Irvine, California, USA
Guangqiang Li  University of California at Irvine, Irvine, California, USA
Arun Kejariwal  Yahoo! Inc, Santa Clara, California, USA
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1504176.1504207

ABSTRACT

Harnessing the hardware parallelism of emerging multi-core systems requires concurrent software. Unfortunately, most existing mainstream software is sequential. Although one could auto-parallelize a given program, the efficacy of auto-parallelization is largely limited to floating-point codes. One way to alleviate this limitation is to use explicit synchronization to parallelize programs that cannot be auto-parallelized. In this regard, efficient placement of the synchronization primitives (say, post and wait) plays a key role in achieving a high degree of thread-level parallelism (TLP). In this paper, we propose novel compiler techniques for this placement. Specifically, given a control flow graph (CFG), the proposed techniques place a post as early as possible and a wait as late as possible in the CFG, subject to dependences. We demonstrate the efficacy of our techniques on a real machine using real codes, specifically from the industry-standard SPEC CPU benchmarks, the Linux kernel, and other widely used open source codes. Our results show that the proposed techniques yield significantly higher levels of TLP than the state of the art.
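
The placement idea in the abstract (post as early as possible, wait as late as possible, subject to dependences) can be illustrated on a toy case. The sketch below is not the paper's algorithm, which operates on a full CFG. It assumes a straight-line loop body with a single loop-carried flow dependence on one variable, where the body reads that variable before writing it. The names `Stmt`, `place_sync`, and `BODY` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """One statement of a straight-line loop body (hypothetical model)."""
    label: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def earliest_post(body, var):
    # The dependence source is the last write to var; a post may follow it
    # immediately, so later statements overlap with the next iteration.
    return max(i for i, s in enumerate(body) if var in s.writes)


def latest_wait(body, var):
    # The dependence sink is the first read of var; a wait is only needed
    # just before it, so earlier statements run without blocking.
    return min(i for i, s in enumerate(body) if var in s.reads)


def place_sync(body, var):
    """Return the statement labels with wait/post inserted for var."""
    post_after = earliest_post(body, var)
    wait_before = latest_wait(body, var)
    out = []
    for i, s in enumerate(body):
        if i == wait_before:
            out.append(f"wait({var})")
        out.append(s.label)
        if i == post_after:
            out.append(f"post({var})")
    return out


# Assumed loop body; x carries a value from one iteration to the next.
BODY = [
    Stmt("S1", reads=frozenset({"i"}), writes=frozenset({"a"})),       # a = f(i)
    Stmt("S2", reads=frozenset({"x", "a"}), writes=frozenset({"t"})),  # t = x + a
    Stmt("S3", reads=frozenset({"t"}), writes=frozenset({"x"})),       # x = 2 * t
    Stmt("S4", reads=frozenset({"t"}), writes=frozenset({"b"})),       # b = g(t)
]

print(place_sync(BODY, "x"))
```

A naive placement would put the wait at the top of the body and the post at the bottom, serializing whole iterations. In this sketch only S2 and S3 fall between the wait and the post, so S1 and S4 of different iterations are free to overlap.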



