|
ABSTRACT
Harnessing the hardware parallelism of the emerging multi-cores systems necessitates concurrent software. Unfortunately, most of the existing mainstream software is sequential in nature. Although one could auto-parallelize a given program, the efficacy of this is largely limited to floating-point codes. One of the ways to alleviate the above limitation is to parallelize programs, which cannot be auto-parallelized, via explicit synchronization. In this regard, efficient placement of the synchronization primitives - say, post, wait - plays a key role in achieving high degree of thread-level parallelism (TLP). In this paper, we propose novel compiler techniques for the above. Specifically, given a control flow graph (CFG), the proposed techniques place a post as early as possible and place a wait as late as possible in the CFG, subject to dependences. We demonstrate the efficacy of our techniques, on a real machine, using real codes, specifically, from the industry-standard SPEC CPU benchmarks, the Linux kernel and other widely used open source codes. Our results show that the proposed techniques yield significantly higher levels of TLP than the state-of-the-art.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
James R. Goodman , Mary K. Vernon , Philip J. Woest, Efficient synchronization primitives for large-scale cache-coherent multiprocessors, Proceedings of the third international conference on Architectural support for programming languages and operating systems, p.64-75, April 03-06, 1989, Boston, Massachusetts, United States
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
 |
6
|
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
 |
11
|
Arun Kejariwal , Hideki Saito , Xinmin Tian , Milind Girkar , Wel Li , Utpal Banerjee , Alexandru Nicolau , Constantine D. Polychronopoulos, Lightweight lock-free synchronization methods for multithreading, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
[doi> 10.1145/1183401.1183452]
|
| |
12
|
The Linux Kernel Archives. http://www.kernel.org.
|
| |
13
|
|
| |
14
|
|
| |
15
|
Wen-Mei W. Hwu , Scott A. Mahlke , William Y. Chen , Pohua P. Chang , Nancy J. Warter , Roger A. Bringmann , Roland G. Ouellette , Richard E. Hank , Tokuzo Kiyohara , Grant E. Haab , John G. Holm , Daniel M. Lavery, The superblock: an effective technique for VLIW and superscalar compilation, The Journal of Supercomputing, v.7 n.1-2, p.229-248, May 1993
[doi> 10.1007/BF01205185]
|
 |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
SPEC CPU Benchmarks. http://www.spec.org/benchmarks.html.
|
 |
20
|
Arun Kejariwal , Xinmin Tian , Wei Li , Milind Girkar , Sergey Kozhukhov , Hideki Saito , Utpal Banerjee , Alexandru Nicolau , Alexander V. Veidenbaum , Constantine D. Polychronopoulos, On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
[doi> 10.1145/1183401.1183407]
|
| |
21
|
|
| |
22
|
SPEC CPU2006. http://www.spec.org/cpu2006.
|
| |
23
|
|
 |
24
|
|
| |
25
|
|
| |
26
|
SPEC CINT2006. http://www.spec.org/cpu2006/CINT2006.
|
| |
27
|
|
| |
28
|
A. Nicolau. Percolation scheduling. In Proceedings of the 1985 International Conference on Parallel Processing, August 1985.
|
| |
29
|
|
| |
30
|
|
| |
31
|
SPEC CPU2000. http://www.spec.org/cpu2000.
|
| |
32
|
Sendmail. http://www.sendmail.org/.
|
| |
33
|
Apache. http://download.nextag.com/apache.
|
| |
34
|
D. A. Padua. Multiprocessors: Discussion of theoritical and practical problems. Technical Report 79-990, Department of Computer Science, University of Illinois at Urbana-Champaign, November 1979.
|
| |
35
|
J. Davies. Parallel loop constructs for multiprocessors. Technical Report 81-1070, Department of Computer Science, University of Illinois at Urbana-Champaign, May 1981.
|
| |
36
|
C. Zhu and P. Yew. A synchronization scheme and its applications for large scale multiprocessors. In Proceedings of the Conference on Distributed Computing Systems, pages 486--491, San Francisco, CA, May 1984.
|
 |
37
|
|
| |
38
|
|
 |
39
|
|
 |
40
|
Weirong Zhu , Vugranam C Sreedhar , Ziang Hu , Guang R. Gao, Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures, Proceedings of the 34th annual international symposium on Computer architecture, June 09-13, 2007, San Diego, California, USA
|
 |
41
|
John Whaley , Martin Rinard, Compositional pointer and escape analysis for Java programs, Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, p.187-206, November 01-05, 1999, Denver, Colorado, United States
|
 |
42
|
|
| |
43
|
R. Cytron. Doacross: Beyond vectorization for multiprocessors. In Proceedings of the 1986 International Conference on Parallel Processing, pages 836--844, St. Charles, IL, August 1986.
|
| |
44
|
S. Midkiff and D. Padua. Compiler generated synchronization for DO loops. In Proceedings of the 1986 International Conference on Parallel Processing, pages 544--551, St. Charles, IL, August 1986.
|
| |
45
|
H. Kasahara, H. Honda, M. Iwata, and M. Hirota. A compilation scheme for macro-dataow computation on hierarchical multiprocessor systems. In Proceedings of the International Conference on Parallel Processing, pages II294--II295, Urbana-Champaign, IL, August 1990.
|
| |
46
|
|
 |
47
|
R. Cytron , M. Hind , W. Hsieh, Automatic generation of DAG parallelism, Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation, p.54-68, June 19-23, 1989, Portland, Oregon, United States
|
 |
48
|
|
 |
49
|
Chen Tian , Vijay Nagarajan , Rajiv Gupta , Sriraman Tallam, Dynamic recognition of synchronization operations for improved data race detection, Proceedings of the 2008 international symposium on Software testing and analysis, July 20-24, 2008, Seattle, WA, USA
[doi> 10.1145/1390630.1390649]
|
CITED BY
|
|
Alexandru Nicolau , Guangqiang Li , Alexander V. Veidenbaum , Arun Kejariwal, Synchronization optimizations for efficient execution on multi-cores, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA
|
|