|
ABSTRACT
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it had any cross-iteration dependences; if the test fails, then the loop is re-executed serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching; it detects at run-time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
S. Abraham. Private communication, 1994.
|
 |
2
|
J. R. Allen , Ken Kennedy , Carrie Porterfield , Joe Warren, Conversion of control dependence to data dependence, Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, p.177-189, January 24-26, 1983, Austin, Texas
[doi> 10.1145/567067.567085]
|
| |
3
|
Allxant Computer 5gystems Corporation. ~"w~/Serles Architecture Manual, 1986.
|
| |
4
|
Alhant Computers Systems Corporation. Alliant FX/2800 Series System Descrtption, 1991.
|
 |
5
|
Karl J. Ottenstein , Robert A. Ballance , Arthur B. MacCabe, The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages, Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation, p.257-271, June 1990, White Plains, New York, United States
|
| |
6
|
|
| |
7
|
M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang, R. Roloff, A. Sameh, E. Clemenu, S. Chin, D. Schneider, G. Fox, P. Messina, D. WaLker, C. Hsiung, J. Schwarzmemr, K. Lue, S. Orzag, E Seidl, O. Johnson, G. Swanson, R. Goodmm, and J. Martin. The PER- FECT club benchmarks: Effective performance evaluation of supercomputers. Technical Report CSRD-827, Center for Supercomputmg Research and Development, University of Illinois, Urbana, IL, May 1989.
|
| |
8
|
H. Berryman and J. Saltz. A manual for PARTI runtime primluves. Interim Report 90-13, ICASE, 1990.
|
| |
9
|
|
| |
10
|
M. Burke, R. Cytron, J. Ferrante, and W. Hsieh. Automatic generation of nested, fork-join parallehsm. Journal of Supercomputing, pages 71-88, 1989.
|
 |
11
|
|
| |
12
|
|
 |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
C. Kruskal. Efficient parallel algorithms for graph problems. In Proceedings of the 1985 Internattonal Conference on Parallel Processing, August 1985.
|
| |
17
|
C. Kruskal. Efficient parallel algorithms for graph problems. In Proceedings of the 1986 International Conference on Parallel Processing, pages 869-876, August 1986.
|
 |
18
|
D. J. Kuck , R. H. Kuhn , D. A. Padua , B. Leasure , M. Wolfe, Dependence graphs and compiler optimizations, Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.207-218, January 26-28, 1981, Williamsburg, Virginia
[doi> 10.1145/567532.567555]
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
I. Nudler and L. Rudolph. Tools for the efficient developementof efficient parallel programs. In Proc. 1st Israeli Conference on Computer System Engineering, 1988.
|
 |
25
|
|
 |
26
|
|
| |
27
|
|
| |
28
|
J. Sakz and R. Mirchandaney. The preprocessed doacross loop. In Dr. H.D. Schwetman, editor, Proceedings of the 1991 International Conference on Parallel Processmg, pages 174-178. CRC Press, Inc., 1991. Vol. II - Software.
|
 |
29
|
|
| |
30
|
|
 |
31
|
|
| |
32
|
P. Tu and D. Padua. Array privatizafion for shared and distributed memory machines. In Proceedings 2nd Workshop on Languages, Compilers, and Run-Time Envtronments for Distributed Memory Machines, September 1992.
|
| |
33
|
|
| |
34
|
Penn Tu and David Padua. GSA based demand-driven symbolic analysis. Technical Report t339, University of Illinois at Urbana- Champaign, Cntr for Supercomputing Res & Dev, February 1994.
|
| |
35
|
|
| |
36
|
|
| |
37
|
J. Wu, J. Saltz, S. Hlranandam, and H. Berryman. Runtime compilauon methods for multicomputers. In Dr. H.D. Schwetman, editor, Proceedings of the 1991 International Conference on Parallel Processing, pages 26-30. CRC Press, Inc., 1991. Vol. II - Software.
|
| |
38
|
|
 |
39
|
|
CITED BY 41
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
William Blume , Ramon Doallo , Rudolf Eigenmann , John Grout , Jay Hoeflinger , Thomas Lawrence , Jaejin Lee , David Padua , Yunheung Paek , Bill Pottenger , Lawrence Rauchwerger , Peng Tu, Parallel Programming with Polaris, Computer, v.29 n.12, p.78-82, December 1996
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
María Jesús Garzarán , Milos Prvulovic , José María Llabería , Víctor Viñals , Lawrence Rauchwerger , Josep Torrellas, Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors, ACM Transactions on Architecture and Code Optimization (TACO), v.2 n.3, p.247-279, September 2005
|
|
|
|
|
|
|
|
|
|
|
|
Arun Kejariwal , Xinmin Tian , Wei Li , Milind Girkar , Sergey Kozhukhov , Hideki Saito , Utpal Banerjee , Alexandru Nicolau , Alexander V. Veidenbaum , Constantine D. Polychronopoulos, On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Muthu Manikandan Baskaran , Nagavijayalakshmi Vydyanathan , Uday Kumar Reddy Bondhugula , J. Ramanujam , Atanas Rountev , P. Sadayappan, Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|