|
ABSTRACT
Current parallelizing compilers cannot identify a significant fraction of fully parallel loops because they have complex or statically insufficiently defined access patterns. For this reason, we have developed the Privatizing DOALL test—a technique for identifying fully parallel loops at run-time, and dynamically privatizing scalars and arrays. The test itself is fully parallel, and can be applied to any loop, regardless of the structure of its data and/or control flow. The technique can be utilized in two modes: (i) the test is performed before executing the loop and indicates whether the loop can be executed as a DOALL; (ii) speculatively—the loop and the test are executed simultaneously, and it is determined later if the loop was in fact parallel. The test can also be used for debugging parallel programs. We discuss how the test can be inserted automatically by the compiler and outline a cost/performance analysis that can be performed to decide when to use the test. Our conclusion is that the test should almost always be applied—because, as we show, the expected speedup for fully parallel loops is significant, and the cost of a failed test (a not fully parallel loop), is minimal. We present some experimental results on loops from the PERFECT Benchmarks which confirm our conclusion that this test can lead to significant speedups.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Alliant Computer Systems Corporation. 42 Nagog Park, Acton, Massachusetts 01720. FX/Series Architecture Manual, 1986. Part Number: 300-00001-B.
|
| |
2
|
Alliant Computers Systems Corporation. AlIiant FX/2800 Series System Description, 1991.
|
| |
3
|
|
| |
4
|
M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang, R. Roloff, A. Sameh, E. Clementi, S. Chin, D. Schneider, G. Fox, P. Messina, D. Walker, C. Hsiung, J. Schwarzmeier, K. Lue, S. Orzag, F. Seidl, O. Johnson, G. Swanson, R. Goodrum, and J. Martin. The PERFECT club benchmarks: Effective performance evaluation of supercomputers. Technical Report CSRD-827, Center for Supercomputing Research and Development, University of Illinois, Urbana, IL, May 1989.
|
| |
5
|
H. Berryman and J. Saltz. A manual for PARTI runtime primitives. Interim Report 90-13, ICASE, 1990.
|
| |
6
|
|
| |
7
|
M. Burke, R. Cytron, J. Ferrante, and W. Hsieh. Automatic generation of nested, fork-join parallelism. Journal of Supercomputing, pages 71-88, 1989.
|
| |
8
|
R. Eigenmann and W. Blume. An effectiveness study of parallelizing compiler techniques. In Proc. of the i991 Int'l. Conf. on Parallel Processing, pages 17-25, August 1991.
|
| |
9
|
|
| |
10
|
|
 |
11
|
D. J. Kuck , R. H. Kuhn , D. A. Padua , B. Leasure , M. Wolfe, Dependence graphs and compiler optimizations, Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.207-218, January 26-28, 1981, Williamsburg, Virginia
[doi> 10.1145/567532.567555]
|
| |
12
|
|
 |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
J. Saltz and R. Mirchandaney. The preprocessed doacross loop. In Dr. H.D. Schwetman, editor, Proc. of the 1991 Int'l. Conf. on Parallel Processing, St. Charles, Illinois, August 12- 16, pages 174-178. CRC Press, Inc., 1991. Vol. II - Software.
|
 |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
|
| |
24
|
R.M. Tomasulo. An efficient algorithm for exploitingmultipIe arithmetic units. IBM Journal of Research and Development, 11:25-33, January 1967.
|
| |
25
|
P. Tu and D. Padua. Array privatization for shared and distributed memory machines. In Proc. 2nd Workshop on Languages, Compilers, and Run-Tzme Environments.for Distributed Memory Machines, September 1992.
|
| |
26
|
|
| |
27
|
|
| |
28
|
Wu, J. Saltz, S. Hiranandani, and H. Berryman. Runtime compilation methods for multicomputers. In Dr. H.D. Schwetman, editor, Proc. of the 1991 Int'l. Conf. on Parallel Processing, St. Charles, Illinois, August 1~-16, pages 26-30. CRC Press, Inc., 1991. Vol. II- Software.
|
| |
29
|
|
 |
30
|
|
CITED BY 12
|
|
|
|
|
|
|
|
|
|
|
Esther Salamí , Jesús Corbal , Carlos Álvarez , Mateo Valero, Cost effective memory disambiguation for multimedia codes, Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems, October 08-11, 2002, Grenoble, France
|
|
|
|
|
|
William Blume , Rudolf Eigenmann , Jay Hoeflinger , David Padua , Paul Petersen , Lawrence Rauchwerger , Peng Tu, Automatic Detection of Parallelism: A Grand Challenge for High-Performance Computing, IEEE Parallel & Distributed Technology: Systems & Technology, v.2 n.3, p.37-47, September 1994
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|