ACM Home Page
Please provide us with feedback. Feedback
The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization
Full text PdfPdf (1.27 MB)
Source International Conference on Supercomputing archive
Proceedings of the 8th international conference on Supercomputing table of contents
Manchester, England
Pages: 33 - 43  
Year of Publication: 1994
ISBN:0-89791-665-4
Authors
Lawrence Rauchwerger  Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, 1308 W. Main St., Urbana, IL
David Padua  Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, 1308 W. Main St., Urbana, IL
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 10,   Citation Count: 12
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues   peer to peer  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/181181.181254
What is a DOI?

ABSTRACT

Current parallelizing compilers cannot identify a significant fraction of fully parallel loops because they have complex or statically insufficiently defined access patterns. For this reason, we have developed the Privatizing DOALL test—a technique for identifying fully parallel loops at run-time, and dynamically privatizing scalars and arrays. The test itself is fully parallel, and can be applied to any loop, regardless of the structure of its data and/or control flow. The technique can be utilized in two modes: (i) the test is performed before executing the loop and indicates whether the loop can be executed as a DOALL; (ii) speculatively—the loop and the test are executed simultaneously, and it is determined later if the loop was in fact parallel. The test can also be used for debugging parallel programs. We discuss how the test can be inserted automatically by the compiler and outline a cost/performance analysis that can be performed to decide when to use the test. Our conclusion is that the test should almost always be applied—because, as we show, the expected speedup for fully parallel loops is significant, and the cost of a failed test (a not fully parallel loop), is minimal. We present some experimental results on loops from the PERFECT Benchmarks which confirm our conclusion that this test can lead to significant speedups.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Alliant Computer Systems Corporation. 42 Nagog Park, Acton, Massachusetts 01720. FX/Series Architecture Manual, 1986. Part Number: 300-00001-B.
 
2
Alliant Computers Systems Corporation. AlIiant FX/2800 Series System Description, 1991.
 
3
 
4
M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang, R. Roloff, A. Sameh, E. Clementi, S. Chin, D. Schneider, G. Fox, P. Messina, D. Walker, C. Hsiung, J. Schwarzmeier, K. Lue, S. Orzag, F. Seidl, O. Johnson, G. Swanson, R. Goodrum, and J. Martin. The PERFECT club benchmarks: Effective performance evaluation of supercomputers. Technical Report CSRD-827, Center for Supercomputing Research and Development, University of Illinois, Urbana, IL, May 1989.
 
5
H. Berryman and J. Saltz. A manual for PARTI runtime primitives. Interim Report 90-13, ICASE, 1990.
 
6
 
7
M. Burke, R. Cytron, J. Ferrante, and W. Hsieh. Automatic generation of nested, fork-join parallelism. Journal of Supercomputing, pages 71-88, 1989.
 
8
R. Eigenmann and W. Blume. An effectiveness study of parallelizing compiler techniques. In Proc. of the i991 Int'l. Conf. on Parallel Processing, pages 17-25, August 1991.
 
9
 
10
11
 
12
13
14
 
15
 
16
17
 
18
 
19
J. Saltz and R. Mirchandaney. The preprocessed doacross loop. In Dr. H.D. Schwetman, editor, Proc. of the 1991 Int'l. Conf. on Parallel Processing, St. Charles, Illinois, August 12- 16, pages 174-178. CRC Press, Inc., 1991. Vol. II - Software.
20
 
21
22
 
23
 
24
R.M. Tomasulo. An efficient algorithm for exploitingmultipIe arithmetic units. IBM Journal of Research and Development, 11:25-33, January 1967.
 
25
P. Tu and D. Padua. Array privatization for shared and distributed memory machines. In Proc. 2nd Workshop on Languages, Compilers, and Run-Tzme Environments.for Distributed Memory Machines, September 1992.
 
26
 
27
 
28
Wu, J. Saltz, S. Hiranandani, and H. Berryman. Runtime compilation methods for multicomputers. In Dr. H.D. Schwetman, editor, Proc. of the 1991 Int'l. Conf. on Parallel Processing, St. Charles, Illinois, August 1~-16, pages 26-30. CRC Press, Inc., 1991. Vol. II- Software.
 
29
30

CITED BY  12
 
 
 
 
 
 

Collaborative Colleagues:
Lawrence Rauchwerger: colleagues
David Padua: colleagues

Peer to Peer - Readers of this Article have also read: