Communication optimizations for parallel computing using data access information
Source: Conference on High Performance Networking and Computing
Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM)
San Diego, California, United States
Article No. 69  
Year of Publication: 1995
ISBN:0-89791-816-9
Author
Martin C. Rinard  Department of Computer Science, University of California, Santa Barbara, Santa Barbara, California
Sponsors
IEEE-CS : Computer Society
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/224170.224413

ABSTRACT

Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide, or parallelize communication may improve the performance of parallel computations. This paper describes our experience automatically applying communication optimizations in the context of Jade, a portable, implicitly parallel programming language designed for exploiting task-level concurrency. Jade programmers start with a program written in a standard serial, imperative language, then use Jade constructs to declare how parts of the program access data. The Jade implementation uses this data access information to automatically extract the concurrency and apply communication optimizations. Jade implementations exist for both shared memory and message passing machines; each Jade implementation applies communication optimizations appropriate for the machine on which it runs. We present performance results for several Jade applications running on both a shared memory machine (the Stanford DASH machine) and a message passing machine (the Intel iPSC/860). We use these results to characterize the overall performance impact of the communication optimizations. For our application set, the most important optimizations are replicating data for concurrent read access and improving the locality of the computation by placing tasks close to the data that they access. Broadcasting widely accessed data has a significant performance impact on one application; other optimizations, such as concurrently fetching remote data and overlapping computation with communication, have no effect.
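
The idea of extracting concurrency from declared data accesses can be illustrated with a small sketch. This is not Jade syntax or the Jade implementation; it is a hypothetical Python model in which each task lists the objects it reads and writes, and tasks are ordered as in the serial program. The names `Task`, `conflicts`, `dependences`, and `schedule_waves` are assumptions introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A hypothetical task with declared read and write sets (not Jade syntax)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(earlier, later):
    """True if the two tasks must keep their serial order.

    Two tasks conflict when one writes an object the other reads or writes.
    Concurrent reads of the same object do not conflict.
    """
    return bool(
        earlier.writes & (later.reads | later.writes)
        or earlier.reads & later.writes
    )


def dependences(tasks):
    """Map each task name to the names of earlier tasks it must wait for."""
    deps = {t.name: set() for t in tasks}
    for i, later in enumerate(tasks):
        for earlier in tasks[:i]:
            if conflicts(earlier, later):
                deps[later.name].add(earlier.name)
    return deps


def schedule_waves(tasks):
    """Group tasks into waves; tasks in the same wave may run concurrently."""
    deps = dependences(tasks)
    level = {}
    for t in tasks:  # serial order ensures every dependence already has a level
        level[t.name] = 1 + max((level[d] for d in deps[t.name]), default=-1)
    waves = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for t in tasks:
        waves[level[t.name]].append(t.name)
    return waves


# Illustrative serial program: one producer, two readers, one combiner.
program = [
    Task("init", writes=frozenset({"grid"})),
    Task("left", reads=frozenset({"grid"}), writes=frozenset({"a"})),
    Task("right", reads=frozenset({"grid"}), writes=frozenset({"b"})),
    Task("sum", reads=frozenset({"a", "b"}), writes=frozenset({"total"})),
]
waves = schedule_waves(program)
```

In this model, `left` and `right` both only read `grid`, so they land in the same wave. That is the situation in which replicating an object for concurrent read access, as the abstract describes, would let each task work on a local copy.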

