Communication optimizations for parallel computing using data access information
Source
Conference on High Performance Networking and Computing
Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM)
San Diego, California, United States
Article No. 69
Year of Publication: 1995
ISBN:0-89791-816-9
Author
Martin C. Rinard
Department of Computer Science, University of California, Santa Barbara, Santa Barbara, California
ABSTRACT
Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide or parallelize communication may improve the performance of parallel computations. This paper describes our experience automatically applying communication optimizations in the context of Jade, a portable, implicitly parallel programming language designed for exploiting task-level concurrency. Jade programmers start with a program written in a standard serial, imperative language, then use Jade constructs to declare how parts of the program access data. The Jade implementation uses this data access information to automatically extract the concurrency and apply communication optimizations. Jade implementations exist for both shared memory and message passing machines; each Jade implementation applies communication optimizations appropriate for the machine on which it runs. We present performance results for several Jade applications running on both a shared memory machine (the Stanford DASH machine) and a message passing machine (the Intel iPSC/860). We use these results to characterize the overall performance impact of the communication optimizations. For our application set, replicating data for concurrent read access and improving the locality of the computation by placing tasks close to the data that they access are the most important optimizations. Broadcasting widely accessed data has a significant performance impact on one application; other optimizations such as concurrently fetching remote data and overlapping computation with communication have no effect.
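The idea of extracting concurrency from declared data accesses can be illustrated with a minimal sketch. It is written in Python rather than Jade, and it is not the Jade implementation's algorithm: the `Task` class, the `conflicts` and `schedule_waves` functions, and the object names are all hypothetical, chosen only for illustration. Each task declares the objects it reads and writes; two tasks must keep their serial order only when one writes an object that the other reads or writes, so tasks that merely read the same object may run concurrently (the situation in which replicating data for concurrent read access applies).

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A unit of work plus its declared data accesses (hypothetical model)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(earlier: Task, later: Task) -> bool:
    """Return True if the two tasks must keep their serial order."""
    return bool(
        # The earlier task writes something the later task reads or writes.
        earlier.writes & (later.reads | later.writes)
        # The earlier task reads something the later task overwrites.
        or earlier.reads & later.writes
    )


def schedule_waves(tasks):
    """Group tasks, given in serial program order, into waves.

    Tasks in the same wave do not conflict with each other, so they
    could run concurrently without changing the serial semantics.
    """
    wave_of = {}
    waves = []
    for i, task in enumerate(tasks):
        # A task runs one wave after the latest earlier task it conflicts with.
        wave = 0
        for earlier in tasks[:i]:
            if conflicts(earlier, task):
                wave = max(wave, wave_of[earlier.name] + 1)
        wave_of[task.name] = wave
        if wave == len(waves):
            waves.append([])
        waves[wave].append(task.name)
    return waves


# Assumed example program: two tasks read "grid" concurrently.
example = [
    Task("init", writes=frozenset({"grid"})),
    Task("left", reads=frozenset({"grid"}), writes=frozenset({"a"})),
    Task("right", reads=frozenset({"grid"}), writes=frozenset({"b"})),
    Task("merge", reads=frozenset({"a", "b"}), writes=frozenset({"grid"})),
]
print(schedule_waves(example))  # [['init'], ['left', 'right'], ['merge']]
```

In this sketch `left` and `right` land in the same wave because both only read `grid`; a real runtime would schedule tasks dynamically and would also decide where to place them, which this sketch does not attempt.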