ABSTRACT
Customizing the protocols that manage accesses to different data structures within an application can substantially improve the performance of software shared-memory programs. Existing systems that support customizable protocols are hard to use directly because the mechanisms they provide for manipulating protocols are low-level. This article is an in-depth study of the issues involved in providing language support for application-specific protocols. We describe the design and implementation of Ace, a new language for parallel programming that integrates support for customizable protocols with minimal extensions to C. Ace applications are developed using a shared-memory model with a default sequentially consistent protocol. Performance can then be optimized, with minor modifications to the application, by experimenting with different protocol libraries. The design of Ace was driven by a detailed study of the use of customizable protocols. We delineate the issues that arise when programming with customizable protocols and present novel abstractions that make them easy to use. We describe the design and implementation of a runtime system and compiler for Ace and discuss compiler optimizations that improve the performance of such software shared-memory systems. We study the communication patterns of a set of benchmark applications and consider the use of customizable protocols to optimize their performance. We evaluate the performance of our system through experiments on a Thinking Machines CM-5 and a Cray T3E. We also present measurements demonstrating that Ace performs well compared to a modern distributed shared-memory system.
REVIEW
Reviewer: Noah S. Prywes
Ace, a new parallel programming language, is more efficient than existing languages for parallel processing with message passing and clustered machines. This topic may sound prosaic, but it is explained and demonstrated well, making the paper …