ABSTRACT
This paper describes the Global Trees (GT) system, which provides a multi-layered interface to a global address space view of distributed tree data structures while delivering scalable performance on distributed-memory systems. GT uses coarse-grained data movement to improve locality and communication efficiency. We describe the design and implementation of GT, illustrate its use in a gravitational simulation application, and present experimental results that demonstrate the effectiveness of the approach. The key benefits of the system are efficient shared-memory-style programming of distributed trees, tree-specific optimizations for data access and computation, and the ability to customize many aspects of GT to optimize application performance.