ABSTRACT
This paper describes the design and implementation of a scalable runtime system and an optimizing compiler for Unified Parallel C (UPC). An experimental evaluation on BlueGene/L®, a distributed-memory machine, demonstrates that the combination of the compiler with the runtime system produces programs with performance comparable to that of efficient MPI programs and good performance scalability up to hundreds of thousands of processors. Our runtime system design solves the problem of maintaining shared object consistency efficiently in a distributed-memory machine. Our compiler infrastructure simplifies the code generated for parallel loops in UPC through the elimination of affinity tests, eliminates several levels of indirection for accesses to segments of shared arrays that the compiler can prove to be local, and implements remote update operations through lower-cost asynchronous messages. The performance evaluation uses three well-known benchmarks (HPC RandomAccess, HPC STREAM, and NAS CG) to obtain scaling and absolute performance numbers on up to 131,072 processors, the full BlueGene/L machine. These results were used to win the HPC Challenge Competition at SC05 in Seattle, WA, demonstrating that PGAS languages support both productivity and performance.
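The affinity-test elimination mentioned above can be illustrated with a minimal sketch in plain C. This is not the code the compiler generates: the function names, the cyclic (block size 1) layout, and the use of an ordinary array in place of a UPC shared array are all assumptions made for illustration. The first function models a naive lowering of a parallel loop, in which every thread visits every iteration and checks whether it owns the element. The second models the optimized form, in which each thread starts at its own first element and strides by the thread count, so no per-iteration test remains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a naive parallel-loop lowering: thread `mythread`
 * of `threads` walks the whole iteration space and tests affinity on every
 * iteration. A cyclic layout (element i belongs to thread i % threads) is
 * assumed; `a` is an ordinary array standing in for a shared array. */
static void scale_with_affinity_test(double *a, size_t n,
                                     int mythread, int threads)
{
    assert(threads > 0 && mythread >= 0 && mythread < threads);
    for (size_t i = 0; i < n; i++) {
        if ((int)(i % (size_t)threads) == mythread) {
            a[i] *= 2.0;   /* only the owning thread updates element i */
        }
    }
}

/* Same work with the affinity test removed: the loop bounds and stride
 * already restrict the thread to the elements it owns. */
static void scale_strided(double *a, size_t n, int mythread, int threads)
{
    assert(threads > 0 && mythread >= 0 && mythread < threads);
    for (size_t i = (size_t)mythread; i < n; i += (size_t)threads) {
        a[i] *= 2.0;
    }
}
```

Calling either function once per thread index from 0 to `threads - 1` updates each element exactly once; the strided version does so in roughly `n / threads` iterations per thread instead of `n`.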