Shared memory programming for large scale machines
Source: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation
Ottawa, Ontario, Canada
SESSION: Parallelism
Pages: 108 - 117  
Year of Publication: 2006
ISBN:1-59593-320-4
Authors
Christopher Barton  University of Alberta, Edmonton, Canada
Călin Cașcaval  IBM T.J. Watson Research Center, Yorktown Heights, NY
George Almási  IBM T.J. Watson Research Center, Yorktown Heights, NY
Yili Zheng  Purdue University, West Lafayette, IN
Montse Farreras  Universitat Politecnica de Catalunya, Barcelona, Spain
Siddhartha Chatterjee  IBM T.J. Watson Research Center, Yorktown Heights, NY
José Nelson Amaral  University of Alberta, Edmonton, Canada
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1133981.1133995

ABSTRACT

This paper describes the design and implementation of a scalable run-time system and an optimizing compiler for Unified Parallel C (UPC). An experimental evaluation on BlueGene/L®, a distributed-memory machine, demonstrates that the combination of the compiler with the runtime system produces programs with performance comparable to that of efficient MPI programs and good performance scalability up to hundreds of thousands of processors.

Our runtime system design solves the problem of maintaining shared object consistency efficiently in a distributed memory machine. Our compiler infrastructure simplifies the code generated for parallel loops in UPC through the elimination of affinity tests, eliminates several levels of indirection for accesses to segments of shared arrays that the compiler can prove to be local, and implements remote update operations through a lower-cost asynchronous message. The performance evaluation uses three well-known benchmarks (HPC RandomAccess, HPC STREAM, and NAS CG) to obtain scaling and absolute performance numbers on up to 131072 processors, the full BlueGene/L machine. These results were used to win the HPC Challenge Competition at SC05 in Seattle, WA, demonstrating that PGAS languages support both productivity and performance.
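
As an illustration of what eliminating affinity tests from a parallel loop can mean, the following is a minimal sketch in plain C rather than UPC. It assumes a cyclic layout in which element i belongs to thread i % threads; the function names, the explicit threads and mythread parameters, and the summation workload are hypothetical and are not taken from the paper or its compiler. The first function scans every index and tests ownership each time; the second strides directly over the indices the thread owns.

```c
#include <assert.h>
#include <stddef.h>

/* Naive form (hypothetical helper): the thread visits all n iterations and
   executes the body only when the affinity test passes.
   Returns the number of affinity tests performed. */
size_t sum_with_affinity_test(const int *a, size_t n, size_t threads,
                              size_t mythread, long *sum)
{
    assert(threads > 0 && mythread < threads);
    size_t tests = 0;
    *sum = 0;
    for (size_t i = 0; i < n; i++) {
        tests++;                      /* one affinity test per iteration */
        if (i % threads == mythread)  /* assumed cyclic ownership rule */
            *sum += a[i];
    }
    return tests;
}

/* Transformed form (hypothetical helper): the loop starts at the thread's
   first owned index and strides by the thread count, so no test is needed.
   Returns the number of iterations executed. */
size_t sum_strided(const int *a, size_t n, size_t threads,
                   size_t mythread, long *sum)
{
    assert(threads > 0 && mythread < threads);
    size_t iterations = 0;
    *sum = 0;
    for (size_t i = mythread; i < n; i += threads) {
        *sum += a[i];
        iterations++;
    }
    return iterations;
}
```

Both functions compute the same per-thread result; the strided form simply does less work per thread, since the loop trip count shrinks from n to roughly n divided by the thread count. This equivalence holds only under the cyclic layout assumed here.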


