ABSTRACT
Partitioned global address space (PGAS) programming models have been identified as one of the few viable approaches for dealing with emerging many-core systems. These models tend to generate many small messages, which requires specific support from the network interface hardware to enable efficient execution. In the past, Cray included E-registers on the Cray T3E to support the SHMEM API; however, with the advent of multi-core processors, the balance of computation to communication capabilities has shifted toward computation. This paper explores the message rates that are achievable with multi-core processors and simplified PGAS support on a more conventional network interface. For message rate tests, we find that simple network interface hardware is more than sufficient. We also find that even typical data distributions, such as cyclic or block-cyclic, do not need specialized hardware support. Finally, we assess the impact of such support on the well-known RandomAccess benchmark.
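To make the data-distribution point concrete, the following is a minimal sketch of the address arithmetic that a cyclic or block-cyclic layout implies when it is computed in software on the host processor. It is an illustration only, not code from the paper; the function names and parameters (`block_cyclic_owner`, `cyclic_owner`, `block_size`, `num_pes`) are hypothetical.

```python
def block_cyclic_owner(index, block_size, num_pes):
    """Map a global element index to (owning PE, local offset).

    Assumed layout (illustrative): blocks of `block_size` consecutive
    elements are dealt round-robin across `num_pes` processing elements.
    """
    if index < 0 or block_size < 1 or num_pes < 1:
        raise ValueError("index must be >= 0; block_size and num_pes must be >= 1")
    block = index // block_size        # which global block holds the element
    owner = block % num_pes            # blocks are assigned round-robin
    local_block = block // num_pes     # blocks the owner holds before this one
    return owner, local_block * block_size + index % block_size


def cyclic_owner(index, num_pes):
    """A cyclic distribution is the block-cyclic case with block_size == 1."""
    return block_cyclic_owner(index, 1, num_pes)
```

For example, with blocks of 2 elements over 4 processing elements, global index 9 falls in block 4, which wraps around to PE 0 at local offset 3. The mapping needs only a few integer divisions and remainders per access.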