Entering the petaflop era: the architecture and performance of Roadrunner
Source: Conference on High Performance Networking and Computing
Proceedings of the 2008 ACM/IEEE conference on Supercomputing - Volume 00
Austin, Texas
Section: Papers
Article No. 1
Year of Publication: 2008
ISBN: 978-1-4244-2835-9
Authors
Kevin J. Barker  Los Alamos National Laboratory, Los Alamos
Kei Davis  Los Alamos National Laboratory, Los Alamos
Adolfy Hoisie  Los Alamos National Laboratory, Los Alamos
Darren J. Kerbyson  Los Alamos National Laboratory, Los Alamos
Mike Lang  Los Alamos National Laboratory, Los Alamos
Scott Pakin  Los Alamos National Laboratory, Los Alamos
Jose C. Sancho  Los Alamos National Laboratory, Los Alamos
Publisher
IEEE Press  Piscataway, NJ, USA

ABSTRACT

Roadrunner is a 1.38 Pflop/s-peak (double precision) hybrid-architecture supercomputer developed by LANL and IBM. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and a detailed performance analysis of the system. A case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture is also included. The performance of Sweep3D is compared to that of the code on a previous implementation of the Cell Broadband Engine architecture (the Cell BE) and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks, we identify performance issues in the early pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
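
The counts in the abstract can be sanity-checked with a little arithmetic. The sketch below uses only the processor, core, and node counts quoted above, plus two per-chip peak rates that are assumptions and do not appear in the abstract: roughly 108.8 Gflop/s double precision per PowerXCell 8i and roughly 3.6 Gflop/s per Opteron core. Under those assumed rates, the total comes out close to the quoted 1.38 Pflop/s peak. All function and constant names are hypothetical.

```python
# Counts taken directly from the abstract.
CELL_PROCESSORS = 12_240
OPTERON_CORES = 12_240
COMPUTE_NODES = 3_060

# ASSUMPTIONS (not stated in the abstract): approximate double-precision
# peak rates per PowerXCell 8i processor and per Opteron core, in Gflop/s.
ASSUMED_CELL_GFLOPS = 108.8
ASSUMED_OPTERON_CORE_GFLOPS = 3.6


def per_node(total: int, nodes: int) -> int:
    """Return how many units each node holds, requiring an even split."""
    count, remainder = divmod(total, nodes)
    if remainder:
        raise ValueError("units do not divide evenly across nodes")
    return count


def estimated_peak_pflops() -> float:
    """Sum the assumed per-unit peaks and convert Gflop/s to Pflop/s."""
    total_gflops = (
        CELL_PROCESSORS * ASSUMED_CELL_GFLOPS
        + OPTERON_CORES * ASSUMED_OPTERON_CORE_GFLOPS
    )
    return total_gflops / 1e6


if __name__ == "__main__":
    print("PowerXCell 8i per node:", per_node(CELL_PROCESSORS, COMPUTE_NODES))
    print("Opteron cores per node:", per_node(OPTERON_CORES, COMPUTE_NODES))
    print("Estimated peak (Pflop/s):", round(estimated_peak_pflops(), 2))
```

The even split of four PowerXCell 8i processors and four Opteron cores per node follows from the abstract's counts alone; the peak estimate depends entirely on the assumed per-chip rates.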

