ABSTRACT
Roadrunner is a hybrid-architecture supercomputer developed by LANL and IBM with a peak double-precision performance of 1.38 Pflop/s. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and a performance analysis of the system. We also include a case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture. We compare Sweep3D's performance on Roadrunner with its performance on a previous implementation of the Cell Broadband Engine architecture (the Cell BE) and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks, we identify performance issues in the early pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
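The headline figures in the abstract imply a few per-node quantities that are easy to check. The short Python sketch below derives them using only the numbers quoted above. The variable and function names are illustrative, and the per-node peak assumes that the quoted 1.38 Pflop/s is spread evenly over the 3,060 compute nodes, which the abstract does not state explicitly. The Linpack figure is only a lower bound, since the abstract says "in excess of 1 Pflop/s".

```python
# Figures quoted in the abstract.
PEAK_PFLOPS = 1.38            # double-precision peak, Pflop/s
CELL_PROCESSORS = 12_240      # IBM PowerXCell 8i processors
OPTERON_CORES = 12_240        # AMD Opteron cores
COMPUTE_NODES = 3_060
LINPACK_FLOOR_PFLOPS = 1.0    # "in excess of 1 Pflop/s", so a lower bound


def per_node(total: int, nodes: int) -> int:
    """Return total / nodes, insisting that the division is exact."""
    quotient, remainder = divmod(total, nodes)
    if remainder:
        raise ValueError("total is not evenly divisible across nodes")
    return quotient


cells_per_node = per_node(CELL_PROCESSORS, COMPUTE_NODES)
opteron_cores_per_node = per_node(OPTERON_CORES, COMPUTE_NODES)

# Assumption: the system peak is divided evenly among compute nodes.
peak_per_node_tflops = PEAK_PFLOPS * 1000 / COMPUTE_NODES

# Lower bound on the fraction of peak achieved by the Linpack run.
linpack_fraction_floor = LINPACK_FLOOR_PFLOPS / PEAK_PFLOPS

if __name__ == "__main__":
    print(f"PowerXCell 8i processors per node: {cells_per_node}")
    print(f"Opteron cores per node:            {opteron_cores_per_node}")
    print(f"Peak per node (Tflop/s):           {peak_per_node_tflops:.3f}")
    print(f"Linpack fraction of peak (floor):  {linpack_fraction_floor:.1%}")
```

Under these assumptions each compute node pairs four PowerXCell 8i processors with four Opteron cores, and the Linpack run reached more than roughly 72% of peak.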