ABSTRACT
Computational scientists have seen a frustrating trend of stagnating application performance despite dramatic increases in the claimed peak capability of high performance computing systems. This trend has been widely attributed to the use of superscalar-based commodity components whose architectural designs offer a balance between memory performance, network capability, and execution rate that is poorly matched to the requirements of large-scale numerical computations. Recently, two innovative parallel-vector architectures have become operational: the Japanese Earth Simulator (ES) and the Cray X1. To quantify what these modern vector capabilities entail for the scientists who rely on modeling and simulation, it is critical to evaluate this architectural paradigm in the context of demanding computational algorithms. Our evaluation study examines four diverse scientific applications with the potential to run at ultrascale, from the areas of plasma physics, material science, astrophysics, and magnetic fusion. We compare the performance of the vector-based ES and X1 with that of leading superscalar-based platforms: the IBM Power3/4 and the SGI Altix. Our research team was the first international group to conduct a performance evaluation study at the Earth Simulator Center; remote ES access is not available. Results demonstrate that the vector systems achieve excellent performance on our application suite, the highest of any architecture tested to date. However, vectorization of a particle-in-cell code highlights the potential difficulty of expressing irregularly structured algorithms as data-parallel programs.