Performance evaluation and tuning of GRAPE-6 - towards 40 "real" Tflops
Source Conference on High Performance Networking and Computing archive
Proceedings of the 2003 ACM/IEEE conference on Supercomputing table of contents
Page: 2  
Year of Publication: 2003
ISBN:1-58113-695-1
Authors
Junichiro Makino  University of Tokyo, Japan
Eiichiro Kokubo  National Astronomical Observatory of Japan, Tokyo
Toshiyuki Fukushige  University of Tokyo, Japan
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA

ABSTRACT

In this paper, we describe the performance characteristics of GRAPE-6, the sixth-generation special-purpose computer for gravitational many-body problems. GRAPE-6 consists of 2048 custom pipeline chips, each of which integrates six pipeline processors specialized for calculating the gravitational interaction between particles. The GRAPE hardware evaluates the interactions, while the frontend processors perform all other operations, such as time integration of the particle orbits, I/O, and on-the-fly analysis. The theoretical peak speed of GRAPE-6 is 63.4 Tflops. We present the results of benchmark runs and discuss the performance characteristics. We also present the measured performance for a few real scientific applications. The best performance achieved so far with real applications is 35.3 Tflops.
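
The division of labor described in the abstract can be illustrated with a small sketch. It is not the GRAPE-6 software or its API: the function names, the softening parameter `eps`, the choice of units with G = 1, and the leapfrog integrator are all assumptions made for illustration. The `accelerations` function stands in for the direct-summation interaction evaluation that the abstract assigns to the GRAPE hardware, and `leapfrog_step` stands in for the time integration assigned to the frontend. `derived_figures` only performs arithmetic on the numbers quoted in the abstract.

```python
import math

# Figures quoted in the abstract.
N_CHIPS = 2048
PIPELINES_PER_CHIP = 6
PEAK_TFLOPS = 63.4
BEST_APPLICATION_TFLOPS = 35.3


def derived_figures():
    """Quantities that follow by arithmetic from the abstract's numbers."""
    n_pipelines = N_CHIPS * PIPELINES_PER_CHIP
    return {
        "pipelines": n_pipelines,
        "gflops_per_chip": PEAK_TFLOPS * 1e3 / N_CHIPS,
        "gflops_per_pipeline": PEAK_TFLOPS * 1e3 / n_pipelines,
        "fraction_of_peak": BEST_APPLICATION_TFLOPS / PEAK_TFLOPS,
    }


def accelerations(pos, mass, eps=0.0):
    """Direct O(N^2) summation of gravitational accelerations.

    Stands in for the interaction evaluation done by the special-purpose
    hardware. Assumptions: G = 1, optional softening length eps.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += mass[j] * d[k] * inv_r3
    return acc


def leapfrog_step(pos, vel, mass, dt, eps=0.0):
    """One kick-drift-kick step: the kind of work left to the frontend.

    Leapfrog is an assumed, simple integrator chosen for brevity.
    """
    n = len(pos)
    acc = accelerations(pos, mass, eps)
    vel_half = [[vel[i][k] + 0.5 * dt * acc[i][k] for k in range(3)]
                for i in range(n)]
    new_pos = [[pos[i][k] + dt * vel_half[i][k] for k in range(3)]
               for i in range(n)]
    new_acc = accelerations(new_pos, mass, eps)
    new_vel = [[vel_half[i][k] + 0.5 * dt * new_acc[i][k] for k in range(3)]
               for i in range(n)]
    return new_pos, new_vel
```

Under these assumptions, `derived_figures()` gives 12288 pipelines in total, roughly 31 Gflops of peak speed per chip, and a best application performance of about 56% of the theoretical peak.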


