ABSTRACT
Graphics processing units (GPUs) have become an attractive option for accelerating scientific computations, owing both to advances in the performance and flexibility of GPU hardware and to the availability of GPU software development tools targeting general-purpose and scientific computation. However, effective use of GPUs in clusters presents a number of application development and system integration challenges. We describe strategies for the decomposition and scheduling of computation among CPU cores and GPUs, and techniques for overlapping communication and CPU computation with GPU kernel execution. We report the adaptation of these techniques to NAMD, a widely used parallel molecular dynamics simulation package, and present performance results for a 64-core, 64-GPU cluster.