| OpenMP to GPGPU: a compiler framework for automatic translation and optimization |
| Full text |
Pdf
(493 KB)
|
Source
|
Principles and Practice of Parallel Programming
archive
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
table of contents
Raleigh, NC, USA
SESSION: Accelerator software
table of contents
Pages 101-110
Year of Publication: 2009
ISBN:978-1-60558-397-6
Also published in ...
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 150, Downloads (12 Months): 749, Citation Count: 1
|
|
|
ABSTRACT
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from NVIDIA offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, leading to performance improvements of up to 50X over the unoptimized translation (up to 328X over serial).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Muthu Manikandan Baskaran , Uday Bondhugula , Sriram Krishnamoorthy , J. Ramanujam , Atanas Rountev , P. Sadayappan, A compiler framework for optimization of affine loop nests for gpgpus, Proceedings of the 22nd annual international conference on Supercomputing, June 07-12, 2008, Island of Kos, Greece
[doi> 10.1145/1375527.1375562]
|
 |
3
|
|
| |
4
|
NVIDIA CUDA {online}. available: http://developer.nvidia.com/object/cuda home.html.
|
| |
5
|
NVIDIA CUDA SDK - Data-Parallel Algorithms: Parallel Reduction {online}. available: http://developer.download.nvidia.com/compute/cuda/1 1/Website/Data-Parallel Algorithms.html.
|
| |
6
|
Tim Davis. University of Florida Sparse Matrix Collection {online}. available: http://www.cise.ufl.edu/research/sparse/matrices/.
|
 |
7
|
|
| |
8
|
Sang Ik Lee, Troy Johnson, and Rudolf Eigenmann. Cetus - an extensible compiler infrastructure for source-to-source transformation. International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2003.
|
| |
9
|
David Levine, David Callahan, and Jack Dongarra. A comparative study of automatic vectorizing compilers. Parallel Computing, 17, 1991.
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
OpenMP {online}. available: http://openmp.org/wp/.
|
 |
14
|
Shane Ryoo , Christopher I. Rodrigues , Sara S. Baghsorkhi , Sam S. Stone , David B. Kirk , Wen-mei W. Hwu, Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
[doi> 10.1145/1345206.1345220]
|
 |
15
|
Shane Ryoo , Christopher I. Rodrigues , Sam S. Stone , Sara S. Baghsorkhi , Sain-Zee Ueng , John A. Stratton , Wen-mei W. Hwu, Program optimization space pruning for a multithreaded gpu, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
[doi> 10.1145/1356058.1356084]
|
| |
16
|
John A. Stratton , Sam S. Stone , Wen-Mei W. Hwu, MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, Languages and Compilers for Parallel Computing: 21th International Workshop, LCPC 2008, Edmonton, Canada, July 31 - August 2, 2008, Revised Selected Papers, Springer-Verlag, Berlin, Heidelberg, 2008
[doi> 10.1007/978-3-540-89740-8_2]
|
| |
17
|
Narayanan Sundaram, Anand Raghunathan, and Srimat T. Chakradhar. A framework for efficient and scalable execution of domain-specific templates on GPUs. IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2009.
|
| |
18
|
Sain-Zee Ueng , Melvin Lathara , Sara S. Baghsorkhi , Wen-Mei W. Hwu, CUDA-Lite: Reducing GPU Programming Complexity, Languages and Compilers for Parallel Computing: 21th International Workshop, LCPC 2008, Edmonton, Canada, July 31 - August 2, 2008, Revised Selected Papers, Springer-Verlag, Berlin, Heidelberg, 2008
[doi> 10.1007/978-3-540-89740-8_1]
|
| |
19
|
|
 |
20
|
|
|