Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
Source
International Conference on Supercomputing
Proceedings of the 23rd International Conference on Supercomputing
Yorktown Heights, NY, USA
Session: Accelerating applications with GPUs I
Pages 256-265
Year of Publication: 2009
ISBN: 978-1-60558-498-0
Authors
Jiayuan Meng, University of Virginia, Charlottesville, VA, USA
Kevin Skadron, University of Virginia, Charlottesville, VA, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1542275.1542313

ABSTRACT

Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture, there are usually halo regions that need to be updated and exchanged among different processing elements (PEs). In addition, synchronization is often used to signal the completion of halo exchanges. Both communication and synchronization may incur significant overhead on parallel architectures with shared memory. This is especially true in the case of graphics processors (GPUs), which do not preserve the state of the per-core L1 storage across global synchronizations. To reduce these overheads, ghost zones can be created to replicate stencil operations, reducing communication and synchronization costs at the expense of redundantly computing some values on multiple PEs. However, the selection of the optimal ghost zone size depends on the characteristics of both the architecture and the application, and it has only been studied for message-passing systems in a grid environment. To automate this process on shared memory systems, we establish a performance model using NVIDIA's Tesla architecture as a case study and propose a framework that uses the performance model to automatically select the ghost zone size that performs best and generate appropriate code. The modeling is validated by four diverse ISL applications, for which the predicted ghost zone configurations are able to achieve a speedup no less than 98% of the optimal speedup.
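The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's framework or its performance model, only a 1D, CPU-side illustration under assumed names: `step`, `run_reference`, `run_tiled`, and the parameters `tile` and `ghost` are all hypothetical. Each tile loads `ghost` extra cells on each side, then runs up to `ghost` stencil iterations locally before the next exchange. A wider ghost zone means fewer exchange rounds but more redundantly computed cells.

```python
def step(u):
    """One 3-point averaging sweep; the two end cells are held fixed."""
    if len(u) < 3:
        return list(u)
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def run_reference(u, iters):
    """Untiled baseline: sweep the whole array `iters` times."""
    for _ in range(iters):
        u = step(u)
    return u


def run_tiled(u, iters, tile, ghost):
    """Tiled run with ghost zones (illustrative sketch, not the paper's code).

    Returns (result, work, rounds), where `work` counts cells swept across
    all tiles and `rounds` counts exchange/synchronization rounds.
    """
    if ghost < 1:
        raise ValueError("ghost zone width must be at least 1")
    n = len(u)
    work = 0
    rounds = 0
    done = 0
    while done < iters:
        k = min(ghost, iters - done)      # iterations before the next exchange
        nxt = list(u)
        for start in range(0, n, tile):
            end = min(start + tile, n)
            lo, hi = max(start - k, 0), min(end + k, n)
            local = u[lo:hi]              # tile plus ghost cells on each side
            for _ in range(k):
                # Each sweep invalidates one more cell at every non-global
                # edge, so after k sweeps only the ghost cells are stale.
                # For simplicity the whole local buffer is swept every time.
                local = step(local)
                work += len(local)
            nxt[start:end] = local[start - lo:end - lo]  # keep the tile only
        u = nxt                           # stands in for the halo exchange
        done += k
        rounds += 1
    return u, work, rounds
```

Calling `run_tiled(u, 12, 16, g)` for increasing `g` on the same input should reproduce `run_reference(u, 12)` while `rounds` falls and `work` rises. Which `g` performs best depends on the relative cost of synchronization and redundant computation on the target machine, which is the quantity the paper's performance model is meant to predict.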

