| Stack-based parallel recursion on graphics processors |
| Full text |
Pdf
(456 KB)
|
Source
|
Principles and Practice of Parallel Programming
archive
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
table of contents
Raleigh, NC, USA
POSTER SESSION: Posters
table of contents
Pages: 299-300
Year of Publication: 2009
ISBN:978-1-60558-397-6
Also published in ...
|
|
Authors
|
|
Ke Yang
|
Zhejiang University, Hangzhou, China
|
|
Bingsheng He
|
Hong Kong University of Science and Technology, Hong Kong, Hong Kong
|
|
Qiong Luo
|
Hong Kong University of Science and Technology, Hong Kong, Hong Kong
|
|
Pedro V. Sander
|
Hong Kong University of Science and Technology, Hong Kong, Hong Kong
|
|
Jiaoying Shi
|
Zhejiang University, Hangzhou, China
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 36, Downloads (12 Months): 246, Citation Count: 0
|
|
|
ABSTRACT
Recent research has shown promising results on using graphics processing units (GPUs) to accelerate general-purpose computation. However, today's GPUs do not support recursive functions. As a result, for inherently recursive algorithms such as tree traversal, GPU programmers need to explicitly use stacks to emulate the recursion. Parallelizing such stack-based implementation on the GPU increases the programming difficulty; moreover, it is unclear how to improve the efficiency of such parallel implementations. As a first step to address both ease of programming and efficiency issues, we propose three parallel stack implementation alternatives that differ in the granularity of stack sharing. Taking tree traversals as an example, we study the performance tradeoffs between these alternatives and analyze their behaviors in various situations. Our results could be useful to both GPU programmers and GPU compiler writers.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
CUDA (Compute Unified Device Architecture), http://developer.nvidia.com/object/cuda.html.
|
 |
2
|
|
| |
3
|
S. Popov, J. Günther, S. Hans-Peter et al, Stackless KD-Tree Traversal for High Performance GPU Ray Tracing In: Computer Graphics Forum 26(3), pp. 415--424, 2007.
|
| |
4
|
|
 |
5
|
Bingsheng He , Ke Yang , Rui Fang , Mian Lu , Naga Govindaraju , Qiong Luo , Pedro Sander, Relational joins on graphics processors, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
[doi> 10.1145/1376616.1376670]
|
 |
6
|
|
|