ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Stack-based parallel recursion on graphics processors
Full text PdfPdf (456 KB)
Source
Principles and Practice of Parallel Programming archive
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming table of contents
Raleigh, NC, USA
POSTER SESSION: Posters table of contents
Pages: 299-300  
Year of Publication: 2009
ISBN:978-1-60558-397-6
Also published in ...
Authors
Ke Yang  Zhejiang University, Hangzhou, China
Bingsheng He  Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Qiong Luo  Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Pedro V. Sander  Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Jiaoying Shi  Zhejiang University, Hangzhou, China
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 36,   Downloads (12 Months): 246,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1504176.1504224
What is a DOI?

ABSTRACT

Recent research has shown promising results on using graphics processing units (GPUs) to accelerate general-purpose computation. However, today's GPUs do not support recursive functions. As a result, for inherently recursive algorithms such as tree traversal, GPU programmers need to explicitly use stacks to emulate the recursion. Parallelizing such stack-based implementation on the GPU increases the programming difficulty; moreover, it is unclear how to improve the efficiency of such parallel implementations. As a first step to address both ease of programming and efficiency issues, we propose three parallel stack implementation alternatives that differ in the granularity of stack sharing. Taking tree traversals as an example, we study the performance tradeoffs between these alternatives and analyze their behaviors in various situations. Our results could be useful to both GPU programmers and GPU compiler writers.



Collaborative Colleagues:
Ke Yang: colleagues
Bingsheng He: colleagues
Qiong Luo: colleagues
Pedro V. Sander: colleagues
Jiaoying Shi: colleagues