High-performance CUDA kernel execution on FPGAs
Source
International Conference on Supercomputing
Proceedings of the 23rd International Conference on Supercomputing
Yorktown Heights, NY, USA
Poster Session: Posters
Pages 515-516
Year of Publication: 2009
ISBN: 978-1-60558-498-0
Authors
Alexandros Papakonstantinou  University of Illinois, Urbana-Champaign, IL, USA
Karthik Gururaj  University of California, Los Angeles, CA, USA
John A. Stratton  University of Illinois, Urbana-Champaign, IL, USA
Deming Chen  University of Illinois, Urbana-Champaign, IL, USA
Jason Cong  University of California, Los Angeles, CA, USA
Wen-Mei W. Hwu  University of Illinois, Urbana-Champaign, IL, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1542275.1542357

ABSTRACT

In this work, we propose a new FPGA design flow that combines the CUDA programming model from NVIDIA with AutoPilot, the state-of-the-art high-level synthesis tool from AutoESL, to efficiently map the parallelism exposed in CUDA kernels onto reconfigurable devices. Using the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
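
To make the idea of transforming SPMD thread blocks into C code concrete, the following is a minimal sketch of one common way to serialize CUDA threads: the implicit threads of a block become an explicit loop, and the grid becomes an outer loop over blocks. This is an illustration under assumptions, not the code the authors' flow generates. The `vec_add` kernel and the function names `vec_add_block` and `vec_add_grid` are hypothetical, and no AutoPilot-specific directives are shown.

```c
/* Hypothetical CUDA kernel used as the starting point (reference only):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block written as plain C (illustrative assumption, not the
 * authors' generated code). The implicit CUDA threads become an explicit
 * loop over thread_idx. The iterations are independent of each other, so a
 * high-level synthesis tool is free to unroll or pipeline the loop. */
void vec_add_block(const float *a, const float *b, float *c, int n,
                   int block_idx, int block_dim)
{
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        int i = block_idx * block_dim + thread_idx;
        if (i < n)              /* same bounds guard as the kernel */
            c[i] = a[i] + b[i];
    }
}

/* The grid becomes an outer loop over blocks. Thread blocks do not depend
 * on each other, so in hardware each one could be assigned to its own core. */
void vec_add_grid(const float *a, const float *b, float *c, int n,
                  int grid_dim, int block_dim)
{
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx)
        vec_add_block(a, b, c, n, block_idx, block_dim);
}
```

Calling `vec_add_grid(a, b, c, 10, 3, 4)` covers 10 elements with three blocks of four threads; the bounds guard leaves the two surplus thread slots idle, just as in the CUDA original.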


