ACM Home Page
Please provide us with feedback. Feedback
A compiler and runtime system for enabling data mining applications on gpus
Full text PdfPdf (785 KB)
Source
Principles and Practice of Parallel Programming archive
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming table of contents
Raleigh, NC, USA
POSTER SESSION: Posters table of contents
Pages 287-288  
Year of Publication: 2009
ISBN:978-1-60558-397-6
Also published in ...
Authors
Wenjing Ma  The Ohio State University, Columbus, OH, USA
Gagan Agrawal  The Ohio State University, Columbus, OH, USA
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 48,   Downloads (12 Months): 322,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1504176.1504218
What is a DOI?

ABSTRACT

With increasing need for accelerating data mining and scientific data analysis on large data sets, and less chance to improve processor performance by simply increasing clock frequencies, multi-core architectures and accelerators like FPGAs and GPUs have become popular. A recent development in using GPU for general computing has been the release of CUDA (Compute Unified Device Architecture) by NVIDIA. CUDA allows GPU programming with Clanguage-like features, thus easing the development of non-graphics applications on a GPU. However, several challenges still remain in programming the GPUs with CUDA, because CUDA involves explicit parallel programming and management of its complex memory hierarchy, as well as allocating device memory, moving data between CPU anddevice memory, and specification of thread grid configurations.

In this paper, we offer a solution for the programmers to generate CUDA code by specifying the sequential reduction loop(s) with some information about the parameters. With program analysis and code generation, the applications are mapped to a GPU. Several additional optimizations are also performed by the middleware.

We have evaluated our system using three popular data miningapplications, k-means clustering, EM clustering, and Principal Component Analysis (PCA). The speedup that each of these applications achieve over a sequential CPU version ranges between 20 and 50.



Collaborative Colleagues:
Wenjing Ma: colleagues
Gagan Agrawal: colleagues