ACM Home Page
Please provide us with feedback. Feedback
A translation system for enabling data mining applications on GPUs
Full text PdfPdf (401 KB)
Source
International Conference on Supercomputing archive
Proceedings of the 23rd international conference on Supercomputing table of contents
Yorktown Heights, NY, USA
SESSION: Accelerating applications with GPUs II table of contents
Pages 400-409  
Year of Publication: 2009
ISBN:978-1-60558-498-0
Authors
Wenjing Ma  The Ohio State University, Columbus, OH, USA
Gagan Agrawal  The Ohio State University, Columbus, OH, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 64,   Downloads (12 Months): 239,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1542275.1542331
What is a DOI?

ABSTRACT

Modern GPUs offer much computing power at a very modest cost. Even though CUDA and other related recent developments are accelerating the use of GPUs for general purpose applications, several challenges still remain in programming the GPUs. Thus, it is clearly desirable to be able to program GPUs using a higher-level interface.

In this paper, we offer a solution that targets a specific class of applications, which are the data mining and scientific data analysis applications. Our work is driven by the observation that a common processing structure, that of generalized reductions, fits a large number of popular data mining algorithms. In our solution, the programmers simply need to specify the sequential reduction loop(s) with some additional information about the parameters. We use program analysis and code generation to map the applications to a GPU. Several additional optimizations are also performed by the system.

We have evaluated our system using three popular data mining applications, k-means clustering, EM clustering, and Principal Component Analysis (PCA). The main observations from our experiments are as follows. The speedup that each of these applications achieve over a sequential CPU version ranges between 20 and 50. The automatically generated version did not have any noticeable overheads compared to hand written codes. Finally, the optimizations performed in the system resulted in significant performance improvements.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
P. Anderson, D. Binkley, G. Rosay, and T. Teitelbaum. Flow Insensitive Points-To Sets. scam, 00:0081, 2001.
 
4
Sara Baghsorkhi, Melvin Lathara, and Wen mei Hwu. CUDA-lite: Reducing GPU Programming Complexity. In LCPC 2008, 2008.
 
5
6
 
7
 
8
Randal E. Bryant. Data-Intensive Supercomputing: The Case for DISC. Technical Report CMU-CS-07-128, School of Computer Science, Carnegie Mellon University, 2007.
 
9
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Mike, and H. Pat. Brook for GPUs: Stream Computing on Graphics Hardware, 2004.
 
10
Benjamin Bustos, Oliver Deussen, Stefan Hiller, and Daniel Keim. A Graphics Hardware Accelerated Algorithm for Nearest Neighbor Search. In Vassil N. Alexandrov, Geert Dick van Albada, Peter M. A. Sloot, and Jack Dongarra, editors, Computational Science-ICCS 2006, volume 3994 of LNCS, pages 196--199. Springer, 2006.
 
11
Maria Charalambous, Pedro Trancoso, and Alexandros Stamatakis. Initial experiences porting a bioinformatics application to a graphics processor. In Panhellenic Conference on Informatics, pages 415--425, 2005.
 
12
Shuai Che, Jiayuan Meng, and Jeremy W. Sheaffer. A Performance Study of General Purpose Applications on Graphics Processors.
 
13
 
14
Matthias Christen, Olaf Schenk, and Helmar Burkhart. General-Purpose Sparse Matrix Building Blocks using the NVIDIA CUDA Technology Platform. In First Workshop on General Purpose Processing on Graphics Processing Units, Oct 2007.
 
15
 
16
Arthur Dempster, Nan Laird, and Donald Rubin. Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39(1):1--38, 1977.
 
17
 
18
Vincent Garcia, Eric Debreuve, and Michel Barlaud. Fast k Nearest Neighbor Search using GPU, 2008.
19
20
 
21
Jesse D. Hall and John C. Hart. GPU Acceleration of Iterative Clustering. Jun 2004.
 
22
 
23
 
24
25
26
 
27
 
28
R. Jin and G. Agrawal. Shared memory parallelization of data mining algorithms: Techniques. citeseer.ist.psu.edu/article/jin02shared.html, 2002.
 
29
Ruoming Jin and Gagan Agrawal. A Middleware for Developing Parallel Data Mining Implementations. In Proceedings of the first SIAM conference on Data Mining, April 2001.
 
30
Andreas Klockner. PyCuda, 2008.
 
31
 
32
33
 
34
Shih-Wei Liao. Parallelizing user-defined and implicit reductions globally on multiprocessors. In Chris R. Jesshope and Colin Egan, editors, Asia-Pacific Computer Systems Architecture Conference, volume 4186 of Lecture Notes in Computer Science, pages 189--202. Springer, 2006.
 
35
 
36
 
37
 
38
NVidia. NVIDIA CUDA Compute Unified Device Architecture Programming Guide. version 2.0. http://developer.download.nvidia.com/compute/cuda/2.0-Beta2/docs/Programming_Guide_2.0beta2.pdf, June 7 2008.
39
40
41
 
42
 
43
Erik Sintorn and Ulf Assarsson. Fast Parallel GPU-Sorting Using a Hybrid Algorithm. In First Workshop on General Purpose Processing on Graphics Processing Units, Oct 2007.
 
44
John Stratton, Sam Stone, and Wen mei Hwu. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-Core CPUs. In 21st Annual Workshop on Languages and Compilers for Parallel Computing (LCPC'2008), July 2008.
45
 
46
 
47
Neil Trevett. OpenCL: The Open Standdard for Heterogeneous Parallel Programming, 2008.
48
 
49
Hans P. Zima and Barbara Mary Chapman. Compiling for Distributed-Memory Systems. Proceedings of the IEEE, 81(2):264--287, February 1993. In Special Section on Languages and Compilers for Parallel Machines.

Collaborative Colleagues:
Wenjing Ma: colleagues
Gagan Agrawal: colleagues