|
ABSTRACT
We present two efficient Apriori implementations of Frequent Itemset Mining (FIM) that utilize new-generation graphics processing units (GPUs). Our implementations take advantage of the GPU's massively multi-threaded SIMD (Single Instruction, Multiple Data) architecture. Both implementations employ a bitmap data structure to exploit the GPU's SIMD parallelism and to accelerate the frequency counting operation. One implementation runs entirely on the GPU and eliminates intermediate data transfer between the GPU memory and the CPU memory. The other implementation employs both the GPU and the CPU for processing. It represents itemsets in a trie, and uses the CPU for trie traversing and incremental maintenance. Our preliminary results show that both implementations achieve a speedup of up to two orders of magnitude over optimized CPU Apriori implementations on a PC with an NVIDIA GTX 280 GPU and a quad-core CPU.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
3
|
|
| |
4
|
Lamine M. Aouad, Nhien-An Le-Khac, and Tahar M. Kechadi. Distributed frequent itemsets mining in heterogeneous platforms. Journal of Engineering, Computing and Architecture, 2007.
|
 |
5
|
Christian Bienia , Sanjeev Kumar , Jaswinder Pal Singh , Kai Li, The PARSEC benchmark suite: characterization and architectural implications, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
[doi> 10.1145/1454115.1454128]
|
| |
6
|
Ferenc Bodon. A fast apriori implementation. FIMI, 2003.
|
 |
7
|
Gregory Buehrer , Srinivasan Parthasarathy , Shirish Tatikonda , Tahsin Kurc , Joel Saltz, Toward terabyte pattern mining: an architecture-conscious solution, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, March 14-17, 2007, San Jose, California, USA
[doi> 10.1145/1229428.1229432]
|
| |
8
|
Shuai Che , Michael Boyer , Jiayuan Meng , David Tarjan , Jeremy W. Sheaffer , Kevin Skadron, A performance study of general-purpose applications on graphics processors using CUDA, Journal of Parallel and Distributed Computing, v.68 n.10, p.1370-1380, October, 2008
[doi> 10.1016/j.jpdc.2008.05.014]
|
 |
9
|
Shengnan Cong , Jiawei Han , Jay Hoeflinger , David Padua, A sampling-based framework for parallel data mining, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 15-17, 2005, Chicago, IL, USA
[doi> 10.1145/1065944.1065979]
|
| |
10
|
|
| |
11
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on a modern processor, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
12
|
Bart Goethals and Mohammed Javeed Zaki. Advances in frequent itemset mining implementations: Introduction to fimi'03. FIMI, 2003.
|
 |
13
|
Naga Govindaraju , Jim Gray , Ritesh Kumar , Dinesh Manocha, GPUTeraSort: high performance graphics co-processor sorting for large database management, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
[doi> 10.1145/1142473.1142511]
|
 |
14
|
Naga K. Govindaraju , Brandon Lloyd , Wei Wang , Ming Lin , Dinesh Manocha, Fast computation of database operations using graphics processors, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007594]
|
 |
15
|
|
| |
16
|
|
 |
17
|
Bingsheng He , Wenbin Fang , Qiong Luo , Naga K. Govindaraju , Tuyong Wang, Mars: a MapReduce framework on graphics processors, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
[doi> 10.1145/1454115.1454152]
|
 |
18
|
|
 |
19
|
Bingsheng He , Ke Yang , Rui Fang , Mian Lu , Naga Govindaraju , Qiong Luo , Pedro Sander, Relational joins on graphics processors, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
[doi> 10.1145/1376616.1376670]
|
| |
20
|
|
| |
21
|
|
| |
22
|
Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen. Tane: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 1999.
|
 |
23
|
|
 |
24
|
Haoyuan Li , Yi Wang , Dong Zhang , Ming Zhang , Edward Y. Chang, Pfp: parallel fp-growth for query recommendation, Proceedings of the 2008 ACM conference on Recommender systems, October 23-25, 2008, Lausanne, Switzerland
[doi> 10.1145/1454008.1454027]
|
| |
25
|
|
| |
26
|
John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krĺźger, Aaron E. Lefohn, and Timothy J. Purcell. A survey of general-purpose computation on graphics hardware. In Computer Graphics Forum, 2007.
|
| |
27
|
Lance Parsons, Ehtesham Haque, and Huan Liu. Evaluating subspace clustering algorithms. SDM, 2004.
|
| |
28
|
S. Parthasarathy, M. J. Zaki, M. Ogihara, and W. Li. Parallel data mining for association rules on shared memory systems. In Knowledge and Information Systems, 2001.
|
| |
29
|
Jayaprakash Pisharath, Ying Liu, Wei keng Liao, Alok Choudhary, Gokhan Memik, and Janaki Parhi. Nu-minebench 2.0. Technical report, Northwestern University, 2005.
|
| |
30
|
|
| |
31
|
|
| |
32
|
|
| |
33
|
Mohammed J Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, and Wei Li. New algorithms for fast discovery of association rules. KDD, 1997.
|
 |
34
|
|
|