|
ABSTRACT
We present algorithms for fast quantile and frequency estimation in large data streams using graphics processors (GPUs). We exploit the high computation power and memory bandwidth of graphics processors and present a new sorting algorithm that performs rasterization operations on the GPUs. We use sorting as the main computational component for histogram approximation and construction of ε-approximate quantile and frequency summaries. Our algorithms for numerical statistics computation on data streams are deterministic, applicable to fixed or variable-sized sliding windows and use a limited memory footprint. We use GPU as a co-processor and minimize the data transmission between the CPU and GPU by taking into account the low bus bandwidth. We implemented our algorithms on a PC with a NVIDIA GeForce FX 6800 Ultra GPU and a 3.4 GHz Pentium IV CPU and applied them to large data streams consisting of more than 100 million values. We also compared the performance of our GPU-based algorithms with optimized implementations of prior CPU-based algorithms. Overall, our results demonstrate that the graphics processors available on a commodity computer system are efficient stream-processor and useful co-processors for mining data streams.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
P. Agarwal, S. Krishnan, N. Mustafa, and S. Venkatasubramanian. Streaming geometric optimization using graphics hardware. 11 European Symposium on Algorithms, 2003.
|
 |
2
|
|
| |
3
|
|
| |
4
|
Rohit Ananthakrishna , Abhinandan Das , Johannes Gehrke , Flip Korn , S. Muthukrishnan , Divesh Srivastava, Efficient Approximation of Correlated Sums on Data Streams, IEEE Transactions on Knowledge and Data Engineering, v.15 n.3, p.569-572, March 2003
[doi> 10.1109/TKDE.2003.1198391]
|
| |
5
|
A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom. Stanford data stream management system. Data Stream Management, 2004. To appear.
|
 |
6
|
|
| |
7
|
N. Bandi, C. Sun, D. Agrawal, and A. El Abbadi. Hardware acceleration in commercial databases: A case study of spatial operations. VLDB, 2004.
|
| |
8
|
K. E. Batcher. Sorting networks and their applications. In Proc. 1968 Spring Joint Computer Conf., pages 307--314, Reston, VA, 1968. AFIPS Press.
|
| |
9
|
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for gpus: Stream computation on graphics hardware. Proc. of ACM SIGGRAPH, 2004.
|
| |
10
|
|
 |
11
|
|
 |
12
|
|
| |
13
|
Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California
|
| |
14
|
|
| |
15
|
Michael Doggett. Programmability features of graphics hardware. ACM SIGGRAPH Course Notes # 11, 2003.
|
 |
16
|
|
 |
17
|
|
| |
18
|
C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu. Mining frequent patterns in data streams at multiple time granularities. Proc. of NSF Workshop on Next Generation Data Mining, 2002.
|
 |
19
|
|
 |
20
|
Naga K. Govindaraju , Brandon Lloyd , Wei Wang , Ming Lin , Dinesh Manocha, Fast computation of database operations using graphics processors, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007594]
|
| |
21
|
N. Govindaraju, N. Raghuvanshi, M. Henson, and D. Manocha. A cache-efficient sorting algorithm for database and data mining computations using graphics processors. Technical report, University of North Carolina-Chapel Hill, 2005.
|
 |
22
|
|
 |
23
|
|
| |
24
|
|
 |
25
|
|
 |
26
|
Cheqing Jin , Weining Qian , Chaofeng Sha , Jeffrey X. Yu , Aoying Zhou, Dynamically maintaining frequent items over a data stream, Proceedings of the twelfth international conference on Information and knowledge management, November 03-08, 2003, New Orleans, LA, USA
[doi> 10.1145/956863.956918]
|
| |
27
|
R. Jin and G. Agrawal. An algorithm for in-core frequent itemset mining on streaming data. Submitted for Publication, 2004.
|
 |
28
|
|
 |
29
|
|
| |
30
|
|
| |
31
|
|
| |
32
|
A. Lastra, M. Lin, and D. Manocha. GPGP: General purpose computation using graphics processors. http://www.cs.unc.edu/Events/Conferences/GP2, 2004.
|
| |
33
|
|
| |
34
|
Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. In Philip A. Bernstein et al., editórs, VLDB 2002: proceedings of the Twenty-Eighth International Conference on Very Large Data Bases, Hong Kong, pages 346--357, 2002.
|
 |
35
|
|
 |
36
|
|
| |
37
|
D. Manocha. Interactive Geometric and Scientific Computations using Graphics Hardware. SIGGRAPH Course Notes # 11, 2003.
|
| |
38
|
J. Misra and D. Gries. Finding repeated elements. Sc. Comp. Prog., 2:143--152, 1982.
|
| |
39
|
|
| |
40
|
J. Munro and M. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, pages 315--323, 1980.
|
| |
41
|
|
| |
42
|
Timothy J. Purcell , Craig Donner , Mike Cammarano , Henrik Wann Jensen , Pat Hanrahan, Photon mapping on programmable graphics hardware, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, July 26-27, 2003, San Diego, California
|
 |
43
|
|
 |
44
|
|
| |
45
|
S. Venkatasubramanian. The graphics card as a stream computer. Workshop on Management and Processing of Data Streams, 2003.
|
| |
46
|
J. Xu, X. Lin, and X. Zhou. Space efficient quantile summary for constrained sliding windows on a data stream. Proceedings of WAIM (LNCS 3129), pages 34--44, 2004.
|
 |
47
|
|
CITED BY 14
|
|
Naga Govindaraju , Jim Gray , Ritesh Kumar , Dinesh Manocha, GPUTeraSort: high performance graphics co-processor sorting for large database management, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
Rui Fang , Bingsheng He , Mian Lu , Ke Yang , Naga K. Govindaraju , Qiong Luo , Pedro V. Sander, GPUQP: query co-processing using graphics processors, Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 11-14, 2007, Beijing, China
|
|
|
Ke Yang , Bingsheng He , Rui Fang , Mian Lu , Naga Govindaraju , Qiong Luo , Pedro Sander , Jiaoying Shi, In-memory grid files on graphics processors, Proceedings of the 3rd international workshop on Data management on new hardware, June 15-15, 2007, Beijing, China
|
|
|
|
|
|
Bingsheng He , Ke Yang , Rui Fang , Mian Lu , Naga Govindaraju , Qiong Luo , Pedro Sander, Relational joins on graphics processors, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wenbin Fang , Mian Lu , Xiangye Xiao , Bingsheng He , Qiong Luo, Frequent itemset mining on graphics processors, Proceedings of the Fifth International Workshop on Data Management on New Hardware, June 28-28, 2009, Providence, Rhode Island
|
|