|
ABSTRACT
Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams. However, little work has been done to explore what techniques are required to incorporate these algorithms in a data stream query processor, and to make them useful in practice.In this paper, we study the performance implications of using user-defined aggregate functions (UDAFs) to incorporate selection-based and sketch-based algorithms for holistic aggregates into a data stream management system's query processing architecture. We identify key performance bottlenecks and tradeoffs, and propose novel techniques to make these holistic UDAFs fast and space-efficient for use in high-speed data stream applications. We evaluate performance using generated and actual IP packet data, focusing on approximating quantiles and heavy hitters. The best of our current implementations can process streaming queries at OC48 speeds (2x 2.4Gbps).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Agilent Technologies. Router Tester. http://advanced.comms.agilent.com/Router Tester/.
|
 |
2
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
[doi> 10.1145/303976.303978]
|
 |
3
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
| |
4
|
A. Arasu and et al. STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, 26(1):19--26, 2003.
|
 |
5
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
| |
6
|
D. Carney and et al. Monitoring streams - a new class of data management applications. In Proc VLDB, pages 215--226, 2002.
|
| |
7
|
S. Chandrasekaran and et al. Telegraph CQ: Continuous dataflow procesing for an uncertain world. In Proc. CIDR, 2003.
|
| |
8
|
|
| |
9
|
G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using Hamming norms. In Proc. Intl. Conf. VLDB, pages 335--345, 2002.
|
| |
10
|
|
 |
11
|
Chuck Cranor , Yuan Gao , Theodore Johnson , Vlaidslav Shkapenyuk , Oliver Spatscheck, Gigascope: high performance network monitoring with an SQL interface, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 03-06, 2002, Madison, Wisconsin
[doi> 10.1145/564691.564777]
|
 |
12
|
|
| |
13
|
C. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk. The Gigascope stream database. IEEE Data Engineering Bulletin, 26(1): pages 27--32, 2003.
|
 |
14
|
|
 |
15
|
|
| |
16
|
|
 |
17
|
|
 |
18
|
Johannes Gehrke , Flip Korn , Divesh Srivastava, On computing correlated aggregates over continual data streams, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.13-24, May 21-24, 2001, Santa Barbara, California, United States
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. How to summarize the universe: Dynamic maintenance of quantiles. In Proc. Intl. Conf. VLDB, pages 454--465, 2002.
|
| |
23
|
Jim Gray , Adam Bosworth , Andrew Layman , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total, Proceedings of the Twelfth International Conference on Data Engineering, p.152-159, February 26-March 01, 1996
|
 |
24
|
|
 |
25
|
|
| |
26
|
|
| |
27
|
ISO DBL LHR-004 and ANSI X3H2-95-364. (ISO/ANSI Working Draft) Database Language SQL3.
|
 |
28
|
|
| |
29
|
N. Koudas and D. Srivastava. Data stream query processing: A tutorial. In Proc. VLDB, page 1149, 2003.
|
| |
30
|
A. Lerner and D. Shasha. The virtues and challenges of ad hoc + streams querying in finance. Data Engineering Bulletin, 26(1):49--56, 2003.
|
| |
31
|
|
| |
32
|
G. Manku and R. Motwani. Approximate frequency counts over data streams. In Proc. VLDB, pages 346--357, 2002.
|
 |
33
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.426-435, June 01-04, 1998, Seattle, Washington, United States
|
| |
34
|
|
| |
35
|
Stanford stream data manager. http://www-db.stanford.edu/stream/sqr 2003. J. Widom and et al.
|
| |
36
|
M. Sullivan and A. Heybey. Tribeca: A system for managing large databases of network traffic. In Proc. USENIX Technical Conf., 1998.
|
 |
37
|
|
| |
38
|
H. Wang and C. Zaniolo. ATLaS: A native extension of SQL for data mining. In SIAM Intl. Conf. Data Mining 2003.
|
CITED BY 15
|
|
Robert Schweller , Ashish Gupta , Elliot Parsons , Yan Chen, Reversible sketches for efficient and accurate change detection over network data streams, Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, October 25-27, 2004, Taormina, Sicily, Italy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yijian Bai , Hetal Thakkar , Haixun Wang , Chang Luo , Carlo Zaniolo, A data stream language and system designed for power and extensibility, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
Robert Schweller , Zhichun Li , Yan Chen , Yan Gao , Ashish Gupta , Yin Zhang , Peter A. Dinda , Ming-Yang Kao , Gokhan Memik, Reversible sketches: enabling monitoring and analysis over high-speed data streams, IEEE/ACM Transactions on Networking (TON), v.15 n.5, p.1059-1072, October 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|