Holistic UDAFs at streaming speeds
Source: International Conference on Management of Data archive
Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data
Paris, France
Session: Research sessions: stream management
Pages: 35-46
Year of Publication: 2004
ISBN: 1-58113-859-8
Authors
Graham Cormode, Rutgers University
Theodore Johnson, AT&T Labs-Research
Flip Korn, AT&T Labs-Research
S. Muthukrishnan, Rutgers University
Oliver Spatscheck, AT&T Labs-Research
Divesh Srivastava, AT&T Labs-Research
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 1, Downloads (12 Months): 41, Citation Count: 15
DOI: http://doi.acm.org/10.1145/1007568.1007575

ABSTRACT

Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams. However, little work has been done to explore what techniques are required to incorporate these algorithms in a data stream query processor, and to make them useful in practice. In this paper, we study the performance implications of using user-defined aggregate functions (UDAFs) to incorporate selection-based and sketch-based algorithms for holistic aggregates into a data stream management system's query processing architecture. We identify key performance bottlenecks and tradeoffs, and propose novel techniques to make these holistic UDAFs fast and space-efficient for use in high-speed data stream applications. We evaluate performance using generated and actual IP packet data, focusing on approximating quantiles and heavy hitters. The best of our current implementations can process streaming queries at OC48 speeds (2 x 2.4 Gbps).
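As an illustration of the kind of holistic aggregate the abstract describes, the sketch below wraps one well-known heavy-hitter algorithm from the literature, lossy counting (Manku and Motwani), behind a UDAF-style interface. It is not the authors' implementation: the class name `LossyCountingUDAF` and its `update`/`finalize` methods are hypothetical and are not the API of any particular stream system. Lossy counting keeps counters only for a selected subset of items; we assume, since the abstract does not say, that it is representative of the "selection-based" family. With error bound `epsilon`, it reports every item whose true frequency is at least `support * n` and no item whose frequency is below `(support - epsilon) * n`, using space that grows with `1/epsilon` and the logarithm of the stream length rather than with the number of distinct items.

```python
from typing import Dict, Hashable, List, Tuple


class LossyCountingUDAF:
    """Hypothetical UDAF-style wrapper around lossy counting (Manku & Motwani).

    The method names are illustrative only; they are not the interface of
    the paper's system or of any particular stream database.
    """

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = int(-(-1 // epsilon))  # bucket width w = ceil(1 / epsilon)
        self.n = 0  # number of items seen so far
        # item -> (observed count f, maximum possible undercount delta)
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def update(self, item: Hashable) -> None:
        """Process one stream item (the per-tuple step of the aggregate)."""
        self.n += 1
        bucket = (self.n + self.width - 1) // self.width  # current bucket id
        if item in self.entries:
            f, delta = self.entries[item]
            self.entries[item] = (f + 1, delta)
        else:
            # delta bounds how many earlier occurrences may have been pruned.
            self.entries[item] = (1, bucket - 1)
        if self.n % self.width == 0:
            # At a bucket boundary, drop entries that cannot be frequent.
            self.entries = {
                k: (f, d) for k, (f, d) in self.entries.items() if f + d > bucket
            }

    def finalize(self, support: float) -> List[Hashable]:
        """Return items whose observed count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return [k for k, (f, _) in self.entries.items() if f >= threshold]
```

For example, feeding 50 copies of `"a"`, 30 copies of `"b"` and 20 distinct singleton items into `LossyCountingUDAF(epsilon=0.1)` and then calling `finalize(support=0.25)` returns `["a", "b"]`. The singletons are pruned at bucket boundaries, which is what bounds the state such an aggregate has to carry per group.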


