ACM Home Page
Please provide us with feedback. Feedback
Optimizing away joins on data streams
Full text PdfPdf (365 KB)
Source SSPS; Vol. 301 archive
Proceedings of the 2nd international workshop on Scalable stream processing system table of contents
Nantes, France
SESSION: Query optimization table of contents
Pages 48-57  
Year of Publication: 2008
ISBN:978-159593-963-0
Authors
Lukasz Golab  AT&T Labs - Research
Theodore Johnson  AT&T Labs - Research
Nick Koudas  University of Toronto
Divesh Srivastava  AT&T Labs - Research
David Toman  University of Waterloo
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 61,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1379272.1379282
What is a DOI?

ABSTRACT

Monitoring aggregates on network traffic streams is a compelling application of data stream management systems. Often, streaming aggregation queries involve joining multiple inputs (e.g., client requests and server responses) using temporal join conditions (e.g., within 5 seconds), followed by computation of aggregates (e.g., COUNT) over temporal windows (e.g., every 5 minutes). These types of queries help identify malfunctioning servers (missing responses), malicious clients (bursts of requests during a denial-of-service attack), or improperly configured protocols (short timeout intervals causing many retransmissions). However, while such query expression is natural, its evaluation over massive data streams is inefficient.

In this paper, we develop rewriting techniques for streaming aggregation queries that join multiple inputs. Our techniques identify conditions under which expensive joins can be optimized away, while providing error bounds for the results of the rewritten queries. The basis of the optimization is a powerful but decidable theory in which constraints over data streams can be formulated. We show the efficiency and accuracy of our solutions via experimental evaluation on real-life IP network data using the Gigascope stream processing engine.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. Artale, E. Franconi, and F. Mandreoli. Description Logics for Modelling Dynamic Information. In Logics for Emerging Applications of Databases. Lecture Notes in Computer Science, Springer-Verlag, 2003.
2
3
4
 
5
 
6
J. R. Büchi. On a Decision Method in Restricted Second Order Arithmetic. In International Congress on Logic, Methodology, and Philosophy of Science, pages 1--11, 1962.
7
8
 
9
J. Chomicki and D. Toman. Temporal Databases. In Handbook of Temporal Reasoning in Artificial Intelligence. Elsevier, 2005.
10
 
11
D. DeHaan, D. Toman, and G. E. Weddell. Rewriting Aggregate Queries using Description Logics. In Description Logics, pages 103--112. CEUR-WS vol. 81, 2003.
 
12
 
13
 
14
 
15
Information Sciences Institute. RFC 793, 1981.
 
16
 
17
T. Johnson, S. Muthukrishnan, O. Spatscheck, and D. Srivastava. Streams, security and scalability. In IFIP Data and Applications Security, LNCS 3654, pages 1--15, 2005.
 
18
 
19
J. Kang, J. F. Naughton, and S. Viglas. Evaluating Window Joins over Unbounded Streams. In lCDE, pages 341--352, 2003.
 
20
 
21
 
22
M. Mellia, I. Stoica, and H. Zhang. TCP Model for Short Lived Flows. IEEE Communcations Letters, 6(2): 85--87, 2002.
 
23
 
24
V. Shkapenyuk, T. Johnson, S. Muthukrishnan, and O. Spatscheck. Query-aware sampling for data streams. In Int. Workshop on Scalable Stream Processing Systems (SSPS), 2007.
 
25
 
26
 
27
 
28
 
29
30
31

Collaborative Colleagues:
Lukasz Golab: colleagues
Theodore Johnson: colleagues
Nick Koudas: colleagues
Divesh Srivastava: colleagues
David Toman: colleagues