|
ABSTRACT
A defining characteristic of continuous queries over on-line data streams, possibly bounded by sliding windows, is the potentially infinite and time-evolving nature of their inputs and outputs. New items continually arrive on the input streams and new results are continually produced. Additionally, inputs expire by falling out of range of their sliding windows and results expire when they cease to satisfy the query. This impacts continuous query processing in two ways. First, data stream systems allow tables to be queried alongside data streams, but in terms of query semantics, it is not clear how updates of tables are different from insertions and deletions caused by the movement of the sliding windows. Second, many interesting queries need to store state, which must be kept up-to-date as time goes on. Therefore, query processing efficiency depends highly on the amount of overhead involved in state maintenance.In this paper, we show that the above issues can be solved by understanding the update patterns of continuous queries and exploiting them during query processing. We propose a classification that defines four types of update characteristics. Using our classification, we present a definition of continuous query semantics that clearly states the role of relations. We then propose the notion of update-pattern-aware query processing, where physical implementations of query operators, including the data structures used for storing intermediate state, vary depending on the update patterns of their inputs and outputs. When tested on IP traffic logs, our update-pattern-aware query plans routinely outperform the existing techniques by an order of magnitude.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Daniel J. Abadi , Don Carney , Ugur Çetintemel , Mitch Cherniack , Christian Convey , Sangdon Lee , Michael Stonebraker , Nesime Tatbul , Stan Zdonik, Aurora: a new model and architecture for data stream management, The VLDB Journal — The International Journal on Very Large Data Bases, v.12 n.2, p.120-139, August 2003
[doi> 10.1007/s00778-003-0095-z]
|
 |
2
|
|
| |
3
|
A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: Semantic foundations and query execution. The VLDB Journal, 14(1), 2005, to appear.
|
 |
4
|
|
| |
5
|
|
 |
6
|
Shivnath Babu , Rajeev Motwani , Kamesh Munagala , Itaru Nishizawa , Jennifer Widom, Adaptive ordering of pipelined stream filters, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007615]
|
 |
7
|
|
 |
8
|
|
| |
9
|
D. Carney, U. Cetintemel, A. Rasin, S. Zdonik, M. Cherniack, and M. Stonebraker. Operator scheduling in a data stream manager. VLDB 2003, pages 838--849.
|
| |
10
|
L. Golab and M. T. Özsu. Processing sliding window multi-joins in continuous queries over data streams. VLDB 2003, pages 500--511.
|
| |
11
|
M. Hammad, W. Aref, M. Franklin, M. Mokbel, and A. Elmagarmid. Efficient execution of sliding window queries over data streams. Technical Report CSD TR 03-medulla035, Purdue University, 2003.
|
| |
12
|
M. A. Hammad , M. F. Mokbel , M. H. Ali , W. G. Aref , A. C. Catlin , A. K. Elmagarmid , M. Eltabakh , M. G. Elfeky , T. M. Ghanem , R. Gwadera , I. F. Ilyas , M. Marzouk , X. Xiong, Nile: A Query Processing Engine for Data Streams, Proceedings of the 20th International Conference on Data Engineering, p.851, March 30-April 02, 2004
|
| |
13
|
Q. Jiang and S. Chakravarthy. Scheduling strategies for processing continuous queries over streams. BNCOD 2004, pages 16--30.
|
| |
14
|
J. Kang, J. Naughton, and S. Viglas. Evaluating windows joins over unbounded streams. ICDE 2003, pages 341--352.
|
| |
15
|
J. Krämer and B. Seeger. A temporal foundation for continuous queries over data streams. COMAD 2005, pages 70--82.
|
| |
16
|
Y.-N. Law, H. Wang, and C. Zaniolo. Query languages and data models for database sequences and data streams. VLDB 2004, pages 492--503.
|
| |
17
|
S. Madden and M. J. Franklin. Fjording the stream: An architecture for queries over streaming sensor data. ICDE 2002, pages 555--566.
|
| |
18
|
R. Motwani et al. Query processing, approximation, and resource management in a data stream management system. CIDR 2003, pages 245--256.
|
| |
19
|
|
 |
20
|
|
 |
21
|
Douglas Terry , David Goldberg , David Nichols , Brian Oki, Continuous queries over append-only databases, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.321-330, June 02-05, 1992, San Diego, California, United States
|
| |
22
|
|
| |
23
|
S. Viglas, J. Naughton, and J. Burger. Maximizing the output rate of multi-join queries over streaming information sources. VLDB 2003, pages 285--296.
|
CITED BY 7
|
|
|
|
|
|
|
|
Yijian Bai , Hetal Thakkar , Haixun Wang , Chang Luo , Carlo Zaniolo, A data stream language and system designed for power and extensibility, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
Irina Botan , Gustavo Alonso , Peter M. Fischer , Donald Kossmann , Nesime Tatbul, Flexible and scalable storage management for data-intensive stream processing, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|