APPENDICES and SUPPLEMENTS
Online appendix to "Semantics and implementation of continuous sliding window queries over data streams". The appendix supports the information in article 4.
ABSTRACT
In recent years the processing of continuous queries over potentially infinite data streams has attracted a lot of research attention. We observed that the majority of work addresses individual stream operations and system-related issues rather than the development of a general-purpose basis for stream processing systems. Furthermore, example continuous queries are often formulated in some declarative query language without specifying the underlying semantics precisely enough. To overcome these deficiencies, this article presents a consistent and powerful operator algebra for data streams which ensures that continuous queries have well-defined, deterministic results. In analogy to traditional database systems, we distinguish between a logical and a physical operator algebra. While the logical algebra specifies the semantics of the individual operators in a descriptive but concrete way over temporal multisets, the physical algebra provides efficient implementations in the form of stream-to-stream operators. By adapting and enhancing research from temporal databases to meet the challenging requirements in streaming applications, we are able to carry over the conventional transformation rules from relational databases to stream processing. For this reason, our approach not only makes it possible to express continuous queries with a sound semantics, but also provides a solid foundation for query optimization, one of the major research topics in the stream community. Since this article seamlessly explains the steps from query formulation to query execution, it outlines the innovative features and operational functionality implemented in our state-of-the-art stream processing infrastructure.