Semantics and evaluation techniques for window aggregates in data streams
International Conference on Management of Data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Baltimore, Maryland
SESSION: Research papers: stream aggregation
Pages: 311 - 322
Year of Publication: 2005
ISBN: 1-59593-060-4
Authors
Jin Li, Portland State University, Portland, OR
David Maier, Portland State University, Portland, OR
Kristin Tufte, Portland State University, Portland, OR
Vassilis Papadimos, Portland State University, Portland, OR
Peter A. Tucker, Whitworth College, Spokane, WA
ABSTRACT
A windowed query operator breaks a data stream into possibly overlapping subsets of data and computes a result over each. Many stream systems can evaluate window aggregate queries. However, current stream systems suffer from a lack of an explicit definition of window semantics. As a result, their implementations unnecessarily confuse window definition with physical stream properties. This confusion complicates the stream system, and even worse, can hurt performance both in terms of memory usage and execution time. To address this problem, we propose a framework for defining window semantics, which can be used to express almost all types of windows of which we are aware, and which is easily extensible to other types of windows that may occur in the future. Based on this definition, we explore a one-pass query evaluation strategy, the Window-ID (WID) approach, for various types of window aggregate queries. WID significantly reduces both required memory space and execution time for a large class of window definitions. In addition, WID can leverage punctuations to gracefully handle disorder. Our experimental study shows that WID has better execution-time performance than existing window aggregate query evaluation options that retain and reprocess tuples, and has better latency-accuracy tradeoffs for disordered input streams compared to using a fixed delay for handling disorder.
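The one-pass idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm as published, since the abstract gives no implementation details. The sketch assumes a time-based sliding window with a range and a slide, assumes window `w` covers the half-open interval `[w*slide, w*slide + range_)`, and encodes punctuations as `("punct", ts)` events meaning "no later tuple has a timestamp below `ts`". The names `window_ids` and `wid_sum` and the event encoding are hypothetical.

```python
from collections import defaultdict


def window_ids(ts, range_, slide):
    """Ids of all windows whose extent contains timestamp ts.

    Assumed convention: window w covers [w*slide, w*slide + range_),
    and window ids start at 0.
    """
    last = ts // slide
    first = max(0, (ts - range_) // slide + 1)
    return range(first, last + 1)


def wid_sum(events, range_, slide):
    """One-pass windowed SUM that keeps one partial aggregate per window id.

    events: iterable of ("tuple", ts, value) or ("punct", ts).
    Tuples may arrive out of order; they are never buffered.
    A punctuation ("punct", p) is assumed to promise that no later tuple
    has a timestamp below p, so every window ending at or before p is final.
    Yields (window_id, sum) pairs as windows close.
    """
    partial = defaultdict(int)
    for ev in events:
        if ev[0] == "tuple":
            _, ts, value = ev
            # Fold the tuple into every window it belongs to, then drop it.
            for w in window_ids(ts, range_, slide):
                partial[w] += value
        else:
            _, p = ev
            closed = sorted(w for w in partial if w * slide + range_ <= p)
            for w in closed:
                yield w, partial.pop(w)


events = [
    ("tuple", 1, 10),
    ("tuple", 5, 20),
    ("tuple", 3, 5),   # arrives out of order
    ("punct", 6),
    ("tuple", 7, 1),
    ("punct", 10),
]
print(list(wid_sum(events, range_=4, slide=2)))
# [(0, 15), (1, 25), (2, 21), (3, 1)]
```

The sketch stores only per-window partial sums, so memory grows with the number of open windows rather than the number of tuples, and the out-of-order tuple at timestamp 3 is handled without any fixed delay because results are released only when a punctuation arrives.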