Replay-based approaches to revision processing in stream query engines
Source: SSPS; Vol. 301
Proceedings of the 2nd international workshop on Scalable stream processing system
Nantes, France
SESSION: Adaptation, load balancing, and load shedding
Pages 3-12  
Year of Publication: 2008
ISBN:978-159593-963-0
Authors
Anurag S. Maskey  Brandeis University, Waltham, MA
Mitch Cherniack  Brandeis University, Waltham, MA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1379272.1379276

ABSTRACT

Data stream processing systems have become ubiquitous in academic and commercial sectors, with application areas that include financial services, network traffic analysis, battlefield monitoring, and traffic control. The append-only model of streams implies that input data is immutable and therefore always correct. In practice, however, streaming data sources often contend with noise (e.g., embedded sensors) or data entry errors (e.g., financial data feeds), resulting in erroneous inputs and, by implication, erroneous query results. Many data stream sources (e.g., Reuters ticker feeds) issue "revision tuples" (revisions) that amend previously issued tuples (e.g., erroneous share prices). A stream processing engine might reasonably respond to revision inputs by generating revision outputs that correct previously emitted query results. We know of no stream processing system that presently has this capability.

In this paper, we describe how a stream processing engine can be extended to support revision processing via replay. Replay-based revision processing techniques assume that a stream engine maintains an archive of recent data seen on each of its input streams. These archives are queried in response to a revision, and the resulting tuples are replayed through the system to generate corrected query outputs. We first present the design and implementation of the revision processing engine for the Borealis stream processing engine [1]. We then compare techniques for archiving streams to support replay, and compare the performance and overhead of two revision processing techniques that replay input tuples to recompute and thereby revise previously output query results. These experiments reveal scalability issues due to the overhead required to maintain stream archives, and have motivated our current research on using sampling and data summarization (e.g., histograms) to reduce the data that must be stored in a stream archive.
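
The replay idea described above can be illustrated with a minimal Python sketch. It is not the Borealis implementation and does not reproduce either of the two techniques evaluated in the paper. It assumes a single input stream, a tumbling-window average as the query, and an in-memory archive bounded by a time horizon. All names (`StreamTuple`, `ReplayEngine`, `archive_horizon`, `revise`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StreamTuple:
    tuple_id: int    # hypothetical key that lets a revision name its target
    timestamp: int
    value: float


class ReplayEngine:
    """Tumbling-window average over one stream, with a bounded input archive."""

    def __init__(self, window_size, archive_horizon):
        self.window_size = window_size
        self.archive_horizon = archive_horizon  # how far back tuples are kept
        self.archive = {}                       # tuple_id -> StreamTuple
        self.latest = 0                         # largest timestamp seen so far

    def _window(self, timestamp):
        return timestamp // self.window_size

    def _replay(self, window):
        # Query the archive for the window's tuples and recompute the average.
        values = [t.value for t in self.archive.values()
                  if self._window(t.timestamp) == window]
        return sum(values) / len(values) if values else None

    def _evict(self):
        # Drop archived tuples that fall outside the horizon.
        cutoff = self.latest - self.archive_horizon
        self.archive = {k: t for k, t in self.archive.items()
                        if t.timestamp >= cutoff}

    def insert(self, t):
        """Archive a regular tuple and emit the current result for its window."""
        self.archive[t.tuple_id] = t
        self.latest = max(self.latest, t.timestamp)
        self._evict()
        window = self._window(t.timestamp)
        return ("output", window, self._replay(window))

    def revise(self, tuple_id, new_value):
        """Apply a revision tuple; emit a revision output, or None if the
        target has already been evicted from the archive."""
        old = self.archive.get(tuple_id)
        if old is None:
            return None
        window = self._window(old.timestamp)
        before = self._replay(window)
        self.archive[tuple_id] = replace(old, value=new_value)
        after = self._replay(window)  # replay the corrected inputs
        return ("revision", window, before, after)


engine = ReplayEngine(window_size=10, archive_horizon=100)
engine.insert(StreamTuple(1, 0, 10.0))
engine.insert(StreamTuple(2, 5, 20.0))
print(engine.revise(2, 40.0))
```

In this sketch a revision can only be processed while its target tuple is still archived, so the archive horizon trades storage overhead against how far back results can be corrected. This is the same tension the abstract identifies as a scalability issue.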


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. J. Abadi, Y. Ahmad, M. Balazinska, U. Çetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In Second Biennial Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, CA, January 2005.
2
 
3
4
 
5
S. Chandrasekaran, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In CIDR Conference, January 2003.
 
6
7
 
8
T. M. Ghanem, M. A. Hammad, M. F. Mokbel, W. G. Aref, and A. K. Elmagarmid. Query Processing using Negative Tuples in Stream Query Engines. Technical Report CSD 04-040, Purdue University, 2005.
9
10
 
11
 
12
C. S. Jensen. Temporal Database Management. PhD thesis, Aalborg University, 2000.
 
13
A. S. Maskey and M. Cherniack. Replay-Based Approaches to Revision Processing in Stream Query Engines. Technical report, Brandeis University, December 2007. URL: http://www.cs.brandeis.edu/%7Eanurag/revision-techreport-07.pdf.
 
14
PostgreSQL Weekly News - June 17 2007, URL: http://people.planetpostgresql.org/dfetter/index.php?/archives/123-PostgreSQL-Weekly-News-June-17-2007.html.
 
15
16
 
17
 
18
StreamBase Systems, Inc. URL: http://www.streambase.com/.
 
19
J. Widom. Trio: A System for Integrated Management of Data, Accuracy, and Lineage. In CIDR Conference, pages 262-276, January 2005.
