Exploiting the power of relational databases for efficient stream processing
Source Extending Database Technology; Vol. 360 archive
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology table of contents
Saint Petersburg, Russia
SESSION: Research sessions: Stream processing table of contents
Pages 323-334  
Year of Publication: 2009
ISBN:978-1-60558-422-5
Authors
Erietta Liarou  CWI Amsterdam, The Netherlands
Romulo Goncalves  CWI Amsterdam, The Netherlands
Stratos Idreos  CWI Amsterdam, The Netherlands
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1516360.1516398

ABSTRACT

Stream applications have gained significant popularity in recent years, which has led to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system that exploits techniques and algorithms accumulated over many years of database research.

In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored directly upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single query plan or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible, load-sharing query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after x tuples arrive or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements by exploiting a novel query component, the basket expression.
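The basket lifecycle described above can be illustrated with a minimal Python sketch. It is not the DataCell implementation: the `Basket` class, the `run_if_ready` helper, and the reader names are hypothetical, and a plain in-memory list stands in for a system table inside the database kernel. The sketch only shows two of the ideas from the abstract: a query runs over a batch once at least x unseen tuples have accumulated, and a tuple is dropped once every registered query has seen it.

```python
class Basket:
    """Hypothetical in-memory stand-in for a basket (not DataCell's API)."""

    def __init__(self, readers):
        # Names of the queries that must see a tuple before it is dropped.
        self.readers = frozenset(readers)
        # Each entry pairs a tuple with the set of readers that have seen it.
        self.rows = []

    def append(self, row):
        """Store an incoming tuple on arrival."""
        self.rows.append((row, set()))

    def pending(self, reader):
        """Tuples this reader has not yet processed."""
        return [row for row, seen in self.rows if reader not in seen]

    def consume(self, reader):
        """Hand all unseen tuples to the reader as one batch and mark them seen."""
        batch = []
        for row, seen in self.rows:
            if reader not in seen:
                seen.add(reader)
                batch.append(row)
        # Drop tuples that every registered reader has now seen.
        self.rows = [(row, seen) for row, seen in self.rows
                     if seen != self.readers]
        return batch


def run_if_ready(basket, reader, query, batch_size):
    """Evaluate `query` as a one-time query over a batch, but only once
    at least `batch_size` unseen tuples are waiting for this reader."""
    if len(basket.pending(reader)) < batch_size:
        return None
    return query(basket.consume(reader))


basket = Basket(readers={"q_sum", "q_max"})
for value in [3, 9, 4]:
    basket.append({"value": value})

total = run_if_ready(basket, "q_sum",
                     lambda rows: sum(r["value"] for r in rows), batch_size=3)
highest = run_if_ready(basket, "q_max",
                       lambda rows: max(r["value"] for r in rows), batch_size=2)
```

After the first call the three tuples remain in the basket because `q_max` has not seen them yet; after the second call the basket is empty. A time threshold or a selective, out-of-order pick (the role the abstract assigns to basket expressions) could be added as further conditions in `run_if_ready`, but they are left out here to keep the sketch short.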

We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms using both micro benchmarks and the standard Linear Road benchmark demonstrate the potential of this new approach.

