ABSTRACT
Stream applications have gained significant popularity in recent years, which has led to the development of specialized stream engines. These systems are designed from scratch, with a different philosophy than today's database engines, in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system, which exploits techniques and algorithms accumulated over many years of database research. In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored directly upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single query plan or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible load-sharing query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after x tuples arrive or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expression. We investigate the opportunities and challenges that arise with such a direction and show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms, using both micro-benchmarks and the standard Linear Road benchmark, demonstrate the potential of this new approach.
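The basket idea described in the abstract can be illustrated with a small in-memory sketch. This is not the DataCell implementation, which lives inside a column-oriented database kernel; the names `Basket`, `register`, `consume`, and `run_if_ready` are hypothetical, and the sketch covers only the count-based batching threshold and the rule that a tuple is dropped once every registered query has seen it. Time thresholds and basket expressions are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[int, float]  # hypothetical (timestamp, value) stream tuple


@dataclass
class Basket:
    """Hypothetical in-memory stand-in for a basket table."""
    rows: List[Row] = field(default_factory=list)
    base: int = 0  # absolute stream position of rows[0]
    cursors: Dict[str, int] = field(default_factory=dict)  # query -> next unseen position

    def register(self, query: str) -> None:
        # A newly registered query only sees tuples that arrive from now on.
        self.cursors[query] = self.base + len(self.rows)

    def append(self, row: Row) -> None:
        # Incoming tuples are stored on arrival, not processed one by one.
        self.rows.append(row)

    def pending(self, query: str) -> int:
        # Number of tuples this query has not seen yet.
        return self.base + len(self.rows) - self.cursors[query]

    def consume(self, query: str) -> List[Row]:
        # Hand the query all of its unseen tuples as one batch.
        start = self.cursors[query] - self.base
        batch = self.rows[start:]
        self.cursors[query] = self.base + len(self.rows)
        self._drop_seen()
        return batch

    def _drop_seen(self) -> None:
        # Drop every tuple that all registered queries have already seen.
        low = min(self.cursors.values())
        del self.rows[: low - self.base]
        self.base = low


def run_if_ready(basket: Basket, query: str, batch_size: int,
                 plan: Callable[[List[Row]], float]) -> Optional[float]:
    """Evaluate `plan` as a one-time query once `batch_size` tuples are pending."""
    if basket.pending(query) < batch_size:
        return None
    return plan(basket.consume(query))


if __name__ == "__main__":
    b = Basket()
    b.register("q_sum")
    b.register("q_max")
    for row in [(1, 10.0), (2, 20.0), (3, 5.0)]:
        b.append(row)
        print(run_if_ready(b, "q_sum", 3, lambda rs: sum(v for _, v in rs)))
        print(run_if_ready(b, "q_max", 2, lambda rs: max(v for _, v in rs)))
```

Keeping one cursor per query is a design choice of this sketch: it lets two queries share a basket while batching at different sizes, and the minimum cursor marks the point before which tuples can be discarded.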