ABSTRACT
Join algorithms must be redesigned when they process stream data instead of persistently stored data. Data streams are potentially infinite, and the query result is expected to be generated incrementally rather than once only. Data arrival patterns are often unpredictable, and the statistics of the data and other relevant metadata are often known only at runtime. In some cases they are supplied interleaved with the actual data in the form of stream markers. Recently, stream join algorithms such as Symmetric Hash Join and XJoin have been designed to operate in a pipelined fashion to cope with the delayed delivery of data. However, none of them to date takes metadata, especially runtime metadata, into consideration. Hence, join execution logic that is defined statically before runtime may not be well suited to varying dynamic runtime scenarios. In addition, the join operator must maintain potentially unbounded state to guarantee the precision of the result. In this paper, we propose a metadata-aware stream join operator called MJoin, which exploits metadata to (1) detect and purge useless materialized data to save computation resources and (2) optimize the execution logic to target different optimization goals. We have implemented the MJoin operator. The experimental results validate our metadata-driven join optimization strategies.
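To make the idea of marker-driven state purging concrete, here is a minimal sketch of a pipelined symmetric hash join that drops stored tuples once a stream marker makes them useless. It is an illustration only, not the MJoin implementation. The class name, the method names, and the marker semantics ("this stream will deliver no more tuples with the given key") are assumptions introduced for the example.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy equi-join over two streams, "A" and "B" (hypothetical names)."""

    def __init__(self):
        # One hash table per input stream: join key -> list of stored values.
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}
        # Keys that a marker has declared finished on each stream.
        self.closed = {"A": set(), "B": set()}

    @staticmethod
    def _other(side):
        return "B" if side == "A" else "A"

    def on_tuple(self, side, key, value):
        """Probe the opposite table, return new results, store if still useful."""
        other = self._other(side)
        matches = self.state[other].get(key, [])
        if side == "A":
            results = [(key, value, m) for m in matches]
        else:
            results = [(key, m, value) for m in matches]
        # If the opposite stream has promised no more tuples with this key,
        # the new tuple can never match again, so it is not stored.
        if key not in self.closed[other]:
            self.state[side][key].append(value)
        return results

    def on_marker(self, side, key):
        """Assumed marker: `side` will deliver no more tuples with `key`."""
        self.closed[side].add(key)
        # Tuples stored for the opposite stream under this key are now useless.
        self.state[self._other(side)].pop(key, None)

    def state_size(self):
        """Total number of tuples currently materialized in both tables."""
        return sum(len(v) for table in self.state.values() for v in table.values())
```

Results are produced incrementally as each tuple arrives, and each marker lets the operator shrink state that would otherwise grow without bound. A real operator would also have to handle richer marker patterns and spilling to disk, which this sketch leaves out.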