|
ABSTRACT
This paper studies the problem of evaluating continuous multi-way joins on top of Distributed Hash Tables (DHTs). We present a novel algorithm, called recursive join (RJoin), that takes into account various parameters crucial in a distributed setting i.e., network traffic, query processing load distribution, storage load distribution etc. The key idea of RJoin is incremental evaluation: as relevant tuples arrive continuously, a given multi-way join is rewritten continuously into a join with fewer join operators, and is assigned continuously to different nodes of the network. In this way, RJoin distributes the responsibility of evaluating a continuous multi-way join to many network nodes by assigning parts of the evaluation of each binary join to a different node depending on the values of the join attributes. The actual nodes to be involved are decided by RJoin dynamically after taking into account the rate of incoming tuples with values equal to the values of the joined attributes. RJoin also supports sliding window joins which is a crucial feature, especially for long join paths, since it provides a mechanism to reduce the query processing state and thus keep the cost of handling incoming tuples stable. In addition, RJoin is able to handle message delays due to heavy network traffic. We present a detailed mathematical and experimental analysis of RJoin and study the performance tradeoffs that occur.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
D. Abadi et al. The Design of the Borealis Stream Processing Engine. CIDR '05.
|
| |
3
|
Karl Aberer , Luc Onana Alima , Ali Ghodsi , Sarunas Girdzijauskas , Seif Haridi , Manfred Hauswirth, The Essence of P2P: A Reference Architecture for Overlay Networks, Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing, p.11-20, August 31-September 02, 2005
[doi> 10.1109/P2P.2005.38]
|
| |
4
|
|
| |
5
|
Y. Ahmad et al. Locality-Aware Networked Join Evaluation. NETDB '05.
|
 |
6
|
|
| |
7
|
Don Carney , Uǧur Çetintemel , Mitch Cherniack , Christian Convey , Sangdon Lee , Greg Seidman , Michael Stonebraker , Nesime Tatbul , Stan Zdonik, Monitoring streams: a new class of data management applications, Proceedings of the 28th international conference on Very Large Data Bases, p.215-226, August 20-23, 2002, Hong Kong, China
|
| |
8
|
|
 |
9
|
|
| |
10
|
M. Cherniack et al. Scalable Distributed Stream Processing. CIDR '03.
|
| |
11
|
|
| |
12
|
|
| |
13
|
J. M. Hellerstein et al. A Wakeup Call for Internet Monitoring Systems: The Case for Distributed Triggers. HotNets-III '04.
|
| |
14
|
|
| |
15
|
Ryan Huebsch , Joseph M. Hellerstein , Nick Lanham , Boon Thau Loo , Scott Shenker , Ion Stoica, Querying the internet with PIER, Proceedings of the 29th international conference on Very large data bases, p.321-332, September 09-12, 2003, Berlin, Germany
|
| |
16
|
R. Huebsch et al. The Architecture of PIER: an Internet-Scale Query Processor. CIDR '05.
|
| |
17
|
S. Idreos. Distributed Evaluation of Continuous Equi-join Queries over Large Structured Overlay Networks. Master thesis. Technical University of Crete. September, 2005.
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
S. Chandrasekharan et al. TelegraphCQ: Continuous dataflow processing for an uncertain world. CIDR '03.
|
| |
24
|
J. Shneidman et al. A Cost-Space Approach to Distributed Query Optimization in Stream Based Overlays. NETDB '05.
|
 |
25
|
|
 |
26
|
|
| |
27
|
B.-Y. Zhao et al. Tapestry: A Resilient Global-scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications, 22(1), 2004.
|
|