|
ABSTRACT
Multi-way stream joins with expensive join predicates lead to great challenge for real-time (or close to real-time) stream processing. Given the memory- and CPU-intensive nature of such stream join queries, scalable processing on a cluster must be employed. This paper proposes a novel scheme for distributed processing of generic multi-way joins with window constraints, called Pipelined State Partitioning (PSP). We target generic joins with arbitrarily join conditions, which are used in non-trivial stream applications such as image matching and biometric recognizing. The PSP scheme partitions the states into disjoint slices in the time domain, and then distributes the fine-grained states in the cluster, forming a virtual computation ring. Compared to replication-based distribution of non-equi-joins, PSP scheme is superior since: (1) zero state duplication and thus no repeated computations, (2) pipelined processing of every input tuple on multiple nodes to achieve low response time, and (3) cost-based adaptive workload distribution. We have implemented the proposed PSP schemes within the CAPE DSMS. Our experimental study demonstrates the significant performance improvements compared to the state-of-the-art generic distributed stream join algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Yanif Ahmad , Bradley Berg , Uǧur Cetintemel , Mark Humphrey , Jeong-Hyon Hwang , Anjali Jhingran , Anurag Maskey , Olga Papaemmanouil , Alexander Rasin , Nesime Tatbul , Wenjuan Xing , Ying Xing , Stan Zdonik, Distributed operation in the Borealis stream processing engine, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
[doi> 10.1145/1066157.1066274]
|
| |
2
|
|
| |
3
|
M. H. Ali , W. G. Aref , R. Bose , A. K. Elmagarmid , A. Helal , I. Kamel , M. F. Mokbel, Nile-PDT: a phenomenon detection and tracking framework for data stream management systems, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
 |
4
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
 |
5
|
Shivnath Babu , Rajeev Motwani , Kamesh Munagala , Itaru Nishizawa , Jennifer Widom, Adaptive ordering of pipelined stream filters, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007615]
|
| |
6
|
|
 |
7
|
|
 |
8
|
Sumit Ganguly , Waqar Hasan , Ravi Krishnamurthy, Query optimization for parallel execution, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.9-18, June 02-05, 1992, San Diego, California, United States
|
| |
9
|
X. Gu, P. S. Yu, and H. Wang. Adaptive load diffusion for multiway windowed stream joins. In ICDE, pages 146--155, 2007.
|
 |
10
|
Navendu Jain , Lisa Amini , Henrique Andrade , Richard King , Yoonho Park , Philippe Selo , Chitra Venkatramani, Design, implementation, and evaluation of the linear road bnchmark on the stream processing core, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
[doi> 10.1145/1142473.1142522]
|
 |
11
|
|
 |
12
|
|
 |
13
|
Sailesh Kumar , Sarang Dharmapurikar , Fang Yu , Patrick Crowley , Jonathan Turner, Algorithms to accelerate multiple regular expressions matching for deep packet inspection, Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, September 11-15, 2006, Pisa, Italy
|
| |
14
|
Bin Liu , Yali Zhu , Mariana Jbantova , Bradley Momberger , Elke A. Rundensteiner, A dynamically adaptive distributed system for processing complex continuous queries, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
15
|
V. Raghavan, E. A. Rundensteiner, J. P. Woycheese, and A. Mukherji. Firestream: Sensor stream processing for monitoring fire. In ICDE, 2007.
|
| |
16
|
M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, and M. J. Franklin. Flux: An adaptive partitioning operator for continuous query systmes. In Proceedings of ICDE, pages 25--36, 2003.
|
| |
17
|
|
| |
18
|
Nesime Tatbul , Uğur Çetintemel , Stan Zdonik , Mitch Cherniack , Michael Stonebraker, Load shedding in a data stream manager, Proceedings of the 29th international conference on Very large data bases, p.309-320, September 09-12, 2003, Berlin, Germany
|
| |
19
|
|
| |
20
|
T. Urhan and M. Franklin. XJoin: A reactively scheduled pipelined join operator. IEEE Data Engineering Bulletin, 23(2):27--33, 2000.
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
|