Scalable stream join processing with expensive predicates: workload distribution and adaptation by time-slicing
Source: Extending Database Technology; Vol. 360
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Saint Petersburg, Russia
SESSION: Research sessions: Stream processing
Pages 299-310  
Year of Publication: 2009
ISBN: 978-1-60558-422-5
Authors
Song Wang, Hewlett-Packard Laboratories
Elke Rundensteiner, Worcester Polytechnic Institute, Worcester, MA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1516360.1516396

ABSTRACT

Multi-way stream joins with expensive join predicates pose a great challenge for real-time (or near-real-time) stream processing. Given the memory- and CPU-intensive nature of such stream join queries, scalable processing on a cluster must be employed. This paper proposes a novel scheme for distributed processing of generic multi-way joins with window constraints, called Pipelined State Partitioning (PSP). We target generic joins with arbitrary join conditions, which are used in non-trivial stream applications such as image matching and biometric recognition. The PSP scheme partitions the states into disjoint slices in the time domain and then distributes the fine-grained states across the cluster, forming a virtual computation ring. Compared to replication-based distribution of non-equi-joins, the PSP scheme is superior because it offers: (1) zero state duplication and thus no repeated computations, (2) pipelined processing of every input tuple on multiple nodes to achieve low response time, and (3) cost-based adaptive workload distribution. We have implemented the proposed PSP schemes within the CAPE DSMS. Our experimental study demonstrates significant performance improvements over state-of-the-art generic distributed stream join algorithms.
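
The time-slicing idea in the abstract can be illustrated with a small single-process Python sketch. This is not the authors' implementation or the CAPE API: the names `SliceNode`, `build_ring`, `advance`, `insert`, and `probe` are hypothetical, the slices are assumed to have equal width (the cost-based adaptation is not modeled), only a two-way join is shown, and the ring is visited sequentially rather than in a truly pipelined fashion across machines.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Stored = Tuple[float, Any]  # (timestamp, value)


@dataclass
class SliceNode:
    """Hypothetical ring node holding stored tuples whose age lies in [lo, hi)."""
    lo: float
    hi: float
    state: List[Stored] = field(default_factory=list)


def build_ring(window: float, n_nodes: int) -> List[SliceNode]:
    """Split the window into n_nodes disjoint, equal-width time slices."""
    width = window / n_nodes
    return [SliceNode(i * width, (i + 1) * width) for i in range(n_nodes)]


def advance(ring: List[SliceNode], now: float) -> None:
    """Hand tuples that outgrew a slice to the next node; expire the oldest."""
    carry: List[Stored] = []
    for node in ring:
        node.state.extend(carry)
        carry = [t for t in node.state if now - t[0] >= node.hi]
        node.state = [t for t in node.state if now - t[0] < node.hi]
    # Whatever is still in `carry` after the last node has left the window.


def insert(ring: List[SliceNode], ts: float, value: Any) -> None:
    """Store a new tuple; it always enters at the youngest slice."""
    advance(ring, ts)
    ring[0].state.append((ts, value))


def probe(ring: List[SliceNode], ts: float, value: Any,
          predicate: Callable[[Any, Any], bool]) -> List[Tuple[Any, Any]]:
    """Pass a probing tuple around the ring, evaluating the predicate per slice."""
    advance(ring, ts)
    results = []
    for node in ring:  # each node sees the probe once
        for _, stored in node.state:
            if predicate(value, stored):  # stand-in for an expensive predicate
                results.append((value, stored))
    return results


if __name__ == "__main__":
    ring = build_ring(window=9, n_nodes=3)
    for ts, value in [(0, 10), (2, 20), (5, 30), (8, 40)]:
        insert(ring, ts, value)
    matches = probe(ring, 10, 21, lambda a, b: abs(a - b) <= 1)
    print([len(node.state) for node in ring], matches)
```

Because every stored tuple lives in exactly one slice at any time, each probing tuple is compared against each stored tuple at most once, which is the property the abstract contrasts with replication-based distribution.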

