ACM Home Page
Please provide us with feedback. Feedback
Highly available, fault-tolerant, parallel dataflows
Full text PdfPdf (210 KB)
Source International Conference on Management of Data archive
Proceedings of the 2004 ACM SIGMOD international conference on Management of data table of contents
Paris, France
SESSION: Research sessions: consistency and availability table of contents
Pages: 827 - 838  
Year of Publication: 2004
ISBN:1-58113-859-8
Authors
Mehul A. Shah  U.C. Berkeley
Joseph M. Hellerstein  U.C. Berkeley, Intel Research, Berkeley
Eric Brewer  U.C. Berkeley
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 136,   Citation Count: 15
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1007568.1007662
What is a DOI?

ABSTRACT

We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to implement a variety of continuous query (CQ) applications that require high-throughput, 24x7 operation. Examples include network monitoring, phone call processing, click-stream processing, and online financial analysis. Our main contribution is a scheme that carefully integrates traditional query processing techniques for partitioned parallelism with the process-pairs approach for high availability. This delicate integration allows us to tolerate failures of portions of a parallel dataflow without sacrificing result quality. Upon failure, our technique provides quick fail-over, and automatically recovers the lost pieces on the fly. This piecemeal recovery provides minimal disruption to the ongoing dataflow computation and improved reliability as compared to the straight-forward application of the process-pairs technique on a per dataflow basis. Thus, our technique provides the high availability necessary for critical CQ applications. Our techniques are encapsulated in a reusable dataflow operator called Flux, an extension of the Exchange that is used to compose parallel dataflows. Encapsulating the fault-tolerance logic into Flux minimizes modifications to existing operator code and relieves the burden on the operator writer of repeatedly implementing and verifying this critical logic. We present experiments illustrating these features with an implementation of Flux in the TelegraphCQ code base [8].


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Y. Amir and J. Stanton. The Spread Wide Area Group Communication System. Technical Report CNDS-98-4, Johns Hopkins, 1998.
 
2
 
3
J. Baulier, S. Blott, H. Korth, and A. Silberschatz. A Database System for Real-Time Event Aggregation in Telecommunication. 1998.
4
 
5
 
6
 
7
D. Carny et al. Monitoring Streams - A New Class of Data Management Applications. In VLDB, 2002.
 
8
S. Chandrasekaran et al. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In CIDR, 2003.
9
10
 
11
E. Elnozahy, D. Johnson, and Y. Wang. A Survey of Rollback-Recovery Protocols in Message-Passing Systems. Technical Report CMU-CS-96-181, CMU, 1996.
12
13
 
14
G. Graefe. Query Evaluation Techniques for Large Databases. In ACM Computing Surveys, 2002.
 
15
 
16
 
17
S. Hvasshovd et al. The ClustRa Telecom Database. In VLDB, 1995.
 
18
J. Hwang et al. A Comparison of Stream-Oriented High-Availability Algorithms. Technical Report CS-03--17, Brown, 2003.
 
19
B. Kemme and G. Alonso. Don't Be Lazy, Be Consistent. In VLDB, 2000.
 
20
 
21
L. Lamport. The Implementation of Reliable Distributed Multiprocess Systems. Computer Networks, 1978.
22
23
 
24
S. Madden and M. Franklin. Fjording the Stream: An Architecture for Queries over Streaming Sensor Data. In ICDE, 2002.
 
25
 
26
R. Motwani et al. Query Processing, Approximation, and Resource Management in a Data Stream Management System. In CIDR, 2003.
 
27
28
29
 
30
M. Shah, J. Hellerstein, S. Chandrasekaran, and M. Franklin. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. In ICDE, 2003.
 
31
T. Urhan and M. Franklin. XJoin: A Reactively-Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, June 2000.
 
32
 
33

CITED BY  15
Collaborative Colleagues:
Mehul A. Shah: colleagues
Joseph M. Hellerstein: colleagues
Eric Brewer: colleagues