ACM Home Page
Please provide us with feedback. Feedback
Query-aware partitioning for monitoring massive network data streams
Full text PdfPdf (429 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Industrial Session 3: Streams, Conversations and Verification: table of contents
Pages 1135-1146  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Theodore Johnson  AT&T Labs - Research, Florham Park, NJ, USA
Muthu S. Muthukrishnan  Rutgers University, Piscataway, NJ, USA
Vladislav Shkapenyuk  AT&T Labs - Research, Florham Park, NJ, USA
Oliver Spatscheck  AT&T Labs - Research, Florham Park, NJ, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 31,   Downloads (12 Months): 257,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376730
What is a DOI?

ABSTRACT

Data Stream Management Systems (DSMS) are gaining acceptance for applications that need to process very large volumes of data in real time. The load generated by such applications frequently exceeds by far the computation capabilities of a single centralized server. In particular, a single-server instance of our DSMS, Gigascope, cannot keep up with the processing demands of the new OC-786 networks, which can generate more than 100 million packets per second. In this paper, we explore a mechanism for the distributed processing of very high speed data streams.

Existing distributed DSMSs employ two mechanisms for distributing the load across the participating machines: partitioning of the query execution plans and partitioning of the input data stream in a query-independent fashion. However, for a large class of queries, both approaches fail to reduce the load as compared to centralized system, and can even lead to an increase in the load. In this paper we present an alternative approach - query-aware data stream partitioning that allows for more efficient scaling. We present methods for analyzing any given query set and choose the optimal partitioning scheme, and show how to reconcile potentially conflicting requirements that different queries might place on partitioning. We conclude with experiments on a small cluster of processing nodes on high-rate network traffic feed that demonstrates with different query sets that our methods effectively distribute the load across all processing nodes and facilitate efficient scaling whenever more processing nodes becomes available.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. J. Abadi et al. The Design of the Borealis Stream Processing Engine, CIDR 2005.
 
2
 
3
 
4
A. Arasu et al. STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, 26(1):19--26, 2003.
5
 
6
 
7
S. Chandrasekaran et al. TelegraphCQ: Continuous dataflow processing for an uncertain world. CIDR 2003.
8
 
9
M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, and S. Zdonik. Scalable Distributed Stream Processing. CIDR 2003.
10
11
12
 
13
14
15
 
16
17
 
18
19
 
20
M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, M. J. Franklin. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. ICDE 2003
 
21


Collaborative Colleagues:
Theodore Johnson: colleagues
Muthu S. Muthukrishnan: colleagues
Vladislav Shkapenyuk: colleagues
Oliver Spatscheck: colleagues