Query-aware partitioning for monitoring massive network data streams
Source
International Conference on Management of Data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Vancouver, Canada
SESSION: Industrial Session 3: Streams, Conversations and Verification
Pages 1135-1146
Year of Publication: 2008
ISBN: 978-1-60558-102-6
Authors
Theodore Johnson, AT&T Labs - Research, Florham Park, NJ, USA
Muthu S. Muthukrishnan, Rutgers University, Piscataway, NJ, USA
Vladislav Shkapenyuk, AT&T Labs - Research, Florham Park, NJ, USA
Oliver Spatscheck, AT&T Labs - Research, Florham Park, NJ, USA
ABSTRACT
Data Stream Management Systems (DSMSs) are gaining acceptance for applications that need to process very large volumes of data in real time. The load generated by such applications frequently far exceeds the computational capacity of a single centralized server. In particular, a single-server instance of our DSMS, Gigascope, cannot keep up with the processing demands of the new OC-768 networks, which can generate more than 100 million packets per second. In this paper, we explore a mechanism for the distributed processing of very high speed data streams. Existing distributed DSMSs employ two mechanisms for distributing the load across the participating machines: partitioning of the query execution plans and partitioning of the input data stream in a query-independent fashion. However, for a large class of queries, both approaches fail to reduce the load compared to a centralized system, and can even increase it. We present an alternative approach, query-aware data stream partitioning, that allows for more efficient scaling. We present methods for analyzing any given query set and choosing the optimal partitioning scheme, and show how to reconcile the potentially conflicting requirements that different queries might place on partitioning. We conclude with experiments on a small cluster of processing nodes fed by a high-rate network traffic stream. Across different query sets, these experiments demonstrate that our methods effectively distribute the load across all processing nodes and facilitate efficient scaling whenever more processing nodes become available.
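The contrast the abstract draws between query-independent and query-aware partitioning can be illustrated with a small sketch. This is not Gigascope's implementation or the paper's algorithm. It is a toy Python example that assumes a single aggregation query (total bytes per source IP) and invented packet records. The names `PACKETS`, `round_robin`, `by_group_key`, `bytes_per_source`, and `split_groups` are all hypothetical.

```python
from collections import Counter
import zlib

# Hypothetical packet records: (source IP, destination IP, bytes).
PACKETS = [
    ("10.0.0.1", "10.0.1.1", 100),
    ("10.0.0.2", "10.0.1.1", 200),
    ("10.0.0.1", "10.0.1.2", 300),
    ("10.0.0.3", "10.0.1.3", 400),
    ("10.0.0.2", "10.0.1.2", 500),
    ("10.0.0.1", "10.0.1.3", 600),
]


def round_robin(packets, n_nodes):
    """Query-independent partitioning: deal packets out in arrival order."""
    nodes = [[] for _ in range(n_nodes)]
    for i, pkt in enumerate(packets):
        nodes[i % n_nodes].append(pkt)
    return nodes


def by_group_key(packets, n_nodes, key):
    """Query-aware partitioning: hash the attribute the query groups on."""
    nodes = [[] for _ in range(n_nodes)]
    for pkt in packets:
        # crc32 is deterministic, so equal keys always map to the same node.
        h = zlib.crc32(key(pkt).encode())
        nodes[h % n_nodes].append(pkt)
    return nodes


def bytes_per_source(packets):
    """The assumed query: total bytes grouped by source IP."""
    totals = Counter()
    for src, _dst, size in packets:
        totals[src] += size
    return totals


def split_groups(nodes):
    """Return the source IPs whose packets ended up on more than one node."""
    seen = Counter()
    for node in nodes:
        for src in bytes_per_source(node):
            seen[src] += 1
    return sorted(src for src, count in seen.items() if count > 1)


rr_nodes = round_robin(PACKETS, 2)
qa_nodes = by_group_key(PACKETS, 2, key=lambda pkt: pkt[0])

print("groups split by round-robin:", split_groups(rr_nodes))
print("groups split by key hashing:", split_groups(qa_nodes))
```

With round-robin, packets from the same source are scattered across nodes, so every node holds only partial sums and a further merge step must see all of them. Hashing on the grouping attribute keeps each group on one node, so each node's local result is already final for its groups. A real query set may group on different attributes per query, which is the conflict the abstract says the paper reconciles. This sketch does not attempt that.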