ABSTRACT
Monitoring aggregates on IP traffic data streams is a compelling application for data stream management systems (DSMSs). Exploratory analysis of IP traffic data naturally leads to posing related aggregation queries on data streams that differ only in their choice of grouping attributes. In this paper, we address the problem of efficiently computing multiple such aggregations over high-speed data streams, based on a two-level LFTA/HFTA DSMS architecture inspired by Gigascope. Our first contribution is the insight that, in such a scenario, additionally computing and maintaining fine-granularity aggregation queries (phantoms) at the LFTA has the benefit of supporting shared computation. Our second contribution is an investigation into the problem of identifying beneficial LFTA configurations of phantoms and user queries. We formulate this as a cost optimization problem consisting of two sub-problems: how to choose phantoms, and how to allocate space for them in the LFTA. We formally show the hardness of determining the optimal configuration, and propose cost greedy heuristics for these independent sub-problems based on detailed analyses. Our final contribution is a thorough experimental study, based on real IP traffic data as well as synthetic data, that demonstrates the effectiveness of our techniques for identifying beneficial configurations.
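The idea of shared computation through a phantom can be illustrated with a minimal Python sketch. It is not the paper's LFTA implementation: it ignores the bounded LFTA space and the space-allocation problem, and the field names (`src`, `dst`, `port`), the sample records, and the helper functions `aggregate` and `roll_up` are all hypothetical. It only shows that a single fine-granularity count, maintained once per record, is enough to answer several coarser group-by queries.

```python
from collections import Counter

# Hypothetical packet records; the field names are assumptions for illustration.
RECORDS = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "port": 80},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "port": 443},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "port": 80},
    {"src": "10.0.0.1", "dst": "10.0.0.7", "port": 80},
]


def aggregate(records, key_fields):
    """Count records per group, reading every record once."""
    counts = Counter()
    for rec in records:
        counts[tuple(rec[f] for f in key_fields)] += 1
    return counts


def roll_up(fine_counts, fine_fields, coarse_fields):
    """Derive a coarser count from a finer one without re-reading the stream."""
    positions = [fine_fields.index(f) for f in coarse_fields]
    coarse = Counter()
    for key, n in fine_counts.items():
        coarse[tuple(key[i] for i in positions)] += n
    return coarse


# The "phantom": a fine-granularity aggregate that no user asked for directly.
PHANTOM_FIELDS = ("src", "dst", "port")
phantom = aggregate(RECORDS, PHANTOM_FIELDS)

# Two user queries that differ only in their grouping attributes,
# both answered from the phantom rather than from the raw records.
by_src_dst = roll_up(phantom, PHANTOM_FIELDS, ("src", "dst"))
by_dst_port = roll_up(phantom, PHANTOM_FIELDS, ("dst", "port"))
```

In this sketch each record updates one table (the phantom) instead of one table per user query, and the user queries are then derived from the phantom's groups. Whether that trade pays off in a real LFTA depends on the available space and the data, which is the configuration problem the abstract describes.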