ABSTRACT
Data streaming systems are becoming essential for monitoring applications such as financial analysis and network intrusion detection. These systems often have to process many similar but different queries over common data. Since executing each query separately can lead to significant scalability and performance problems, it is vital to share resources by exploiting similarities in the queries. In this paper we present ways to efficiently share streaming aggregate queries with differing periodic windows and arbitrary selection predicates. A major contribution is our sharing technique that does not require any up-front multiple query optimization. This is a significant departure from existing techniques that rely on complex static analyses of fixed query workloads. Our approach is particularly vital in streaming systems where queries can join and leave the system at any point. We present a detailed performance study that evaluates our strategies with an implementation and real data. In these experiments, our approach gives us as much as an order of magnitude performance improvement over the state of the art.