ABSTRACT
In many applications involving continuous data streams, data arrival is bursty and the data rate fluctuates over time. Systems that seek to give rapid or real-time query responses in such an environment must be prepared to deal gracefully with bursts in data arrival without compromising system performance. We discuss one strategy for processing bursty streams: adaptive, load-aware scheduling of query operators to minimize resource consumption during times of peak load. We show that the choice of an operator scheduling strategy can have a significant impact on run-time system memory usage. We then present Chain scheduling, an operator scheduling strategy for data stream systems that is near-optimal in minimizing run-time memory usage for any collection of single-stream queries involving selections, projections, and foreign-key joins with stored relations. Chain scheduling also performs well for queries with sliding-window joins over multiple streams, and for multiple queries of the above types. In a thorough experimental evaluation, we demonstrate the potential benefits of Chain scheduling, compare it with competing scheduling strategies, and validate our analytical conclusions.
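The claim that the scheduling strategy affects run-time memory can be illustrated with a toy simulation. The sketch below does not implement Chain scheduling; it only compares two simple policies on a two-operator pipeline. All parameters are assumptions chosen for illustration: each operator takes one time step per tuple, the first operator shrinks a tuple to 0.2 of its size, the second consumes it entirely, and a burst of five tuples arrives at one tuple per step. The names `simulate`, `fifo`, and `greedy` are hypothetical.

```python
from collections import deque

# Tuple sizes in tenths of a tuple (integers avoid float rounding).
# Assumed for illustration: operator 1 shrinks a tuple from 1.0 to 0.2,
# operator 2 consumes it entirely.
FULL, PARTIAL = 10, 2


def fifo(q1, q2):
    """Run each tuple to completion in arrival order.

    Tuples waiting for operator 2 always arrived earlier than those
    waiting for operator 1, so the oldest tuple is in q2 if q2 is non-empty.
    """
    if q2:
        return 2
    return 1 if q1 else None


def greedy(q1, q2):
    """Prefer the operator that frees the most memory per time step.

    Operator 1 frees 0.8 of a tuple per step, operator 2 only 0.2.
    """
    if q1:
        return 1
    return 2 if q2 else None


def simulate(policy, arrivals, steps):
    """Return peak memory (in tenths of a tuple) over the simulated run.

    One tuple arrives at the start of each of the first `arrivals` steps;
    the scheduler then runs one operator on one tuple per step.
    """
    q1, q2 = deque(), deque()  # tuples waiting for operator 1 / operator 2
    peak = 0
    for t in range(steps):
        if t < arrivals:
            q1.append(t)
        peak = max(peak, FULL * len(q1) + PARTIAL * len(q2))
        choice = policy(q1, q2)
        if choice == 1:
            q2.append(q1.popleft())
        elif choice == 2:
            q2.popleft()
    return peak


if __name__ == "__main__":
    for name, policy in (("fifo", fifo), ("greedy", greedy)):
        print(name, simulate(policy, arrivals=5, steps=12) / 10)
```

Under these assumed parameters, the run-to-completion policy peaks at 3.0 tuples of memory and the greedy policy at 1.8, even though both process the same burst with the same operators.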