|
ABSTRACT
We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding load in the form of dropping tuples from the data streams. We first discuss alternate architectural models for data stream join processing, and we survey suitable measures for the quality of an approximation of a set-valued query result. We then consider the number of generated result tuples as the quality measure, and we give optimal offline and fast online algorithms for it. In a thorough experimental study with synthetic and real data we show the efficacy of our solutions. For applications with demand for exact results we introduce a new Archive-metric which captures the amount of work needed to complete the join in case the streams are archived for later processing.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Arvind Arasu , Brian Babcock , Shivnath Babu , Jon McAlister , Jennifer Widom, Characterizing memory requirements for queries over continuous data streams, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543642]
|
 |
2
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
 |
3
|
|
| |
4
|
D. Barbará, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. E. Ioannidis, H. V. Jagadish, T. Johnson, R. T. Ng, V. Poosala, K. A. Ross, and K. C. Sevcik. The New Jersey data reduction report. IEEE Data Engineering Bulletin, 20(4):3--45, 1997.
|
| |
5
|
|
| |
6
|
D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams---a new class of data management applications. In Proc. Int. Conf. on Very Large Databases (VLDB), 2002.
|
| |
7
|
S. Chandrasekaran and M. J. Franklin. Streaming queries over streaming data. In Proc. Int. Conf. on Very Large Databases (VLDB), 2002.
|
 |
8
|
Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, On random sampling over joins, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.263-274, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
9
|
Jianjun Chen , David J. DeWitt , Feng Tian , Yuan Wang, NiagaraCQ: a scalable continuous query system for Internet databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.379-390, May 15-18, 2000, Dallas, Texas, United States
|
| |
10
|
Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California
|
| |
11
|
M. R. Garey and D. S. Johnson. Computers and Intractability. W. H. Freeman and Company, 1979.
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
 |
15
|
|
| |
16
|
C. J. Hahn, S. G. Warren, and J. London. Edited synoptic cloud reports from ships and land stations over the globe, 1982--1991. http://cdiac.esd.ornl.gov/ftp/ndp026b, 1996.
|
| |
17
|
|
| |
18
|
J. M. Hellerstein, M. J. Franklin, S. Chandrasekaran, A. Deshpande, K. Hildrum, S. Madden, V. Raman, and M. A. Shah. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 23(2):7--18, 2000.
|
| |
19
|
|
 |
20
|
Zachary G. Ives , Daniela Florescu , Marc Friedman , Alon Levy , Daniel S. Weld, An adaptive query execution system for data integration, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.299-310, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
21
|
J. Kang, J. F. Naughton, and S. D. Viglas. Evaluating window joins over unbounded streams. In Proc. Int. Conf. on Data Engineering (ICDE), 2003.
|
| |
22
|
F. Korn, S. Muthukrishnan, and D. Srivastava. Reverse nearest neighbor aggregates over data streams. In Proc. Int. Conf. on Very Large Databases (VLDB), 2002.
|
| |
23
|
|
 |
24
|
|
| |
25
|
R. T. Rockafellar. Network flows and monotropic optimization. John Wiley & Sons, 1984.
|
| |
26
|
|
 |
27
|
|
| |
28
|
|
CITED BY 42
|
|
|
|
|
Yabo Xu , Ke Wang , Ada Wai-Chee Fu , Rong She , Jian Pei, Classification spanning correlated data streams, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
Yufei Tao , Man Lung Yiu , Dimitris Papadias , Marios Hadjieleftheriou , Nikos Mamoulis, RPJ: producing fast join results on streams through rate-based optimization, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vagelis Hristidis , Oscar Valdivia , Michail Vlachos , Philip S. Yu, Continuous keyword search on multiple text streams, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Nesime Tatbul , Uğur Çetintemel , Stan Zdonik , Mitch Cherniack , Michael Stonebraker, Load shedding in a data stream manager, Proceedings of the 29th international conference on Very large data bases, p.309-320, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jimeng Sun , Evan Hoke , John D. Strunk , Gregory R. Ganger , Christos Faloutsos, Intelligent system monitoring on large clusters, Proceedings of the 3rd workshop on Data management for sensor networks: in conjunction with VLDB 2006, September 11-11, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bugra Gedik , Kun-Lung Wu , Philip S. Yu , Ling Liu, GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding, IEEE Transactions on Knowledge and Data Engineering, v.19 n.10, p.1363-1380, October 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|