ABSTRACT
Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.