|
ABSTRACT
The querying and analysis of data streams has been a topic of much recent interest, motivated by applications from the fields of networking, web usage analysis, sensor instrumentation, telecommunications, and others. Many of these applications involve monitoring answers to continuous queries over data streams produced at physically distributed locations, and most previous approaches require streams to be transmitted to a single location for centralized processing. Unfortunately, the continual transmission of a large number of rapid data streams to a central location can be impractical or expensive. We study a useful class of queries that continuously report the k largest values obtained from distributed data streams ("top-k monitoring queries"), which are of particular interest because they can be used to reduce the overhead incurred while running other types of monitoring queries. We show that transmitting entire data streams is unnecessary to support these queries and present an alternative approach that reduces communication significantly. In our approach, arithmetic constraints are maintained at remote stream sources to ensure that the most recently provided top-k answer remains valid to within a user-specified error tolerance. Distributed communication is only necessary on occasion, when constraints are violated, and we show empirically through extensive simulation on real-world data that our approach reduces overall communication cost by an order of magnitude compared with alternatives that o er the same error guarantees.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
M. Arlitt and T. Jin. 1998 world cup web site access logs, Aug. 1988. Available at http://www.acm.org/sigcomm/ITA/.
|
| |
2
|
M. Arlitt and T. Jin. Workload characterization of the 1998 world cup web site. Technical Report HPL-1999-35R1, Hewlett Packard, Sept. 1999.
|
 |
3
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
| |
4
|
B. Babcock and C. Olston. Distributed top-k monitoring. Technical report, Stanford University Computer Science Department, 2002. http://dbpubs.stanford.edu/pub/2002-61.
|
| |
5
|
|
| |
6
|
P. A. Bernstein, B. T. Blaustein, and E. M. Clarke. Fast maintenance of semantic integrity assertions using redundant aggregate data. In Proc. VLDB, 1980.
|
| |
7
|
N. Bruno, L. Gravano, and A. Marian. Evaluating top-k queries over web-accessible databases. In Proc. ICDE, 2002.
|
| |
8
|
D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams - a new class of data management applications. In Proc. VLDB, 2002.
|
| |
9
|
|
 |
10
|
Jianjun Chen , David J. DeWitt , Feng Tian , Yuan Wang, NiagaraCQ: a scalable continuous query system for Internet databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.379-390, May 15-18, 2000, Dallas, Texas, United States
|
| |
11
|
Denial of service attacks using nameservers. Incident Note IN-2000-04, CMU Software Engineering Institute CERT Coordination Center, Apr. 2000. http://www.cert.org/incident_notes/IN-2000-04.html.
|
| |
12
|
M. Dilman and D. Raz. Efficient reactive monitoring. In Proc. INFOCOM, 2001.
|
 |
13
|
Cynthia Dwork , Ravi Kumar , Moni Naor , D. Sivakumar, Rank aggregation methods for the Web, Proceedings of the 10th international conference on World Wide Web, p.613-622, May 01-05, 2001, Hong Kong, Hong Kong
[doi> 10.1145/371920.372165]
|
| |
14
|
D. Estrin, L. Girod, G. Pottie, and M. Srivastava. Instrumenting the world with wireless sensor networks. In Proc. International Conference on Acoustics, Speech, and Signal Processing, 2001.
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
 |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
A. Householder, A. Manion, L. Pesante, and G. Weaver. Managing the threat of denial-of-service attacks. Technical report, CMU Software Engineering Institute CERT Coordination Center, Oct. 2001. http://www.cert.org/archive/pdf/Managing_DoS.pdf.
|
| |
24
|
|
| |
25
|
D. Li, K. Wong, Y. H. Hu, and A. Sayeed. Detection, classification and tracking of targets. IEEE Signal Processing Magazine, 2002.
|
 |
26
|
|
| |
27
|
G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proc. VLDB, 2002.
|
| |
28
|
Rex Min , Manish Bhardwaj , Seong-Hwan Cho , Eugene Shih , Amit Sinha , Alice Wang , Anantha Chandrakasan, Low-Power Wireless Sensor Networks, Proceedings of the The 14th International Conference on VLSI Design (VLSID '01), p.205, January 03-07, 2001
|
| |
29
|
R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma. Query processing, resource management, and approximation in a data stream management system. In Proc. CIDR, 2003.
|
 |
30
|
|
| |
31
|
T. Palpanas, R. Sidle, R. Cochrane, and H. Pirahesh. Incremental maintenance for non-distributive aggregate functions. In Proc. VLDB, 2002.
|
 |
32
|
|
 |
33
|
|
| |
34
|
|
| |
35
|
K. Yi, H. Yu, J. Yang, G. Xia, and Y. Chen. Efficient maintenance of materialized top-k views. In Proc. ICDE, 2003.
|
CITED BY 58
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
D. Zeinalipour-Yazti , Z. Vagena , D. Gunopulos , V. Kalogeraki , V. Tsotras , M. Vlachos , N. Koudas , D. Srivastava, The threshold join algorithm for top-k queries in distributed sensor networks, Proceedings of the 2nd international workshop on Data management for sensor networks, August 30-30, 2005, Trondheim, Norway
|
|
|
Reynold Cheng , Ben Kao , Sunil Prabhakar , Alan Kwan , Yicheng Tu, Adaptive stream filters for entity-based queries with non-value tolerance, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Qi (George) Zhao , Mitsunori Ogihara , Haixun Wang , Jun (Jim) Xu, Finding global icebergs over distributed data sets, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
|
|
|
|
|
|
Srinivas Kashyap , Supratim Deb , K. V. M. Naidu , Rajeev Rastogi , Anand Srinivasan, Efficient gossip-based aggregate computation, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ling Huang , Minos Garofalakis , Joseph Hellerstein , Anthony Joseph , Nina Taft, Toward sophisticated detection with distributed triggers, Proceedings of the 2006 SIGCOMM workshop on Mining network data, p.311-316, September 11-15, 2006, Pisa, Italy
|
|
|
Abhinandan Das , Sumit Ganguly , Minos Garofalakis , Rajeev Rastogi, Distributed set-expression cardinality estimation, Proceedings of the Thirtieth international conference on Very large data bases, p.312-323, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Dimitris Sacharidis , Kostas Patroumpas , Manolis Terrovitis , Verena Kantere , Michalis Potamias , Kyriakos Mouratidis , Timos Sellis, On-line discovery of hot motion paths, Proceedings of the 11th international conference on Extending database technology: Advances in database technology, March 25-29, 2008, Nantes, France
|
|
|
|
|
|
Navendu Jain , Dmitry Kit , Prince Mahajan , Praveen Yalagandula , Mike Dahlin , Yin Zhang, STAR: self-tuning aggregation for scalable monitoring, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Demetrios Zeinalipour-Yazti , Zografoula Vagena , Vana Kalogeraki , Dimitrios Gunopulos , Vassilis J. Tsotras , Michail Vlachos , Nick Koudas , Divesh Srivastava, Finding the K highest-ranked answers in a distributed network, Computer Networks: The International Journal of Computer and Telecommunications Networking, v.53 n.9, p.1431-1449, June, 2009
|
|
|
Zhenjie Zhang , Reynold Cheng , Dimitris Papadias , Anthony K.H. Tung, Minimizing the communication cost for continuous skyline maintenance, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
Thomas Neumann , Matthias Bender , Sebastian Michel , Ralf Schenkel , Peter Triantafillou , Gerhard Weikum, Distributed top-k aggregation queries at large, Distributed and Parallel Databases, v.26 n.1, p.3-27, August 2009
|
|
|
|
|
|
|
|