|
ABSTRACT
Monitoring data streams in a distributed system is the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved challenges, such as the important task of feature selection (e.g., by monitoring the information gain of various features), still require very high communication overhead using naive, centralized algorithms. We present a novel geometric approach which reduces monitoring the value of a function (vis-à-vis a threshold) to a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables monitoring of arbitrary threshold functions over distributed data streams in an efficient manner. We present experimental results on real-world data which demonstrate that our algorithms are highly scalable, and considerably reduce communication load in comparison to centralized algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
 |
2
|
|
 |
3
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
 |
4
|
|
 |
5
|
|
| |
6
|
Berkovitz, L. 2002. Convexity and Optimization in Rn. Wiley, New York, NY.
|
| |
7
|
|
| |
8
|
Don Carney , Uǧur Çetintemel , Mitch Cherniack , Christian Convey , Sangdon Lee , Greg Seidman , Michael Stonebraker , Nesime Tatbul , Stan Zdonik, Monitoring streams: a new class of data management applications, Proceedings of the 28th international conference on Very Large Data Bases, p.215-226, August 20-23, 2002, Hong Kong, China
|
| |
9
|
|
| |
10
|
Cherniack, M., Balakrishnan, H., Balazinska, M., Carney, D., Cetintemel, U., Xing, Y., and Zdonik, S. 2003. Scalable distributed stream processing. In CIDR 2003: Proceedings of the First Biennial Conference on Innovative Data Systems Research (Asilomar, CA).
|
 |
11
|
|
| |
12
|
Dilman, M. and Raz, D. 2001. Efficient reactive monitoring. In INFOCOM '01: Proceedings of the Twentieth Annual Joint Conference of the IEEE Computer and Communications Societies. 1012--1019.
|
 |
13
|
|
 |
14
|
|
 |
15
|
|
 |
16
|
Ling Huang , Minos Garofalakis , Joseph Hellerstein , Anthony Joseph , Nina Taft, Toward sophisticated detection with distributed triggers, Proceedings of the 2006 SIGCOMM workshop on Mining network data, p.311-316, September 11-15, 2006, Pisa, Italy
[doi> 10.1145/1162678.1162684]
|
| |
17
|
Jain, A., Hellerstein, J. M., Ratnasamy, S., and Wetherall, D. 2004. A wakeup call for Internet monitoring systems: The case for distributed triggers. In Proceedings of the 3rd ACM SIGCOMM Workshop on Hot Topics in Networks (HotNets, San Diego, CA).
|
 |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
|
 |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
D. Abadi , D. Carney , U. Çetintemel , M. Cherniack , C. Convey , C. Erwin , E. Galvez , M. Hatoun , A. Maskey , A. Rasin , A. Singer , M. Stonebraker , N. Tatbul , Y. Xing , R. Yan , S. Zdonik, Aurora: a data stream management system, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, June 09-12, 2003, San Diego, California
[doi> 10.1145/872757.872855]
|
 |
27
|
|
| |
28
|
Parrilo, P. 2003. Semidefinite programming relaxations for semialgebraic problems. Math. Programm. 96, 2, 293--320.
|
| |
29
|
Rose, T., Stevenson, M., and Whitehead, M. 2002. The Reuters Corpus Volume 1---from yesterday's news to tomorrow's language resources. In Proceedings of the Third International Conference on Language Resources and Evaluation (Las Palmas de Gran Canaria).
|
 |
30
|
|
| |
31
|
|
| |
32
|
|
 |
33
|
Douglas Terry , David Goldberg , David Nichols , Brian Oki, Continuous queries over append-only databases, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.321-330, June 02-05, 1992, San Diego, California, United States
|
 |
34
|
|
| |
35
|
|
| |
36
|
|
|