|
ABSTRACT
The challenge of monitoring massive amounts of data generated by communication networks has led to the interest in data stream processing. We study streams of edges in massive communication multigraphs, defined by (source, destination) pairs. The goal is to compute properties of the underlying graph while using small space (much smaller than the number of communicants), and to avoid bias introduced because some edges may appear many times, while others are seen only once. We give results for three fundamental problems on multigraph degree sequences: estimating frequency moments of degrees, finding the heavy hitter degrees, and computing range sums of degree values. In all cases we are able to show space bounds for our summarizing algorithms that are significantly smaller than storing complete information. We use a variety of data stream methods: sketches, sampling, hashing and distinct counting, but a common feature is that we use cascaded summaries: nesting multiple estimation techniques within one another. In our experimental study, we see that such summaries are highly effective, enabling massive multigraph streams to be effectively summarized to answer queries of interest with high accuracy using only a small amount of space.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
[doi> 10.1145/303976.303978]
|
 |
2
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
 |
3
|
Arvind Arasu , Brian Babcock , Shivnath Babu , Mayur Datar , Keith Ito , Itaru Nishizawa , Justin Rosenstein , Jennifer Widom, STREAM: the stanford stream data manager (demonstration description), Proceedings of the 2003 ACM SIGMOD international conference on Management of data, June 09-12, 2003, San Diego, California
[doi> 10.1145/872757.872854]
|
 |
4
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
| |
5
|
|
| |
6
|
Ziv Bar-Yosseff , Ravi Kumar , D. Sivakumar, Reductions in streaming algorithms, with an application to counting triangles in graphs, Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.623-632, January 06-08, 2002, San Francisco, California
|
| |
7
|
A. Blum, P. Gibbons, D. Song, and S. Venkataraman. New streaming algorithms for fast detection of superspreaders. Technical Report IRP-TR-04-23, Intel Research, 2004.
|
| |
8
|
|
| |
9
|
|
| |
10
|
G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. In Latin American Informatics, pages 29--38, 2004.
|
 |
11
|
|
| |
12
|
|
 |
13
|
Cristian Estan , George Varghese, New directions in traffic measurement and accounting, Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, August 19-23, 2002, Pittsburgh, Pennsylvania, USA
|
| |
14
|
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. In Proceedings of the International Colloquium on Automata, Languages, and Programming, 2004.
|
| |
15
|
Joan Feigenbaum , Sampath Kannan , Andrew McGregor , Siddharth Suri , Jian Zhang, Graph distances in the streaming model: the value of space, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, January 23-25, 2005, Vancouver, British Columbia
|
| |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
|
 |
20
|
|
| |
21
|
M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report SRC 1998-011, DEC Systems Research Centre, 1998.
|
| |
22
|
|
 |
23
|
|
| |
24
|
N. Koudas and D. Srivastava. Data stream query processing: A tutorial. In Proceedings of the International Conference on Very Large Data Bases, page 1149, 2003.
|
| |
25
|
Internet traffic archive. http://ita.ee.lbl.gov/.
|
| |
26
|
G. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of the International Conference on Very Large Data Bases, pages 346--357, 2002.
|
| |
27
|
|
| |
28
|
|
 |
29
|
Suman Nath , Phillip B. Gibbons , Srinivasan Seshan , Zachary R. Anderson, Synopsis diffusion for robust aggregation in sensor networks, Proceedings of the 2nd international conference on Embedded networked sensor systems, November 03-05, 2004, Baltimore, MD, USA
[doi> 10.1145/1031495.1031525]
|
| |
30
|
|
CITED BY 11
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Graham Cormode , Lukasz Golab , Korn Flip , Andrew McGregor , Divesh Srivastava , Xi Zhang, Estimating the confidence of conditional functional dependencies, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|