ACM Home Page
Please provide us with feedback. Feedback
Space efficient mining of multigraph streams
Full text PdfPdf (341 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Baltimore, Maryland
SESSION: Research session 7: data stream management table of contents
Pages: 271 - 282  
Year of Publication: 2005
ISBN:1-59593-062-0
Authors
Graham Cormode  Bell Laboratories
S. Muthukrishnan  Rutgers University
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 41,   Citation Count: 11
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1065167.1065201
What is a DOI?

ABSTRACT

The challenge of monitoring massive amounts of data generated by communication networks has led to the interest in data stream processing. We study streams of edges in massive communication multigraphs, defined by (source, destination) pairs. The goal is to compute properties of the underlying graph while using small space (much smaller than the number of communicants), and to avoid bias introduced because some edges may appear many times, while others are seen only once. We give results for three fundamental problems on multigraph degree sequences: estimating frequency moments of degrees, finding the heavy hitter degrees, and computing range sums of degree values. In all cases we are able to show space bounds for our summarizing algorithms that are significantly smaller than storing complete information. We use a variety of data stream methods: sketches, sampling, hashing and distinct counting, but a common feature is that we use cascaded summaries: nesting multiple estimation techniques within one another. In our experimental study, we see that such summaries are highly effective, enabling massive multigraph streams to be effectively summarized to answer queries of interest with high accuracy using only a small amount of space.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
4
 
5
 
6
 
7
A. Blum, P. Gibbons, D. Song, and S. Venkataraman. New streaming algorithms for fast detection of superspreaders. Technical Report IRP-TR-04-23, Intel Research, 2004.
 
8
 
9
 
10
G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. In Latin American Informatics, pages 29--38, 2004.
11
 
12
13
 
14
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. In Proceedings of the International Colloquium on Automata, Languages, and Programming, 2004.
 
15
 
16
17
18
 
19
20
 
21
M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report SRC 1998-011, DEC Systems Research Centre, 1998.
 
22
23
 
24
N. Koudas and D. Srivastava. Data stream query processing: A tutorial. In Proceedings of the International Conference on Very Large Data Bases, page 1149, 2003.
 
25
Internet traffic archive. http://ita.ee.lbl.gov/.
 
26
G. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of the International Conference on Very Large Data Bases, pages 346--357, 2002.
 
27
 
28
29
 
30

CITED BY  11
Collaborative Colleagues:
Graham Cormode: colleagues
S. Muthukrishnan: colleagues