ACM Home Page
Please provide us with feedback. Feedback
Sketching unaggregated data streams for subpopulation-size queries
Full text PdfPdf (312 KB)
Source
Symposium on Principles of Database Systems archive
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Beijing, China
SESSION: Sequences, streams, events table of contents
Pages: 253 - 262  
Year of Publication: 2007
ISBN:978-1-59593-685-1
Authors
Edith Cohen  AT&T Labs-Research
Nick Duffield  AT&T Labs-Research
Haim Kaplan  Tel Aviv University
Carsten Lund  AT&T Labs-Research
Mikkel Thorup  AT&T Labs-Research
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 39,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1265530.1265566
What is a DOI?

ABSTRACT

IP packet streams consist of multiple interleaving IP flows. Statistical summaries of these streams, collected for different measurement periods, are used for characterization of traffic, billing, anomaly detection, inferring traffic demands, configuring packet filters and routing protocols, and more. While queries are posed over the set of flows, the summarization algorithmis applied to the stream of packets. Aggregation of traffic into flows before summarization requires storage of per-flow counters, which is often infeasible. Therefore, the summary has to be produced over the unaggregated stream.

An important aggregate performed over a summary is to approximate the size of a subpopulation of flows that is specified a posteriori. For example, flows belonging to an application such as Web or DNS or flows that originate from a certain Autonomous System. We design efficient streaming algorithms that summarize unaggregated streams and provide corresponding unbiased estimators for subpopulation sizes. Our summaries outperform, in terms of estimates accuracy, those produced by packet sampling deployed by Cisco's sampled NetFlow, the most widely deployed such system. Performance of our best method, step sample-and-hold is close to that of summaries that can be obtainedfrom pre-aggregated traffic.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
E. Cohen, N. Duffield, H. Kaplan, C. Lund, and M. Thorup. Algorithms and estimators for accurate summarization of Internet traffic. Manuscript, 2007.
2
 
3
E. Cohen and H. Kaplan. Sketches and estimators for subpopulation weight queries. Manuscript, 2007.
 
4
 
5
E. Cohen and H. Kaplan. Summarizing data using bottom-k sketches. Manuscript, 2007.
6
7
8
9
10
11
12
13
14
15
16
 
17


Collaborative Colleagues:
Edith Cohen: colleagues
Nick Duffield: colleagues
Haim Kaplan: colleagues
Carsten Lund: colleagues
Mikkel Thorup: colleagues