ABSTRACT
Internet traffic patterns are believed to obey a power law, implying that a small set of heavy users consumes most of the bandwidth. Hence, queries that return a list of frequently occurring items are important in the analysis of real-time Internet packet streams. While several results exist for computing frequent item queries using limited memory in the infinite stream model, in this paper we consider the limited-memory sliding window model. In this model, queries refer to the last $N$ items that have arrived at any given time, but the entire window may not be stored in memory. We present a deterministic algorithm for identifying frequent items in sliding windows defined over real-time packet streams. The algorithm uses limited memory, requires amortized constant processing time per packet, makes only one pass over the data, and is shown to work well when tested on TCP traffic logs.
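The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way to answer frequent item queries over a sliding window without storing the whole window. It assumes the window of $N$ items is split into equal blocks and that only the `k` most frequent items of each completed block are retained. The class name `SlidingWindowFrequentItems` and the parameters `num_blocks`, `k`, and `threshold` are hypothetical and are not taken from the paper.

```python
from collections import Counter, deque


class SlidingWindowFrequentItems:
    """Illustrative frequent-item tracker over the last `window_size` items.

    Assumption (not from the paper): the window is split into `num_blocks`
    equal blocks and only the top-`k` counts of each completed block are kept.
    """

    def __init__(self, window_size, num_blocks, k):
        if window_size % num_blocks != 0:
            raise ValueError("window_size must be a multiple of num_blocks")
        self.block_size = window_size // num_blocks
        self.k = k
        self.current = Counter()   # exact counts for the block being filled
        self.seen_in_block = 0     # items received in the current block
        # One top-k summary per completed block; at most num_blocks are kept.
        self.summaries = deque(maxlen=num_blocks)
        self.totals = Counter()    # counts summed over the stored summaries

    def add(self, item):
        """Process one arriving item (a single pass over the stream)."""
        self.current[item] += 1
        self.seen_in_block += 1
        if self.seen_in_block == self.block_size:
            self._close_block()

    def _close_block(self):
        top = self.current.most_common(self.k)
        # If the window is full, subtract the oldest block before it is evicted.
        if len(self.summaries) == self.summaries.maxlen:
            for item, count in self.summaries[0]:
                self.totals[item] -= count
                if self.totals[item] == 0:
                    del self.totals[item]
        self.summaries.append(top)  # deque drops the oldest summary itself
        for item, count in top:
            self.totals[item] += count
        self.current = Counter()
        self.seen_in_block = 0

    def frequent(self, threshold):
        """Items whose summed summary count is at least `threshold`.

        Reflects completed blocks only; counts are underestimates when an
        item falls outside a block's top-k.
        """
        return sorted(i for i, c in self.totals.items() if c >= threshold)
```

A small usage example, with block size 2 and `k = 2` so that every block summary happens to be exact:

```python
tracker = SlidingWindowFrequentItems(window_size=6, num_blocks=3, k=2)
for packet_source in ["a", "a", "b", "a", "c", "c", "a", "a"]:
    tracker.add(packet_source)
print(tracker.frequent(threshold=2))
```

In this sketch, memory is bounded by one block of exact counts plus `num_blocks` summaries of at most `k` entries each, and the answer only advances when a block completes, so the window effectively jumps rather than slides.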