ACM Home Page
Please provide us with feedback. Feedback
Sampling algorithms in a stream operator
Full text PdfPdf (498 KB)
Source International Conference on Management of Data archive
Proceedings of the 2005 ACM SIGMOD international conference on Management of data table of contents
Baltimore, Maryland
SESSION: Research papers: streams table of contents
Pages: 1 - 12  
Year of Publication: 2005
ISBN:1-59593-060-4
Authors
Theodore Johnson  AT&T Labs Research
S. Muthukrishnan  Rutgers University
Irina Rozenbaum  Rutgers University
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 77,   Citation Count: 15
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1066157.1066159
What is a DOI?

ABSTRACT

Complex queries over high speed data streams often need to rely on approximations to keep up with their input. The research community has developed a rich literature on approximate streaming algorithms for this application. Many of these algorithms produce samples of the input stream, providing better properties than conventional random sampling. In this paper, we abstract the stream sampling process and design a new stream sample operator. We show how it can be used to implement a wide variety of algorithms that perform sampling and sampling-based aggregations. Also, we show how to implement the operator in Gigascope - a high speed stream database specialized for IP network monitoring applications. As an example study, we apply the operator within such an enhanced Gigascope to perform subset-sum sampling which is of great interest for IP network management. We evaluate this implemention on a live, high speed internet traffic data stream and find that (a) the operator is a flexible, versatile addition to Gigascope suitable for tuning and algorithm engineering, and (b) the operator imposes only a small evaluation overhead. This is the first operational implementation we know of, for a wide variety of stream sampling algorithms at line speed within a data stream management system.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
N. Duffield, C. Lund, M. Thorup. Learn more, sample less: control of volume and variance in network measurement. SIGCOMM 2001 Measurement workshop.
 
3
G. Manku and R. Motwani. Approximate frequency counts over data streams. Proc. VLDB, 2002, 346--357.
4
 
5
S. Muthukrishnan. Data stream algorithms and applications. http://www.cs,rutgers.edu/~stream-1-1.ps.
 
6
A. Singh. http://www.cs.ucsb.edu/~ambuj/Courses/multimediaDB/sampling.pdf
7
8
 
9
 
10
 
11
SQL Server 2005. <u>http://www.microsoft.com/technet/prodtechnol/sql/2005/evaluate/dwsqlsy.mspx</u>
 
12
P. Gulutzan and T. Pelzer, SQL-99 Complete, Really, CMP Books, 1999.
 
13
D. Carney, U. Cetinternel, M. Cherniack, C. Coney, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul and S. Zdonik. Monitoring Streams - A New Class of Data Management Applications. VLDB 2002.
14
15
16
 
17
S. Chandrasekaran et al. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. Proc. CIDR 2003.
18
 
19
 
20
R. Motwani, J. Widom, A. Arasu. B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, R. Varma. Query Processing, Resource Management, and Approximation in a Data Stream management System. In CIDR, pages 245--256, Jan 2003.
21
 
22
H. Wang, C. Zaniolo and C. Luo, ATLAS: A Small but Complete SQL Extension for Data Mining and Data Streams, Proc. VLDB 2003 pg 5--20.
 
23
Y.-N. Law, H. Wang and C. Zaniolo, Query Languages and Data Models for Database Sequences and Data Streams, Proc. VLDB 2004 pg 492--503.
 
24

CITED BY  15
Collaborative Colleagues:
Theodore Johnson: colleagues
S. Muthukrishnan: colleagues
Irina Rozenbaum: colleagues