ACM Home Page
Please provide us with feedback. Feedback
Processing complex aggregate queries over data streams
Full text PdfPdf (1.50 MB)
Source International Conference on Management of Data archive
Proceedings of the 2002 ACM SIGMOD international conference on Management of data table of contents
Madison, Wisconsin
SESSION: Research sessions: continuous queries and streams table of contents
Pages: 61 - 72  
Year of Publication: 2002
ISBN:1-58113-497-5
Authors
Alin Dobra  Cornell University
Minos Garofalakis  Bell Labs, Lucent
Johannes Gehrke  Cornell University
Rajeev Rastogi  Bell Labs, Lucent
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 110,   Citation Count: 73
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/564691.564699
What is a DOI?

ABSTRACT

Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed.In this paper, we consider the problem of approximately answering general aggregate SQL queries over continuous data streams with limited memory. Our method relies on randomizing techniques that compute small "sketch" summaries of the streams that can then be used to provide approximate answers to aggregate queries with provable guarantees on the approximation error. We also demonstrate how existing statistical information on the base data (e.g., histograms) can be used in the proposed framework to improve the quality of the approximation provided by our algorithms. The key idea is to intelligently partition the domain of the underlying attribute(s) and, thus, decompose the sketching problem in a way that provably tightens our guarantees. Results of our experimental study with real-life as well as synthetic data streams indicate that sketches provide significantly more accurate answers compared to histograms for aggregate queries. This is especially true when our domain partitioning methods are employed to further boast the accuracy of the final estimates.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
4
 
5
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. "Classification and Regression Trees". Chapman & Hall, 1984.
 
6
7
 
8
 
9
A. Dobra, M. Garofalakis, J. Gehrke, and R. Rastogi. "Processing Complex Aggregate Queries over Data Streams". Bell Labs Tech. Memorandum, March 2002.
10
 
11
 
12
13
 
14
 
15
16
17
 
18
19
 
20
21
 
22
23
24

CITED BY  73

Collaborative Colleagues:
Alin Dobra: colleagues
Minos Garofalakis: colleagues
Johannes Gehrke: colleagues
Rajeev Rastogi: colleagues