ACM Home Page
Please provide us with feedback. Feedback
Wavelet synopsis for data streams: minimizing non-euclidean error
Full text PdfPdf (304 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining table of contents
Chicago, Illinois, USA
SESSION: Research track paper table of contents
Pages: 88 - 97  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
Sudipto Guha  University of Pennsylvania, Philadelphia, PA
Boulos Harb  University of Pennsylvania, Philadelphia, PA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 80,   Citation Count: 14
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1081870.1081884
What is a DOI?

ABSTRACT

We consider the wavelet synopsis construction problem for data streams where given n numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much smaller than n. The B numbers are chosen to minimize a suitable error between the original data and the estimate derived from the synopsis.Several good one-pass wavelet construction streaming algorithms minimizing the l2 error exist. For other error measures, the problem is less understood. We provide the first one-pass small space streaming algorithms with provable error guarantees (additive approximation) for minimizing a variety of non-Euclidean error measures including all weighted lp (including l) and relative error lp metrics.In several previous works solutions (for weighted l2, l and maximum relative error) where the B synopsis coefficients are restricted to be wavelet coefficients of the data were proposed. This restriction yields suboptimal solutions on even fairly simple examples. Other lines of research, such as probabilistic synopsis, imposed restrictions on how the synopsis was arrived at. To the best of our knowledge this paper is the first paper to address the general problem, without any restriction on how the synopsis is arrived at, as well as provide the first streaming algorithms with guaranteed performance for these classes of error measures.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
4
5
6
 
7
8
 
9
 
10
 
11
S. Guha, C. Kim, and K. Shim. XWAVE: Optimal and approximate extended wavelets for streaming data. VLDB Conference, 2004.
 
12
13
 
14
15
 
16
Y. Matias and D. Urieli. Manuscript, 2004.
 
17
Y. Matias and D. Urieli. Optimal workload-based wavelet synopses. Proc. of ICDT, 2005.
 
18
19
 
20
S. Muthukrishnan. Workload optimal wavelet synopsis. DIMACS TR, 2004.
 
21
G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, 1996.

CITED BY  14

Collaborative Colleagues:
Sudipto Guha: colleagues
Boulos Harb: colleagues