|
ABSTRACT
Recent work has demonstrated the effectiveness of the wavelet decomposition in reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate answers to queries. A major criticism of such techniques is that unlike, for example, random sampling, conventional wavelet synopses do not provide informative error guarantees on the accuracy of individual approximate answers. In fact, as this paper demonstrates, errors can vary widely (without bound) and unpredictably, even for identical queries on nearly-identical values in distinct parts of the data. This lack of error guarantees severely limits the practicality of traditional wavelets as an approximate query-processing tool, because users have no idea of the quality of any particular approximate answer. In this paper, we introduce Probabilistic Wavelet Synopses, the first wavelet-based data reduction technique with guarantees on the accuracy of individual approximate answers. Whereas earlier approaches rely on deterministic thresholding for selecting a set of "good" wavelet coefficients, our technique is based on a novel, probabilistic thresholding scheme that assigns each coefficient a probability of being retained based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis. We show how our scheme avoids the above pitfalls of deterministic thresholding, providing highly-accurate answers for individual data values in a data vector. We propose several novel optimization algorithms for tuning our probabilistic thresholding scheme to minimize desired error metrics. Experimental results on real-world and synthetic data sets evaluate these algorithms, and demonstrate the effectiveness of our probabilistic wavelet synopses in providing fast, highly-accurate answers with error guarantees.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, Join synopses for approximate query answering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.275-286, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
2
|
G. A. Anastassiou and X. M. Yu. "Monotone and probabilistic wavelet approximation". Stoch. Analysis and Applications, 10(3), 1992.
|
| |
3
|
G. A. Anastassiou and X. M. Yu. "Probabilistic discrete wavelet approximation". Num. Func. Anal. & Opt., 1992.
|
| |
4
|
|
| |
5
|
R. A. DeVore. "Nonlinear Approximation". Acta Numerica, 7, 1998.
|
| |
6
|
David Donoho. Personal communication, March 2001.
|
| |
7
|
M. Garofalakis and P. B. Gibbons. "Wavelet Synopses with Error Guarantees". Bell Labs Tech. Memorandum, December 2001.
|
| |
8
|
David M. Gay. Personal Communication, February 2001.
|
| |
9
|
|
 |
10
|
|
| |
11
|
Information and Computer Science, University of California at Irvine. ftp://ftp.ics.uci.edu/pub/machine-learning-databases, 2000.
|
| |
12
|
|
| |
13
|
Y. Matias. "Highly Parallel Randomized Algorithmics". PhD thesis, Tel Aviv University, 1992.
|
 |
14
|
Yossi Matias , Jeffrey Scott Vitter , Min Wang, Wavelet-based histograms for selectivity estimation, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.448-459, June 01-04, 1998, Seattle, Washington, United States
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
CITED BY 39
|
|
|
|
|
|
|
|
Hu Cao , Ouri Wolfson , Goce Trajcevski, Spatio-temporal data reduction with deterministic error bounds, Proceedings of the 2003 joint workshop on Foundations of mobile computing, p.33-42, September 19, 2003, San Diego, CA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Goce Trajcevski , Hu Cao , Peter Scheuermanny , Ouri Wolfsonz , Dennis Vaccaro, On-line data reduction and the quality of history in moving objects databases, Proceedings of the 5th ACM international workshop on Data engineering for wireless and mobile access, June 25-25, 2006, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
Spiros Papadimitriou , Anthony Brockwell , Christos Faloutsos, Adaptive, hands-off stream mining, Proceedings of the 29th international conference on Very large data bases, p.560-571, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|