|
ABSTRACT
Several studies have demonstrated the effectiveness of the wavelet, decomposition as a tool for reducing large amounts of data down to compact, wavelet synopses that can be used to obtain fast, accurate approximate answers to user queries. While conventional wavelet synopses are based on greedily minimizing the overall root-mean-squared (i.e., L2-norm) error in the data approximation, recent work has demonstrated that such synopses can suffer from important problems, including severe bias and wide variance in the quality of the data reconstruction, and lack of non-trivial guarantees for individual approximate answers. As a result, probabilistic thresholding schemes have been recently proposed as a means of building wavelet synopses that try to probabilistically control other approximation-error metrics, such as the maximum relative error in data-value reconstruction, which is arguably the most important for approximate query answers and meaningful error guarantees.One of the main open problems posed by this earlier work is whether it is possible to design efficient deterministic wavelet-thresholding algorithms for minimizing non-L2 error metrics that are relevant to approximate query processing systems, such as maximum relative or maximum absolute error. Obviously, such algorithms can guarantee better wavelet synopses and avoid the pitfalls of probabilistic techniques (e.g., "bad" coin-flip sequences) leading to poor solutions. In this paper, we address this problem and propose novel, computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing maximum-error metrics. We introduce an optimal low polynomial-time algorithm for one-dimensional wavelet thresholding--our algorithm is based on a new Dynamic-Programming (DP) formulation, and can be employed to minimize the maximum relative or absolute error in the data reconstruction. Unfortunately, directly extending our one-dimensional DP algorithm to multi-dimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality. Thus, we also introduce novel, polynomial-time approximation schemes (with tunable approximation guarantees for the target maximum-error metric) for deterministic wavelet thresholding in multiple dimensions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, Join synopses for approximate query answering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.275-286, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
2
|
Laurent Amsaleg, Philippe Bonnet, Michael J. Franklin, Anthony Tomasic, and Tolga Urhan. "Improving Responsiveness for Wide-Area Data Access". IEEE Data Engineering Bulletin, 20(3):3--11, September 1997. (Special Issue on Improving Query Responsiveness).
|
| |
3
|
|
 |
4
|
|
 |
5
|
Amol Deshpande , Minos Garofalakis , Rajeev Rastogi, Independence is good: dependency-based histogram synopses for high-dimensional data, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.199-210, May 21-24, 2001, Santa Barbara, California, United States
|
| |
6
|
R. A. DeVore. "Nonlinear Approximation". Acta Numerica, 7:51--150, 1998.
|
 |
7
|
|
 |
8
|
|
| |
9
|
Minos Garofalakis and Amit Kumar. "Deterministic Wavelet Thresholding for Maximum-Error Metrics". Bell Labs Technical Memorandum, December 2003.
|
| |
10
|
|
 |
11
|
Dimitrios Gunopulos , George Kollios , Vassilis J. Tsotras , Carlotta Domeniconi, Approximating multi-dimensional aggregate range queries over real attributes, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.463-474, May 15-18, 2000, Dallas, Texas, United States
|
 |
12
|
|
 |
13
|
Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States
|
| |
14
|
|
 |
15
|
Yossi Matias , Jeffrey Scott Vitter , Min Wang, Wavelet-based histograms for selectivity estimation, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.448-459, June 01-04, 1998, Seattle, Washington, United States
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
 |
19
|
Apostol Natsev , Rajeev Rastogi , Kyuseok Shim, WALRUS: a similarity retrieval algorithm for image databases, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.395-406, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
20
|
|
 |
21
|
|
CITED BY 21
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sorabh Gandhi , Suman Nath , Subhash Suri , Jie Liu, GAMPS: compressing multi sensor data by grouping and amplitude scaling, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|