ABSTRACT
While recent work has demonstrated that wavelets can be used efficiently to compress large quantities of data and provide fast and fairly accurate answers to queries, little emphasis has been placed on using wavelets to approximate datasets containing multiple measures. Existing decomposition approaches either operate on each measure individually, or treat all measures as a vector of values and process them simultaneously. We show in this paper that the individual or combined storage approaches for the wavelet coefficients of different measures that stem from these existing algorithms may lead to suboptimal storage utilization, which results in reduced query accuracy. To alleviate this problem, we introduce the notion of an extended wavelet coefficient as a flexible storage method for the wavelet coefficients, and propose novel algorithms for selecting which extended wavelet coefficients to retain under a given storage constraint. Experimental results with both real and synthetic datasets demonstrate that our approach achieves improved query accuracy when compared to existing techniques.