ACM Home Page
Please provide us with feedback. Feedback
Approximate computation of multidimensional aggregates of sparse data using wavelets
Full text PdfPdf (1.67 MB)
Source International Conference on Management of Data archive
Proceedings of the 1999 ACM SIGMOD international conference on Management of data table of contents
Philadelphia, Pennsylvania, United States
Pages: 193 - 204  
Year of Publication: 1999
ISBN:1-58113-084-8
Also published in ...
Authors
Jeffrey Scott Vitter  Center for Geometric Computing and Department of Computer Science, Duke University, Durham, NC
Min Wang  Center for Geometric Computing and Department of Computer Science, Duke University, Durham, NC
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 103,   Citation Count: 112
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/304182.304199
What is a DOI?

ABSTRACT

Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries. In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a smalll number of I/Os, depending upon the desired accuracy. We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
AAD+96
AV88
 
BDF+97
D . Barbara, W. DuMouchel, C. Faloutsos, P. J. H aas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng,, V. Poosala, K. A. Ross, and K. C. Sevcik. The New Jersey data reduction report. Bulletin o.f the Technical Committee on Data Engineering, 20(4), 1997.
 
Bur
U.S. Census Bureau. Census bureau databases. The online data are available on the web at http ://www. census, gov/.
 
FJS97
 
GBLP96
GM98
HAMS97
HHW97
HRU96
 
JS94
MVW98
 
PC98
N. Pendse and R. Creeth. The OLAP report, 1998. The online report is available on the web at http ://www. olapreport, com/Analyses, htm/.
 
PI96
 
PI97
PIHS96
SAC+79
 
SDS96
 
Ven94
D.E. Vengroff. A transparent parallel I/O environment. In Proceedings of the 199,{ DAGS Symposium on Parallel Computa tion, July 1994.
 
Ven97
D.E. Vengroff. TPIE User Manual and Reference. Duke University, 1997. The manual and software distribution are available on the web at http ://www. cs. duke. edu/TPlE/.
 
Vit99
 
VS94
J.S. Vitter and E. A. M, Shriver. Algorithms for parallel memory I: Two-level memories. Algorithmica, 12(2- 3):110-147, 1994. Special double issue on Large-Scale Memories.
 
VV96
VWI98
ZDN97

CITED BY  112

Collaborative Colleagues:
Jeffrey Scott Vitter: colleagues
Min Wang: colleagues