Improving estimation accuracy of aggregate queries on data cubes
Data Warehousing and OLAP
Proceeding of the ACM 11th international workshop on Data warehousing and OLAP
Napa Valley, California, USA
SESSION: Multidimensional modeling and queries: languages, optimization, processing
Pages 33-40
Year of Publication: 2008
ISBN: 978-1-60558-250-4
ABSTRACT
In this paper, we investigate the problem of estimating a target database from summary databases derived from a base data cube. We show that such estimates can be derived by choosing a primary database and using a proxy database to estimate the results. This technique is common in statistics, but an important issue we address is the accuracy of these estimates. Specifically, given multiple primary and multiple proxy databases that share the same summary measure, the problem is how to select the primary and proxy databases that will generate the most accurate estimate of the target database. We propose an algorithmic approach, based on the principles of information entropy, for determining the steps to select or compute the source databases from multiple summary databases. We show that the source databases with the largest number of cells in common provide more accurate estimates, and we prove that this is consistent with maximizing the entropy. We provide experimental results on the accuracy of the target database estimation to verify our findings.
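The abstract does not give the estimation formula, so the following is only a minimal sketch of the general idea, assuming a simple proportional scheme: each coarse total in the primary database is split across finer cells in proportion to the proxy database's values. The function names (`estimate_target`, `entropy`), the dictionary layout, and the sample figures are all hypothetical and are not taken from the paper. The `entropy` helper only illustrates the quantity the abstract refers to; it does not reproduce the paper's selection procedure.

```python
from math import log

def estimate_target(primary, proxy):
    """Split coarse primary totals across finer cells using proxy proportions.

    primary: {group: total}           coarse summary (hypothetical layout)
    proxy:   {(group, cell): weight}  finer summary of a related measure
    """
    # Sum the proxy weights within each coarse group.
    group_weight = {}
    for (group, _cell), w in proxy.items():
        group_weight[group] = group_weight.get(group, 0.0) + w

    # Assumed proportional rule: primary total * (cell weight / group weight).
    return {
        (group, cell): primary[group] * w / group_weight[group]
        for (group, cell), w in proxy.items()
        if group in primary and group_weight[group] > 0
    }

def entropy(cells):
    """Shannon entropy (in nats) of the normalized cell values."""
    total = sum(cells.values())
    return -sum((v / total) * log(v / total) for v in cells.values() if v > 0)

# Hypothetical data: a measure known per state, a proxy known per county.
primary = {"CA": 300.0, "NV": 100.0}
proxy = {("CA", "Napa"): 1.0, ("CA", "Yolo"): 2.0, ("NV", "Clark"): 4.0}

target = estimate_target(primary, proxy)
print(target)
print(entropy(target))
```

In this sketch the estimated cells within each group always sum back to the primary total for that group, which is the property that makes the proxy useful for disaggregation. How closely the estimate matches the true target depends on how well the proxy's distribution tracks the measure of interest.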