Improving estimation accuracy of aggregate queries on data cubes
Source
Proceeding of the ACM 11th international workshop on Data warehousing and OLAP
Napa Valley, California, USA
SESSION: Multidimensional modeling and queries: languages, optimization, processing
Pages 33-40  
Year of Publication: 2008
ISBN: 978-1-60558-250-4
Authors
Elaheh Pourabbas, National Research Council, Rome, Italy
Arie Shoshani, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458432.1458439

ABSTRACT

In this paper, we investigate the problem of estimating a target database from summary databases derived from a base data cube. We show that such estimates can be derived by choosing a primary database that uses a proxy database to estimate the results. This technique is common in statistics, but an important issue we address is the accuracy of these estimates. Specifically, given multiple primary and multiple proxy databases that share the same summary measure, the problem is how to select the primary and proxy databases that will generate the most accurate estimate of the target database. We propose an algorithmic approach, based on the principles of information entropy, for determining the steps to select or compute the source databases from multiple summary databases. We show that the source databases with the largest number of cells in common provide more accurate estimates, and we prove that this is consistent with maximizing the entropy. We provide experimental results on the accuracy of the target database estimation to verify these findings.
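The abstract does not give the estimation formula itself. As a rough illustration only, the Python sketch below shows the standard proportional (proxy) allocation used in statistics, under a hypothetical layout: the primary database holds a summary measure over dimensions (A, B), the proxy holds the same measure over (B, C), and the two share dimension B. Each primary total is split across the values of C in proportion to the proxy's distribution within the same b, which amounts to assuming that A and C are conditionally independent given B. The function name `proxy_estimate`, the dictionary layout, and the toy numbers are all assumptions for illustration; this is not the authors' selection algorithm.

```python
from collections import defaultdict


def proxy_estimate(primary, proxy):
    """Estimate target cells T[(a, b, c)] by proportional allocation.

    Hypothetical layout (an assumption, not from the paper): `primary` maps
    (a, b) -> measure and `proxy` maps (b, c) -> measure, where b is the
    dimension value the two summary databases share.
    """
    # Group the proxy by the shared value b so each b's shares are easy to find.
    proxy_by_b = defaultdict(dict)
    for (b, c), value in proxy.items():
        proxy_by_b[b][c] = value

    target = {}
    for (a, b), total in primary.items():
        # dict.get does not invoke the defaultdict factory, so a missing b yields {}.
        shares = proxy_by_b.get(b, {})
        denom = sum(shares.values())
        if denom == 0:
            # No proxy information for this b: the total cannot be allocated.
            continue
        for c, value in shares.items():
            # Split the primary total in proportion to the proxy's share of c within b.
            target[(a, b, c)] = total * value / denom
    return target


# Toy, made-up numbers: primary = measure by (age group, state),
# proxy = the same measure by (state, county) from another summary database.
primary = {("0-17", "CA"): 100.0, ("18+", "CA"): 300.0}
proxy = {("CA", "Alameda"): 20.0, ("CA", "Napa"): 5.0}

estimate = proxy_estimate(primary, proxy)
for cell, value in sorted(estimate.items()):
    print(cell, value)
```

Because the primary values are only redistributed, summing the estimates over C reproduces each primary total exactly; how close the individual cells come to the true values depends on how well the proxy's proportions match them, which is the accuracy question the paper addresses.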


