ACM Home Page
Please provide us with feedback. Feedback
Histograms revisited: when are histograms the best approximation method for aggregates over joins?
Full text PdfPdf (199 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Baltimore, Maryland
SESSION: Research session 6: complexity & performance evaluation / data stream management table of contents
Pages: 228 - 237  
Year of Publication: 2005
ISBN:1-59593-062-0
Author
Alin Dobra  University of Florida, Gainesville, FL
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 28,   Citation Count: 2
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1065167.1065196
What is a DOI?

ABSTRACT

The traditional statistical assumption for interpreting histograms and justifying approximate query processing methods based on them is that all elements in a bucket have the same frequency -- the so called uniform distribution assumption. In this paper we show that a significantly less restrictive statistical assumption - the elements within a bucket are randomly arranged even though they might have different frequencies -- leads to identical formulae for approximating aggregate queries using histograms. This observation allows us to identify scenarios in which histograms are well suited as approximation methods -- in fact we show that in these situations sampling and sketching are significantly worse -- and provide tight error guarantees for the quality of approximations. At the same time we show that, on average, histograms are rather poor approximators outside these scenarios.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
4
 
5
6
7
8
 
9
F. Olken and D. Rotem. Random sampling from databases - a survey, 1995.
10
 
11
J. Shao. Mathematical Statistics. Springer-Verlag, 1999.