ACM Home Page
Please provide us with feedback. Feedback
Rangesum histograms
Full text PdfPdf (1.10 MB)
Source Symposium on Discrete Algorithms archive
Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
Baltimore, Maryland
SESSION: Session 4B table of contents
Pages: 233 - 242  
Year of Publication: 2003
ISBN:0-89871-538-5
Authors
S. Muthukrishnan  AT&T Labs---Research, Florham Park, NJ
Martin Strauss  AT&T Labs---Research, Florham Park, NJ
Sponsors
: SIAM Activity Group on Discrete Mathematics
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 24,   Citation Count: 9
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

A rangesum query to an array A is a pair (l, r) of range endpoints, which should be answered by ΣlirA[i]. To compress A, we consider representing an array A lossily by a histogram, a function that is constant on each of a small number of buckets. We then answer range queries from H instead of from A, i.e., as ΣlirH[i]. An optimal rangesum histogram H for this purpose is one whose bucket boundaries and constant heights within buckets are chosen to minimize the expected square error, El, r[(ΣlirA[i]--ΣlirH[i].)2], assuming each rangesum query is equally likely. Rangesum histograms find many applications in database systems.In a degenerate variation, all rangesum queries are over ranges of size one, namely, individual points; histograms optimal for this special case are called pointwise optimal histograms. Pointwise optimal histogram is a classical notion in statistics and approximation theory, but rangesum optimal histogram appears to be novel in these areas. While optimal pointwise histograms can be constructed efficiently by simple dynamic progrmming, no efficient (even approximate) general rangesmn histogram construction algorithms were previously known. In practice, all commercial database systems use heuristically built histograms for pointwise and rangesum queries.We present the first general algorithms for approximate rangesum histograms. Given parameter B, we denote by (α, β)-approximation an algorithm to produce a (αB)-bucket histogram with error at most β times the error of the optimal B-bucket histogram. We give a (2, 1)-approximation with runtime O(N2B), a (2, 1+)-approximation with runtime N + (B log(N)/∊)O(1) (1), and a (1, 1 + ∊)-approximation with runtime O(B3N4/∊2). We also consider the problem of dynamic maintenance of rangesum histograms for data updated by additive changes, and we give a (2, 1 + ∊)-approximation that uses space (Blog(N)/∊)O(1) and time (Blog(N)/∊)O(1) for update and query operations. The bounds are nearly competitive with some of the best known bounds for constructing pointwise optimal histograms modulo small additional number of buckets used; however, rangesum histograms are substantially harder to construct because of the long range dependence between subproblems.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
5
6
 
7
8
9
 
10
 
11
12
 
13
M. Naor and O. Reingold. Private commuication, March, 1999.
14
 
15
V. Poosala. Histograms for selecitivty estimation. PhD Thesis, U. Wisconsin, Madison. 1997.

CITED BY  9

Collaborative Colleagues:
S. Muthukrishnan: colleagues
Martin Strauss: colleagues