|
ABSTRACT
A rangesum query to an array A is a pair (l, r) of range endpoints, which should be answered by Σl≤irA[i]. To compress A, we consider representing an array A lossily by a histogram, a function that is constant on each of a small number of buckets. We then answer range queries from H instead of from A, i.e., as Σl≤irH[i]. An optimal rangesum histogram H for this purpose is one whose bucket boundaries and constant heights within buckets are chosen to minimize the expected square error, El, r[(Σl≤irA[i]--Σl≤irH[i].)2], assuming each rangesum query is equally likely. Rangesum histograms find many applications in database systems.In a degenerate variation, all rangesum queries are over ranges of size one, namely, individual points; histograms optimal for this special case are called pointwise optimal histograms. Pointwise optimal histogram is a classical notion in statistics and approximation theory, but rangesum optimal histogram appears to be novel in these areas. While optimal pointwise histograms can be constructed efficiently by simple dynamic progrmming, no efficient (even approximate) general rangesmn histogram construction algorithms were previously known. In practice, all commercial database systems use heuristically built histograms for pointwise and rangesum queries.We present the first general algorithms for approximate rangesum histograms. Given parameter B, we denote by (α, β)-approximation an algorithm to produce a (αB)-bucket histogram with error at most β times the error of the optimal B-bucket histogram. We give a (2, 1)-approximation with runtime O(N2B), a (2, 1+∊)-approximation with runtime N + (B log(N)/∊)O(1) (1), and a (1, 1 + ∊)-approximation with runtime O(B3N4/∊2). We also consider the problem of dynamic maintenance of rangesum histograms for data updated by additive changes, and we give a (2, 1 + ∊)-approximation that uses space (Blog(N)/∊)O(1) and time (Blog(N)/∊)O(1) for update and query operations. The bounds are nearly competitive with some of the best known bounds for constructing pointwise optimal histograms modulo small additional number of buckets used; however, rangesum histograms are substantially harder to construct because of the long range dependence between subproblems.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
Anna C. Gilbert , Sudipto Guha , Piotr Indyk , Yannis Kotidis , S. Muthukrishnan , Martin J. Strauss, Fast, small-space algorithms for approximate histogram maintenance, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
[doi> 10.1145/509907.509966]
|
 |
6
|
Anna C. Gilbert , Yannis Kotidis , S. Muthukrishnan , Marin J. Strauss, Optimal and approximate computation of summary statistics for range aggregates, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.227-236, May 2001, Santa Barbara, California, United States
[doi> 10.1145/375551.375598]
|
| |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
 |
12
|
Nick Koudas , S. Muthukrishnan , Divesh Srivastava, Optimal histograms for hierarchical range queries (extended abstract), Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.196-204, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335223]
|
| |
13
|
M. Naor and O. Reingold. Private commuication, March, 1999.
|
 |
14
|
|
| |
15
|
V. Poosala. Histograms for selecitivty estimation. PhD Thesis, U. Wisconsin, Madison. 1997.
|
CITED BY 9
|
|
|
|
|
A. R. Calderbank , A. Gilbert , K. Levchenko , S. Muthukrishnan , M. Strauss, Improved range-summable random variable construction algorithms, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, January 23-25, 2005, Vancouver, British Columbia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|