|
ABSTRACT
Histograms and related synopsis structures are popular techniques for approximating data distributions. These have been successful in query optimization and a variety of applications, including approximate querying, similarity searching, and data mining, to name a few. Histograms were a few of the earliest synopsis structures proposed and continue to be used widely. The histogram construction problem is to construct the best histogram restricted to a space bound that reflects the data distribution most accurately under a given error measure.The histograms are used as quick and easy estimates. Thus, a slight loss of accuracy, compared to the optimal histogram under the given error measure, can be offset by fast histogram construction algorithms. A natural question arises in this context: Can we find a fast near optimal approximation algorithm for the histogram construction problem? In this article, we give the first linear time (1+ε)-factor approximation algorithms (for any ε > 0) for a large number of histogram construction problems including the use of piecewise small degree polynomials to approximate data, workloads, etc. Several of our algorithms extend to data streams.Using synthetic and real-life data sets, we demonstrate that in many scenarios the approximate histograms are almost identical to optimal histograms in quality and are significantly faster to construct.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, The Aqua approximate query answering system, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.574-576, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
| |
5
|
Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California
|
| |
6
|
Donjerkovic, D., Ioannidis, Y. E., and Ramakrishnan, R. 1999. Dynamic histograms: Capturing evolving data sets. CS-TR 99-1396, University of Wisconsin, Madison, WI. Mar.
|
| |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
|
 |
11
|
Anna C. Gilbert , Sudipto Guha , Piotr Indyk , Yannis Kotidis , S. Muthukrishnan , Martin J. Strauss, Fast, small-space algorithms for approximate histogram maintenance, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
[doi> 10.1145/509907.509966]
|
 |
12
|
Anna C. Gilbert , Yannis Kotidis , S. Muthukrishnan , Marin J. Strauss, Optimal and approximate computation of summary statistics for range aggregates, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.227-236, May 2001, Santa Barbara, California, United States
[doi> 10.1145/375551.375598]
|
 |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
Guha, S., Shim, K., and Woo, J. 2004. REHIST: Relative error histogram construction algorithms. In Proceedings of the VLDB Conference, 300--311.
|
| |
21
|
|
| |
22
|
|
 |
23
|
|
| |
24
|
|
| |
25
|
Ioannidis, Y. E. 2003. The history of histograms (abridged). In Proceedings of the VLDB Conference, 19--30.
|
| |
26
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
 |
27
|
|
| |
28
|
Kooi, R. 1980. The optimization of queries in relational databases. Case Western Reserve University.
|
 |
29
|
Nick Koudas , S. Muthukrishnan , Divesh Srivastava, Optimal histograms for hierarchical range queries (extended abstract), Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.196-204, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335223]
|
 |
30
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.426-435, June 01-04, 1998, Seattle, Washington, United States
|
 |
31
|
Yossi Matias , Jeffrey Scott Vitter , Min Wang, Wavelet-based histograms for selectivity estimation, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.448-459, June 01-04, 1998, Seattle, Washington, United States
|
 |
32
|
|
| |
33
|
Muthukrishnan, S. and Strauss, M. 2004. Approximate histogram and wavelet summaries of streaming data. DIMACS TR 52.
|
 |
34
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
 |
35
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
 |
36
|
|
| |
37
|
|
|