ACM Home Page
Please provide us with feedback. Feedback
Approximation and streaming algorithms for histogram construction problems
Full text PdfPdf (1.38 MB)
Source ACM Transactions on Database Systems (TODS) archive
Volume 31 ,  Issue 1  (March 2006) table of contents
Pages: 396 - 438  
Year of Publication: 2006
ISSN:0362-5915
Authors
Sudipto Guha  University of Pennsylvania, Philadelphia, PA
Nick Koudas  University of Toronto, Ont, Canada
Kyuseok Shim  Seoul National University, Seoul, Korea
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 19,   Downloads (12 Months): 152,   Citation Count: 11
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1132863.1132873
What is a DOI?

ABSTRACT

Histograms and related synopsis structures are popular techniques for approximating data distributions. These have been successful in query optimization and a variety of applications, including approximate querying, similarity searching, and data mining, to name a few. Histograms were a few of the earliest synopsis structures proposed and continue to be used widely. The histogram construction problem is to construct the best histogram restricted to a space bound that reflects the data distribution most accurately under a given error measure.The histograms are used as quick and easy estimates. Thus, a slight loss of accuracy, compared to the optimal histogram under the given error measure, can be offset by fast histogram construction algorithms. A natural question arises in this context: Can we find a fast near optimal approximation algorithm for the histogram construction problem? In this article, we give the first linear time (1+ε)-factor approximation algorithms (for any ε > 0) for a large number of histogram construction problems including the use of piecewise small degree polynomials to approximate data, workloads, etc. Several of our algorithms extend to data streams.Using synthetic and real-life data sets, we demonstrate that in many scenarios the approximate histograms are almost identical to optimal histograms in quality and are significantly faster to construct.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
3
 
4
 
5
 
6
Donjerkovic, D., Ioannidis, Y. E., and Ramakrishnan, R. 1999. Dynamic histograms: Capturing evolving data sets. CS-TR 99-1396, University of Wisconsin, Madison, WI. Mar.
 
7
8
9
 
10
11
12
13
 
14
15
 
16
17
 
18
19
 
20
Guha, S., Shim, K., and Woo, J. 2004. REHIST: Relative error histogram construction algorithms. In Proceedings of the VLDB Conference, 300--311.
 
21
 
22
23
 
24
 
25
Ioannidis, Y. E. 2003. The history of histograms (abridged). In Proceedings of the VLDB Conference, 19--30.
 
26
27
 
28
Kooi, R. 1980. The optimization of queries in relational databases. Case Western Reserve University.
29
30
31
32
 
33
Muthukrishnan, S. and Strauss, M. 2004. Approximate histogram and wavelet summaries of streaming data. DIMACS TR 52.
34
35
36
 
37

CITED BY  11

Collaborative Colleagues:
Sudipto Guha: colleagues
Nick Koudas: colleagues
Kyuseok Shim: colleagues