Sample synopses for approximate answering of group-by queries
Source: Extending Database Technology; Vol. 360
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Saint Petersburg, Russia
Session: Research sessions: Query processing
Pages: 403-414
Year of Publication: 2009
ISBN: 978-1-60558-422-5
Authors
Philipp Rösch  Technische Universität Dresden, Dresden, Germany
Wolfgang Lehner  Technische Universität Dresden, Dresden, Germany
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 49,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1516360.1516408

ABSTRACT

With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. Typically, such analytical queries partition the data into groups and aggregate the values within each group. Further, with the commonly used roll-up and drill-down operations, a broad range of group-by queries is posed to the system, which makes the construction of highly specialized synopses difficult.
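As a concrete picture of the setting, the Python sketch below answers a SUM ... GROUP BY query approximately from a plain Bernoulli sample, scaling each sampled value by the inverse sampling rate. This is a generic uniform-sampling baseline, not the technique proposed in the paper, and the function names `exact_group_sum` and `approx_group_sum` are hypothetical.

```python
import random
from collections import defaultdict


def exact_group_sum(rows):
    """Exact answer to SUM(value) ... GROUP BY key over (key, value) pairs."""
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def approx_group_sum(rows, rate, seed=0):
    """Hypothetical baseline: estimate SUM(value) GROUP BY key from a Bernoulli sample.

    Each row is kept independently with probability `rate`, and each kept
    value is scaled by 1/rate, so every per-group estimate is unbiased.
    Groups with no sampled row are missing from the result entirely.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)
    estimates = defaultdict(float)
    for key, value in rows:
        if rng.random() < rate:  # keep this row with probability `rate`
            estimates[key] += value / rate
    return dict(estimates)


rows = [("big", 10.0)] * 10_000 + [("tiny", 5.0)] * 3
print(exact_group_sum(rows))  # {'big': 100000.0, 'tiny': 15.0}
print(approx_group_sum(rows, rate=0.01))
```

At a 1% rate the estimate for `big` lands near 100000, but the three-row `tiny` group survives sampling with probability only about 3% (1 - 0.99^3), so it is usually absent from the approximate answer. Small or otherwise hard groups like this are what group-aware, biased synopses aim to cover.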

In this paper, we propose a general-purpose sampling scheme that is biased so as to answer group-by queries with high accuracy. While existing techniques base the sample size of a group on the size of that group, our technique bases it on the group's standard deviation. The basic idea is that the more homogeneous a group is, the fewer representatives are required to give a good estimate. With an extensive set of experiments, we show that our approach reduces both the estimation error and the construction cost compared to existing techniques.
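The abstract states the allocation idea but not its formula. The minimal Python sketch below contrasts a size-proportional baseline with one simple reading of that idea, assuming each group's sample size is proportional to its population standard deviation, with at least one row per group and never more rows than the group holds. The function names and this exact weighting are illustrative assumptions; the paper's actual allocation may differ.

```python
import statistics


def allocate_by_size(groups, budget):
    """Baseline: sample size proportional to group size (row count).

    Rounding means the sizes may not add up exactly to `budget`.
    """
    total = sum(len(values) for values in groups.values())
    return {k: max(1, round(budget * len(v) / total)) for k, v in groups.items()}


def allocate_by_stddev(groups, budget):
    """Hypothetical allocation: sample size proportional to the group's
    population standard deviation, at least 1 row and at most the group size.

    A perfectly homogeneous group (stddev 0) gets a single representative.
    Rounding and the per-group minimum may make the total differ slightly
    from `budget`.
    """
    sds = {k: statistics.pstdev(v) for k, v in groups.items()}
    total_sd = sum(sds.values())
    alloc = {}
    for k, values in groups.items():
        # Fall back to an equal split if every group is constant.
        share = budget * sds[k] / total_sd if total_sd > 0 else budget / len(groups)
        alloc[k] = min(len(values), max(1, round(share)))
    return alloc


groups = {
    "homogeneous": [50.0] * 900,                   # 900 identical values
    "varied": [float(x) for x in range(100)],      # 100 spread-out values
}
print(allocate_by_size(groups, 100))    # {'homogeneous': 90, 'varied': 10}
print(allocate_by_stddev(groups, 100))  # {'homogeneous': 1, 'varied': 100}
```

With the same budget, the size-based scheme spends 90 rows on a group whose values are all identical, while the standard-deviation-based sketch represents that group with a single row and gives the rest of the budget to the group that actually varies.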

