ACM Home Page
Please provide us with feedback. Feedback
New sampling-based summary statistics for improving approximate query answers
Full text PdfPdf (1.69 MB)
Source International Conference on Management of Data archive
Proceedings of the 1998 ACM SIGMOD international conference on Management of data table of contents
Seattle, Washington, United States
Pages: 331 - 342  
Year of Publication: 1998
ISBN:0-89791-995-5
Also published in ...
Authors
Phillip B. Gibbons  Information Sciences Research Center, Bell Laboratories
Yossi Matias  Department of Computer Science, Tel-Aviv University
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 78,   Citation Count: 112
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/276304.276334
What is a DOI?

ABSTRACT

In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. Before DBMSs providing highly-accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points for the same view size, and hence in providing more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

AMS96
 
Ant92
 
AS94
 
AZ96
 
BDF+97
D. Barbar~i, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, K. A. Ross, and K. C. Sevcik. The New jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 20(4):3- 45, 1997.
BM96
BMUT97
 
FJS97
 
Fla85
 
FM83
P. Flajolet and G. N. Martin. Probabilistic counting. In Proc. 24th IEEE Syrup. on Foundations of Computer Science, pages 76-82, October 1983.
 
FM85
 
GM95
P.B. Gibbons and Y. Matias, August 1995. Presentation and feedback during a Bell Labs-Teradata presentation to Walmart scientists and executives on proposed improvements to the Teradata DBS.
 
GM97
E B. Gibbons and Y. Matias. Synopsis data structures, concise samples, and mode statistics. Manuscript, July 1997.
 
GMP97a
P.B. Gibbons, Y. Matias, and V. Poosala. Aqua project white paper. Technical report, Bell Laboratories, Murray Hill, New Jersey, December 1997.
 
GMP97b
 
GPA+98
E B. Gibbons, V. Poosala, S. Acharya, Y. Bartal, Y. Matias, S. Muthukrishnan, S. Ramaswamy, and T. Suel. AQUA: System and techniques for approximate query answering. Technical report, Bell Laboratories, Murray Hill, New Jersey, February 1998.
HHW97
 
HK95
M. Hofri and N. Kechris. Probabilistic counting of a large number of events. Manuscript, 1995.
 
HNSS95
IC93
 
Ioa93
IP95
 
Mat92
Y. Matias. Highly Parallel Randomized Algorithmics. PhD thesis, Tel Aviv University, Israel, 1992.
Mor78
 
MSY96
Y. Matias, S. C. Sahinalp, and N. E. Young. Performance evaluation of approximate priority queues. Presented at DIMACS Fifth Implementation Challenge: Priority Queues, Dictionaries, and Point Sets, organized by D. S. Johnson and C. McGeoch, October 1996.
 
MVN93
 
MVY94
 
OR89
 
OR92
PIHS96
 
Pre97
D. Pregibon. Mega-monitoring: Developing and using telecommunications signatures, October 1997. Invited talk at the DIMACS Workshop on Massive Data Sets in Telecommunications.
Vit85
 
VL93
WVZT90

CITED BY  112

Collaborative Colleagues:
Phillip B. Gibbons: colleagues
Yossi Matias: colleagues