ACM Home Page
Please provide us with feedback. Feedback
Approximate medians and other quantiles in one pass and with limited memory
Full text PdfPdf (1.25 MB)
Source International Conference on Management of Data archive
Proceedings of the 1998 ACM SIGMOD international conference on Management of data table of contents
Seattle, Washington, United States
Pages: 426 - 435  
Year of Publication: 1998
ISBN:0-89791-995-5
Also published in ...
Authors
Gurmeet Singh Manku  IBM Almaden Research Center
Sridhar Rajagopalan  IBM Almaden Research Center
Bruce G. Lindsay  IBM Almaden Research Center
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 53,   Citation Count: 66
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/276304.276342
What is a DOI?

ABSTRACT

We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the dataset. The main memory requirements are smaller than those reported earlier by an order of magnitude. We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter. We present the algorithms, their theoretical analysis and simulation results on different datasets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
"DB2 MVS:", To be completed.
 
5
"Informix", To be completed.
 
6
 
7
M. Blum, R. W. Floyd, V. R. Pratt, R. L. Rivest, and R. E. Tarjan, "Time Bounds for Selection", in J. Cornput. Syst. Sci., vol. 7, pp. 448-461, 1973.
 
8
M. R. Paterson, "Progress in Selection", Deptt. of Computer Science, University of Warwick, Coventry, UK, 1997.
 
9
D. Dor, Selection Algorithms, PhD thesis, Tel-Aviv University, 1995.
 
10
 
11
D. Dot and U. Zwick, "Finding the anth Largest Element", Combinatorica, vol. 16, pp. 41-58, 1996.
 
12
D. Dor and U. Zwick, "Median Selection Requires (2 q-e)n Comparisons", Technical Report 312/96, Department of Computer Science, Tel-Aviv University, Apr. 1996.
 
13
 
14
I. Pohl, "A Minimum Storage Algorithm for Computing the Median", Technical Report IBM Research Report RC 2701 (# 12713), IBM T J Watson Center, Nov. 1969.
 
15
J. I. Munro and M. S. Paterson, "Selection and Sorting with Limited Storage", Theoretical Computer Science, vol. 12, pp. 315-323, 1980.
16
 
17
R. Agrawal and A. Swami, "A One-Pass Space-Efficient Algorithm for Finding Quantiles", in Proc. 7th Intl. Conf. Management of Data (COMAD-95), Pune, India, 1995.
 
18
 
19
W. Hoeffding, "Probability Inequalities for Sum8 of Bounded Random Variables", American Statistical Association Jornal, pp. 13-30, Mar. 1963.

CITED BY  66

Collaborative Colleagues:
Gurmeet Singh Manku: colleagues
Sridhar Rajagopalan: colleagues
Bruce G. Lindsay: colleagues