ACM Home Page
Please provide us with feedback. Feedback
Random sampling with a reservoir
Full text PdfPdf (1.51 MB)
Source ACM Transactions on Mathematical Software (TOMS) archive
Volume 11 ,  Issue 1  (March 1985) table of contents
Pages: 37 - 57  
Year of Publication: 1985
ISSN:0098-3500
Author
Jeffrey S. Vitter  Brown Univ., Providence, RI
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 162,   Citation Count: 99
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues   peer to peer  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/3147.3165
What is a DOI?

ABSTRACT

We introduce fast algorithms for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1 + log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
BENTLEY, J.L. Personal communication, Apr. 1983; see {11}.
 
2
ERNVALL, J. AND NEVALAINEN, O. An algorithm for unbiased random sampling. Comput. J. 25, 1 (Jan. 1982), 45-47.
 
3
FAN, C. T., MULLER, M. E., AND REZUCHA, I. Development of sampling plans by using sequential (item by item) selection techniques and digital computers. Am. Stat. Assoc. J. 57 (June 1962), 387-402.
 
4
FELLER, W. An Introduction to Probability Theory and Its Applications, vol. I, 3rd ed. Wiley, 1968.
 
5
FELLER, W. An Introduction to Probability Theory and Its Applications, vol. II, 2nd ed. Wiley, 1971.
6
 
7
KAWARASAKI, J., AND SIBUYA, M. Random numbers for simple random sampling without replacement. Keio Math. Sere. Rep. No. 7 (1982), 1-9.
 
8
 
9
10
 
11
VITTER, J.S. Optimum algorithms for two random sampling problems. In Proceedings of the 24th Annual IEEE Symposium on Foundations of Computer Science (Tucson, Az., Nov. 7-9), IEEE, New York, 1983 pp. 65-75.

CITED BY  99
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


REVIEW

"Prokop Vondracek : Reviewer"

There is presented a fast algorithm for selecting a random sample of n records without replacement for a pool of N records, where the value of N is unknown beforehand. Algorithm Z, which does the sampling, is des  more...


Peer to Peer - Readers of this Article have also read: