ABSTRACT
Several new methods are presented for selecting n records at random without replacement from a file containing N records. Each algorithm selects the records for the sample in a sequential manner, that is, in the same order the records appear in the file. The algorithms are online in that the records for the sample are selected iteratively with no preprocessing. The algorithms require a constant amount of space and are short and easy to implement. The main result of this paper is the design and analysis of Algorithm D, which does the sampling in O(n) time on the average; roughly n uniform random variates are generated, and approximately n exponentiation operations (of the form a^b, for real numbers a and b) are performed during the sampling. This solves an open problem in the literature. CPU timings on a large mainframe computer indicate that Algorithm D is significantly faster than the sampling algorithms in use today.
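The skip-based idea behind sequential sampling can be illustrated with a short sketch. Instead of making a decision for every record (about N random variates), each step draws one uniform variate and turns it into S, the number of records to pass over before the next selection, so about n variates are used in total. This sketch finds S by a plain sequential search on its exact distribution, so its arithmetic is still O(N) in the worst case. It is not Algorithm D, whose contribution is generating S in constant average time with a rejection technique that is not reproduced here. The names `skip_length` and `sequential_sample` are hypothetical, not taken from the paper.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def skip_length(n: int, N: int, rng: random.Random) -> int:
    """Return S, the number of records to pass over before the next selection,
    when n records are still needed from the N that remain (1 <= n <= N).

    P(S > s) = prod_{i=0}^{s} (N - n - i) / (N - i); S is found by sequential
    search, using a single uniform variate and O(S) arithmetic.
    """
    v = rng.random()
    s = 0
    top = N - n             # records that can still be skipped
    remaining = N           # records left in the file; remaining - top == n >= 1
    quot = top / remaining  # P(S > 0): the next record is not selected
    while quot > v:
        s += 1
        top -= 1
        remaining -= 1
        quot *= top / remaining  # now equals P(S > s)
    return s


def sequential_sample(records: Iterable[T], n: int, N: int,
                      seed: Optional[int] = None) -> Iterator[T]:
    """Yield n of the first N records, uniformly at random without replacement,
    in file order, reading the input once (hypothetical helper for illustration).
    Apart from the yielded records, only a few counters are kept.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = random.Random(seed)
    it = iter(records)
    while n > 0:
        s = skip_length(n, N, rng)
        for _ in range(s):  # pass over s unselected records
            next(it)
        yield next(it)      # the (s + 1)-st record is selected
        N -= s + 1
        n -= 1
```

For example, `list(sequential_sample(range(1, 101), 5, 100, seed=42))` returns five distinct values between 1 and 100 in increasing order. Because the records are produced by a generator, the sample is emitted online, one selected record at a time, with no preprocessing pass.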
CITED BY 21
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey Scott Vitter, Ramesh Agarwal. Dynamic maintenance of web indexes using landmarks. Proceedings of the 12th International Conference on World Wide Web, May 20-24, 2003, Budapest, Hungary.
REVIEW
Reviewer: Robert M. Lynch
The Vitter paper describes several new sequential algorithms for randomly sampling n records from a file containing N records. It is presented in nine sections, including the Appendix. The main text of the paper focuses ...
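All of these sequential methods revolve around one random quantity, the skip S: when n records are still to be chosen from the N that remain, S is the number of records passed over before the next one is selected. The block below states its exact distribution and a continuous approximation that can be sampled by inversion at the cost of one exponentiation of the form a^b. Reading this as the source of the "approximately n exponentiation operations" in the abstract is an interpretation; the exact acceptance test used by Algorithm D is not reproduced here.

```latex
% Exact skip distribution: the next s of the N remaining records are all passed over
\Pr\{S \ge s\}
  = \frac{\binom{N-s}{n}}{\binom{N}{n}}
  = \prod_{i=0}^{s-1} \frac{N-n-i}{N-i}
  = \prod_{j=0}^{n-1} \Bigl(1 - \frac{s}{N-j}\Bigr)
  \le \Bigl(1 - \frac{s}{N}\Bigr)^{n},
  \qquad 0 \le s \le N-n.

% Continuous approximation on [0, N], sampled by inversion (one exponentiation)
\Pr\{X \ge x\} = \Bigl(1 - \frac{x}{N}\Bigr)^{n}
  \quad\Longrightarrow\quad
  X = N\bigl(1 - U^{1/n}\bigr), \qquad U \sim \mathrm{Uniform}(0,1).
```

The inequality holds because each factor satisfies 1 - s/(N-j) <= 1 - s/N, and the bound is close when n is much smaller than N. Since floor(X) >= s exactly when X >= s, floor(X) has tail probabilities (1 - s/N)^n, which makes it a natural proposal for a rejection method that generates S.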