|
ABSTRACT
One-pass algorithms for sampling n records without replacement from a population of unknown size n are known as reservoir-sampling algorithms. In this article, Vitter's reservoir-sampling algorithm, algorithm Z, is modified to give a more efficient algorithm, algorithm K. Additionally, two new algorithms, algorithm L and algorithm M, are proposed. If the time for scanning the population is ignored, all the four algorithms have expected CPU time O(n(1 + log(N/n))), which is optimum up to a constant factor. Expressions of the expected CPU time for the algorithms are presented. Among the four, algorithm L is the simplest, and algorithm M is the most efficient when n and N/n are large and N is O(n2).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
DEVROYE, L. 1986. Non-uniform Random Variate Generation. Springer-Verlag, New York.
|
| |
4
|
FAN, C. T., MULLER, M. E., ANn REZUCHA, I. 1962. Development of sampling plans by using sequential (item by item) selection techniques and digital computers. Am. Stat. Assoc. J. 57, 298 (June), 387-402.
|
| |
5
|
IMSL. 1987. IMSL Stat/Library, User's Manual. IMSL, Inc., Houston, Tex.
|
| |
6
|
|
| |
7
|
|
| |
8
|
LI, K.-H. 1992a.. Reservoir sampling algorithms. I. Analysis and modification of Vitter's algorithm Z. Tech. Rep. No. 92.2, Dept. of Statistics, The Chinese Univ. of Hong Kong, Shatin, N.T., Hong Kong.
|
| |
9
|
LL K.-H. 1992b. Reservoir sampling algorithms. II. New algorithms. Tech. Rep. No. 92.3, Dept. of Statistics, The Chinese Univ. of Hong Kong, Shatin, N.T., Hong Kong.
|
| |
10
|
McLEOD, A. I. AND BELLHOUSE, D. R. 1983. A convenient algorithm for drawing a simple random sample. Appl. Stat. 32, 2, 182-184.
|
| |
11
|
PIN~C4AM, R.S. 1987. An efficient algorithm for drawing a simple random sample. Appl. Stat. 36, 3, 370-372.
|
 |
12
|
|
REVIEW
"Andrew Donald Booth : Reviewer"
Three new algorithms to generate a simple random sample from a
finite population in a single pass through the data are described.
Samples of appropriate parameters are provided, and the timings for
sample runs are compared with tho
more...
Peer to Peer - Readers of this Article have also read:
-
M4: a metamodel for data preprocessing
Proceedings of the 4th ACM international workshop on Data warehousing and OLAP
Anca Vaduva
, Jörg-Uwe Kietz
, Regina Zücker
-
Inferring constraints from multiple snapshots
ACM Transactions on Graphics (TOG)
12, 4
David Kurlander
, Steven Feiner
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
|