|
ABSTRACT
Parallel independent disks can enhance the performance of external memory (EM) algorithms, but the programming task is often difficult. Each disk can service only one read or write request at a time; the challenge is to keep the disks as busy as possible. In this article, we develop a randomized allocation discipline for parallel independent disks, called randomized cycling. We show how it can be used as the basis for an efficient distribution sort algorithm, which we call randomized cycling distribution sort (RCD). We prove that the expected I/O complexity of RCD is optimal. The analysis uses a novel reduction to a scenario with significantly fewer probabilistic interdependencies. We demonstrate RCD's practicality by experimental simulations. Using the randomized cycling discipline, algorithms developed for the unrealistic multihead disk model can be simulated on the realistic parallel disk model for the class of multipass algorithms, which make a complete pass through their data before accessing any element a second time. In particular, algorithms based upon the well-known distribution and merge paradigms of EM computation can be optimally extended from a single disk to parallel disks.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
Barve, R. D., and Vitter, J. S. 2002. A simple and efficient parallel disk mergesort. ACM Trans. Comput. Syst. 35, 2 (Mar./Apr.), 189--215.
|
| |
7
|
Dehne, F., Dittrich, W., and Hutchinson, D. 2003. Efficient external memory algorithms by simulating coarse-grained parallel algorithms. Algorithmica 36, 87--122.
|
| |
8
|
Dehne, F., Hutchinson, D., and Maheshwari, A. 2002. Bulk synchronous parallel algorithms for the external memory model. Theory Comput. Syst. 35, 567--597.
|
| |
9
|
|
| |
10
|
Goodrich, M. T., Tsay, J.-J., Vengroff, D. E., and Vitter, J. S. 1993. External-memory computational geometry. In Proceedings of the IEEE Symposium on Foundations of Computer Science. (Palo Alto, CA) IEEE Computer Society Press, Los Alamitos, CA, 714--723.
|
| |
11
|
Henzinger, M. R., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Tech. Rep. 1998--011, Digital Equipment Corporation Systems Research Center, Palo Alto, CA.
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
 |
15
|
|
| |
16
|
Peter Sanders , Sebastian Egner , Jan Korst, Fast concurrent access to parallel disks, Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, p.849-858, January 09-11, 2000, San Francisco, California, United States
|
| |
17
|
TPIE 1999. TPIE user manual and reference. The manual and software distribution are available on the web at http://www.cs.duke.edu/TPIE/.
|
| |
18
|
Vengroff, D. E. 1994. A transparent parallel I/O environment. In Proceedings of the DAGS Symposium on Parallel Computation (Hanover, NH). 117--134.
|
| |
19
|
|
 |
20
|
|
| |
21
|
|
| |
22
|
Vitter, J. S., and Shriver, E. A. M. 1994. Algorithms for parallel memory I: Two-level memories. Algorithmica 12, 2--3, 110--147.
|
INDEX TERMS
Primary Classification:
E.
Data
E.1
DATA STRUCTURES
Subjects:
Distributed data structures
Additional Classification:
F.
Theory of Computation
F.2
ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
F.2.2
Nonnumerical Algorithms and Problems
Subjects:
Sorting and searching
H.
Information Systems
H.3
INFORMATION STORAGE AND RETRIEVAL
H.3.2
Information Storage
Subjects:
File organization
General Terms:
Algorithms,
Design,
Experimentation,
Performance,
Theory
Keywords:
distribution,
external memory,
external sorting,
input/output,
merging,
multipass algorithms,
multiple disks,
parallel disks,
randomization,
sorting
|