ACM Home Page
Please provide us with feedback. Feedback
RAID: high-performance, reliable secondary storage
Full text PdfPdf (3.60 MB)
Source ACM Computing Surveys (CSUR) archive
Volume 26 ,  Issue 2  (June 1994) table of contents
Pages: 145 - 185  
Year of Publication: 1994
ISSN:0360-0300
Authors
Peter M. Chen  Univ. of Michigan, Ann Arbor
Edward K. Lee  DEC Systems Research Center, Palo Alto, CA
Garth A. Gibson  Carnegie Mellon Univ., Pittsburgh, PA
Randy H. Katz  Univ. of California, Berkeley
David A. Patterson  Univ. of California, Berkeley
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 70,   Downloads (12 Months): 595,   Citation Count: 175
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/176979.176981
What is a DOI?

ABSTRACT

Disk arrays were proposed in the 1980s as a way to use parallelism between multiple disks to improve aggregate I/O performance. Today they appear in the product lines of most major computer manufacturers. This article gives a comprehensive overview of disk arrays and provides a framework in which to organize current and future work. First, the article introduces disk technology and reviews the driving forces that have popularized disk arrays: performance and reliability. It discusses the two architectural techniques used in disk arrays: striping across multiple disks to improve performance and redundancy to improve reliability. Next, the article describes seven disk array architectures, called RAID (Redundant Arrays of Inexpensive Disks) levels 0–6 and compares their performance, cost, and reliability. It goes on to discuss advanced research and implementation topics such as refining the basic RAID levels to improve performance and designing algorithms to maintain data consistency. Last, the article describes six disk array prototypes of products and discusses future opportunities for research, with an annotated bibliography disk array-related literature.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
AMDAHL, G.M. 1967. ValiditK, of the single processor approach to achievin'g large scale computing capabilities. In Proceedtngs of the AFIPS 1967 Sprzng Joint Computer Conference. Vol. 30. AFIPS, Washington, D.C., 483-485. Threepage paper that eloquently gives case for traditional computers by pointing out that performance improvement is limited by portion of the computation that is not improved.
 
2
BACCELLI, F. 1985. Two parallel queues created by arrivals with two demands. Tech Rep. 426, INRIA, Rocquencourt, France. Derives an exact solution for the two-server, M/G/1 fork-join queue.
 
3
BHIDE, A. AND DIAS, D. 1992. Raid architectures for OLTP. Tech. Rep. RC 17879 (#78489), IBM, Yorktown Heights, N.Y. Increases throughput for workloads emphasizing small, random write accesses in a redundant disk array by logging changes to parity for efficient application later. Parity changes are logged onto a separate disk which must be externally sorted before application to the disk array's parity.
 
4
 
5
BURKU^RDT, W. AND MENON, J. 1993. Disk array storage system reliability. In the 23rd Annual International Symposium on Fault-Tolerant Computing. IEEE Computer Society, Washington, D.C., 432 441. Argnaes need for multiple error-correcting disk arrays; discusses how to use maximal-distance separable codes to protect against multiple errors in a space-efficient manner.
 
6
BUZEN, J. AND SHUM, W. C. 1986. I/O architecture in MVS/370 and MVS/XA. CMG Trans. 54 (Fall), 19-26. Overview of the MVS/370 and MVS/XA I/O architecture. Describes channel paths, storage directors, string controllers, rotational position sensing, static and dynamic reconnect.
7
 
8
CHANDY, J. AND REDDY, A. L. N. 1993. Failure evaluation of disk array organizations. In Proceedmgs of the International Conference on Distrtbuted Computing Systems. IEEE Computer Society, Washington, D.C. Contrasts four previously described schemes for minimizing data reconstruction time in small (7 and 16 disks) redundant disk arrays: RAID 5 with a single spare disk, RAID 5 with a single spare whose space is distributed across all disks, a special case of Muntz and Lui's parity-clustering organization, and a method of dynamically converting a redundant data disk to a spare disk by merging two redundancy groups into one larger group. The second, distributed sparing, is generally preferred because of its performance and simplicity, but the Muntz scheme is better for minimal impact of user performance during recovery.
9
10
 
11
C~tEN, S. AND TOWSLE~, D. 1991. A queuemg analysis of RAID architectures. Tech. Rep. COINS Tech. Rep. 91-71, Dept. of Computer and Information Science, Univ of Mas~ sachusetts, Amherst, Mass Analytically models RAID leveI-1 and RAID level-5 disk arrays to compare their performance on small and large requests. Bounds are used to model the quemng and fork-join synchronization in RAID level-1 disk arrays. Small write requests in RAID level-5 disk arrays are handled by ignoring the fork-join synchromzatlon overhead. Large requests are modcled by using a single queue for all the disks m the disk array.
 
12
CHEN, P. M AND LEE, E.K. 1993 Striping in a RAID level-5 disk array Tech. Rep. CSE-TR- 181-93, Univ. of' Michigan, Ann Arbor, Mtch. Discusses how to choose striping unit for RAID level-5 disk arrays. Quantifies effect of writes and varying number of dmks
 
13
14
15
16
 
17
EMLICH, L. W. AND POLICH, H.D. 1989 VAXmm- PLUS, a fault manager mlplementatmn Dig. Tech. J. 8, (Feb.) Describes Digatal Equipment Corporation's tool for predicting and awfiding disk failures
 
18
FLATTO, L. ~D HAHN, S. 1984. Two parallel queues created by arrivals with two demands I}. SIAM J. Comput. 44, 5 (Oct.), 1041-1053, Derives an exact solutmn for the two-server, M/M/l, fork-join queue.
 
19
FRmDMAN, M. B 1983. DASD access patterns. In the 14th International Conference on Managemeat ond Performance Evaluatmn of Computer Systems Computer Measurement Group, 51 61 Looks at how much disk accesses are skewed toward particular disks in several transaction-processing sites.
 
20
 
21
GIBSON, G. A., PATTERSON, R. H., AND SATYA- NARAYANAN, M. 1992. D~sk reads with DRAM latency In the 3rd Workshop on Workstatwn Operating Systems. IEEE Computer Society, Washing-ton, D.C. Proposes that apphcatmns give hints about their future file accesses so that the buffer cache can prefetch needed data and provide low-latency file access. The hints could be exploited also to improve cache management and dmk scheduling
 
22
 
23
 
24
HEII)ELBERGER, P AND TRIVEDI, K.S. 1982. Qeuelag network models for parallel processing with asynchronous tasks. IEEE Trans Comput C- 3 l, 11 (Nov.), 1099-1109. Derives approximate solutions for queuing systems with forks but no joins.
 
25
26
 
27
HOLLAND, M. GmsoN, G., AND SIEWlOREK, D. 1993 Fast, on-line failure recovery in redundant disk arrays. In Proceedings of the 23rd International Symposium on Fault Tolerant Computing IEEE Computer Society, Washington, D,C. ompares and contrasts two data reconstruction algorithms for disk arrays: "parallel stripe-oriented reconstruction" and "disk-oriented reconstruction." Presents an implementation of the disk-oriented algorithm and analyzes reconstruction performance of these algorithms, concluding that the disk-oriented algorithm is superior. Investigates the sensitivity of the reconstruction process to the size of the reconstruction umt and the amount of memory available for reconstruction.
 
28
 
29
KATZ, R.H. 1992. High perfbrmance networkand channel-based storage. Proc. IEEE 80, 8 (Aug.), 1238 1261. Presents overview of network-based storage systems. Reviews hardware and software trends in storage systems.
 
30
 
31
 
32
 
33
KORNER, K. 1990. Intelligent caching for remote file service. In Proceedings of the Internatmnal Conference on Distributed Computing Systems. IEEE Computer Society, Washington, D.C., 220 226. Uses traces to generate hints based on the program running and the directory and name of files accessed. The file server uses the hints to pick a caching algorithm: LRU, MRU, none. Simulation showed significant benefits from intelligent caching but not from readahead which delayed demand requests since it was not preemptable.
 
34
 
35
36
37
38
 
39
LOVERSO, S. J., ISMAN, M., AND NANOPOULOS, A. 1993. A Parallel file system for the CM-5. In Proceedings of the USENIX Summer Con/krence. USENIX Assoclatmn, Berkeley, Calif. A description of the I/O hardware and the file system of the massively parallel processor from Thinking Machines. Their RAID-3 disk array has excellent performance for large file accesses.
 
40
41
 
42
MENON, J., MATTSON, D., AND NO, S. 1991. Distributed sparing for improved performance of disk arrays. Tech Rep. RJ 7943, IBM, Almaden Research Center. Explores the use of an on-line spare disk in a redundant disk array analytically It examines multiple configurations, but fundamentally it distributes the spare's space over the whole array so that every disk is N/(N + 2) data, 1/(N + 2) parity, and 1/(N + 2) spare. This gives an extra 1/(N + 2) performance, but, more significantly, it distributes the recovery-write load (the reconstructed data) over all disks to shorten recovery time. The benefits, not surprisingly, are largest for small arrays.
 
43
 
44
MERCHANT, A. AND Yu, P. 1992. Design and modeling of clustered RAID. In Proceedings of the International Symposium on Fault Tolerant Computing. IEEE Computer Society, Washington, D.C., 140 149. Presents an implementation of parity declustering, which the authors call "clustered RAID," based on random permutations. Its advantage is that it ~s able to derive a data mapping for any size disk array with any size parity stripe, and the corresponding disadvantage is that the computational requirements of the mapping algorithm are high compared to the block-design-based approaches. Analyzes response time and reconstruction time using this technique via an analytic model, and finds substantial benefits m bath.
 
45
MONTGOMERY SECURITIES 1991. RAID: A technology prosed for explosive growth. Tech. Rep. DJIA: 2902, Montgomery Securities, San Franmsco, Calif. Industry projections of market growth for RAID systems from 1990 to 1995.
 
46
 
47
48
 
49
NG, S AND MATTSON, D. 1991 Maintaining good performance in disk arrays during failure via uniform parity group distribution. In Proceed- ,~gs of t}le 1st Internatmnal Symposium on High Performance Distrzbuted Computing. 260 269 Uses balanced, incomplete block designs to distribute the extra load from a failed disk equally among other disks in the array.
50
 
51
52
53
 
54
PETERSON, E. W. AND WELDON, E. J. 1972. Error-Correcting Codes. 2nd ed. MIT Press, Cambridge, Mass. A general textbook on the mathematics of error-correcting codes.
55
 
56
 
57
SCHEUERMANN, P., WEIKUM, G., AND ZABBACK, P. 1991. Automatic tuning of data placement and load balancing in disk arrays. Database Systems for Next-Generation Applications: Principles and Practme. Describes heuristics for allocating files to disks to minimize disk skew.
 
58
SCHULZE, M., GIBSON, G. KATZ, R., AND PATTERSON, D. 1989. How reliable is a RAID, In Procedures of the IEEE Computer Society International Conference (COMPCON). IEEE, New York. Gives a reliability calculation for the electronics as well as the disks for RAIDS.
 
59
SELTZER, M. I., CHEN, P. M., AND OUSTERHOUT, J. K. 1990. Disk scheduling revisited. In Proceedings of the Winter 1990 USENIX Technical Conference. USENIX Association, Berkeley, Calif. 313-324. Reexamines the problem of how to efficiently schedule a large number of disk accesses when accounting for both seek and rotational positioning delays.
60
 
61
TAIT, C. D. AND DUCHAMP, D. 1991. Detection and exploitation of file working sets. In Proceedings of the International Conference on Distributed Computing Systems. IEEE Computer Society, Washington, D.C., 2 9. Dynamically builds and maintains program and data access trees to predict future file accesses. The current pattern is matched with previous trees to prefetch data and manage the local cache in a distributed file system. Trace-driven simulation shows reduced cache miss rates over a simple LRU algorithm.
 
62
WEIKUM, G. AND ZABBACK, P. 1992. Tuning of striping units in disk-array-based file systems. In Proceedings of the 2nd Internatzonal Workshop on Research Issues on Data Engineering: Transaction and Query Processing. IEEE Computer Society, Washington, D.C., 80 87. Proposes file-specific striping units instead of a single, global one for all files.
 
63
WILMOT, R. B. 1989. File usage patterns from SMF data: Highly skewed usage. In the 20th Internatwnal Conference on Management and Performance Evaluation of Computer Systems. Computer Measurement Group, 668-677. Reports on how files are accessed on four large data centers and finds that a small number of files account for most of all disk I/O.

CITED BY  175


REVIEW

"Antonio Puliafito : Reviewer"

To improve the performance and reliability of mass storage units, I/O subsystems based on sets of physically distinct but logically interconnected units (disk array systems) have been proposed to extend the cost, power, and space advantages of  more...

Collaborative Colleagues:
Peter M. Chen: colleagues
Edward K. Lee: colleagues
Garth A. Gibson: colleagues
Randy H. Katz: colleagues
David A. Patterson: colleagues