ABSTRACT
Configuring redundant disk arrays is a black art. To configure an array properly, a system administrator must understand the details of both the array and the workload it will support. Incorrect understanding of either, or changes in the workload over time, can lead to poor performance. We present a solution to this problem: a two-level storage hierarchy implemented inside a single disk-array controller. In the upper level of this hierarchy, two copies of active data are stored to provide full redundancy and excellent performance. In the lower level, RAID 5 parity protection is used to provide excellent storage cost for inactive data, at somewhat lower performance. The technology we describe in this article, known as HP AutoRAID, automatically and transparently manages migration of data blocks between these two levels as access patterns change. The result is a fully redundant storage system that is extremely easy to use, is suitable for a wide variety of workloads, is largely insensitive to dynamic workload changes, and performs much better than disk arrays with comparable numbers of spindles and much larger amounts of front-end RAM cache. Because the implementation of the HP AutoRAID technology is almost entirely in software, the additional hardware cost for these benefits is very small. We describe the HP AutoRAID technology in detail, provide performance data for an embodiment of it in a storage array, and summarize the results of simulation studies used to choose algorithms implemented in the array.
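To make the two-level idea concrete, here is a minimal Python sketch of a store with a mirrored upper level and a RAID 5 lower level. It is an illustration under stated assumptions, not the HP AutoRAID implementation: the class name `TwoLevelArray`, the rule that every write lands in the mirrored level, and the least-recently-written demotion policy are all hypothetical choices made for this example. It also shows the storage-cost trade-off the abstract describes: mirroring consumes two raw blocks per data block, while RAID 5 across N disks consumes N/(N-1).

```python
from collections import OrderedDict


class TwoLevelArray:
    """Toy model of a mirrored upper level over a RAID 5 lower level.

    Hypothetical sketch: the promote-on-write and least-recently-written
    demotion policies are illustrative assumptions, not AutoRAID's design.
    """

    def __init__(self, mirrored_capacity_blocks: int, raid5_width: int):
        if raid5_width < 3:
            raise ValueError("this model assumes at least 3 disks for RAID 5")
        self.mirrored_capacity = mirrored_capacity_blocks
        self.raid5_width = raid5_width
        # Upper level: block id -> None, ordered least to most recently written.
        self.mirrored = OrderedDict()
        # Lower level: block ids held under RAID 5 parity protection.
        self.raid5 = set()

    def write(self, block: int) -> None:
        # Assumed policy: a write promotes the block into the mirrored level.
        self.raid5.discard(block)
        self.mirrored[block] = None
        self.mirrored.move_to_end(block)
        # When the mirrored level overflows, demote the least recently written block.
        while len(self.mirrored) > self.mirrored_capacity:
            victim, _ = self.mirrored.popitem(last=False)
            self.raid5.add(victim)

    def level_of(self, block: int) -> str:
        if block in self.mirrored:
            return "mirrored"
        if block in self.raid5:
            return "raid5"
        return "unallocated"

    def raw_blocks_used(self) -> float:
        # Mirroring stores two copies; RAID 5 over N disks costs N/(N-1) per data block.
        n = self.raid5_width
        return 2 * len(self.mirrored) + len(self.raid5) * n / (n - 1)


if __name__ == "__main__":
    array = TwoLevelArray(mirrored_capacity_blocks=2, raid5_width=5)
    for b in (1, 2, 3):
        array.write(b)
    print(array.level_of(1), array.level_of(3), array.raw_blocks_used())
    array.write(1)  # rewriting block 1 promotes it and demotes block 2
    print(array.level_of(1), array.level_of(2))
```

With a two-block mirrored level and a five-disk RAID 5 level, writing blocks 1, 2, 3 pushes block 1 down to RAID 5, for 2 x 2 + 1 x 5/4 = 5.25 raw blocks in total. Rewriting block 1 pulls it back up and demotes block 2 instead, which is the kind of access-driven migration the abstract describes; a real controller would move larger physical extents and schedule migration in the background.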
INDEX TERMS
Primary Classification:
D. Software > D.4 OPERATING SYSTEMS > D.4.2 Storage Management
Subjects: Secondary storage
Additional Classification:
B. Hardware > B.3 MEMORY STRUCTURES > B.3.2 Design Styles
Subjects: Mass storage (e.g., magnetic, optical, RAID)
B. Hardware > B.4 INPUT/OUTPUT AND DATA COMMUNICATIONS > B.4.2 Input/Output Devices
Subjects: Channels and controllers
B. Hardware > B.4 INPUT/OUTPUT AND DATA COMMUNICATIONS > B.4.5 Reliability, Testing, and Fault-Tolerance
Subjects: Redundant design
General Terms: Algorithms, Design, Performance, Reliability
Keywords: RAID, disk array, storage hierarchy