ABSTRACT
This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise arrays at lower cost and with better scalability. FAB is built from a collection of bricks: small storage appliances containing commodity disks, a CPU, NVRAM, and network interface cards. FAB uses a new majority-voting-based algorithm to replicate or erasure-code logical blocks across bricks, and a reconfiguration algorithm to move data in the background when bricks are added or decommissioned. We argue that voting is practical and necessary for reliable, high-throughput storage systems such as FAB. We have implemented a FAB prototype on a 22-node Linux cluster. The prototype sustains 85 MB/s of throughput for a database workload and 270 MB/s for a bulk-read workload. In addition, it can outperform traditional master-slave replication through performance decoupling (a request completes once a majority of bricks respond, so a single slow brick does not hold it back), and it handles brick failures and recoveries smoothly without disturbing client requests.
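The abstract does not describe FAB's voting protocol in detail. As a rough illustration of the general idea only, the Python sketch below shows classic majority-quorum replication of one logical block using timestamps; it is not FAB's algorithm, and it omits erasure coding and reconfiguration. The `Brick`, `write`, `read`, `majority`, and `NoQuorum` names are hypothetical, brick failures are modeled by an `up` flag, and a single writer is assumed to issue unique, increasing timestamps.

```python
from dataclasses import dataclass
from typing import List, Optional


class NoQuorum(Exception):
    """Raised when fewer than a majority of bricks are reachable."""


@dataclass
class Brick:
    """Hypothetical in-memory stand-in for one brick's copy of a single logical block."""
    name: str
    up: bool = True                 # False models a failed or unreachable brick
    ts: int = 0                     # timestamp of the value currently stored
    value: Optional[bytes] = None


def majority(n: int) -> int:
    """Smallest quorum size for n bricks such that any two quorums share a brick."""
    return n // 2 + 1


def write(bricks: List[Brick], ts: int, value: Optional[bytes]) -> bool:
    """Send (ts, value) to every reachable brick; report whether a majority acknowledged."""
    acks = 0
    for b in bricks:
        if not b.up:
            continue
        if ts > b.ts:               # a brick never replaces a newer value with an older one
            b.ts, b.value = ts, value
        acks += 1
    return acks >= majority(len(bricks))


def read(bricks: List[Brick]) -> Optional[bytes]:
    """Return the newest value held by a responding majority, repairing stale copies."""
    replies = [(b.ts, b.value) for b in bricks if b.up]
    if len(replies) < majority(len(bricks)):
        raise NoQuorum("fewer than a majority of bricks responded")
    ts, value = max(replies, key=lambda r: r[0])
    # Write-back: ensure a majority holds the newest value before returning it,
    # so a later read cannot observe an older value.
    if not write(bricks, ts, value):
        raise NoQuorum("write-back did not reach a majority")
    return value


if __name__ == "__main__":
    bricks = [Brick(f"b{i}") for i in range(3)]
    assert write(bricks, 1, b"v1")
    bricks[0].up = False            # b0 misses the next write
    assert write(bricks, 2, b"v2")  # 2 of 3 bricks is still a majority
    bricks[0].up = True
    bricks[2].up = False            # a different brick is now down
    print(read(bricks))             # b'v2': b1 belongs to both majorities
```

Because any two majorities of three bricks share at least one brick (here `b1`), the read still returns the value written with timestamp 2 even though `b0` missed that write, and the write-back step repairs `b0`'s stale copy. The same intersection property is what lets a voting scheme finish a request without waiting for every brick.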
REFERENCES
[1] Atul Adya, William J. Bolosky, Miguel Castro, Gerald Cermak, Ronnie Chaiken, John R. Douceur, Jon Howell, Jacob R. Lorch, Marvin Theimer, and Roger P. Wattenhofer. Farsite: federated, available, and reliable storage for an incompletely trusted environment. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), Boston, MA, USA, December 9–11, 2002. DOI: 10.1145/1060289.1060291.
[2] Marcos K. Aguilera and Svend Frolund. Strict linearizability and the power of aborting. Technical Report HPL-2003-241, HP Labs, December 2003.
[7] Peter M. Chen, Edward K. Lee, Garth A. Gibson, Randy H. Katz, and David A. Patterson. RAID: high-performance, reliable secondary storage. ACM Computing Surveys, 26(2):145–185, June 1994. DOI: 10.1145/176979.176981.
[8] Flaviu Cristian and Frank Schmuck. Agreeing on processor group membership in asynchronous distributed systems. Technical Report CSE95-428, UC San Diego, 1995.
[9] Storage Performance Council. SPC Benchmark 1 specification. http://www.storageperformance.org/, 2003.
[10] S. Frolund, A. Merchant, Y. Saito, S. Spence, and A. Veitch. FAB: Enterprise storage systems on a shoestring. In 8th Workshop on Hot Topics in Operating Systems (HOTOS-VIII), pages 169–174, Kauai, HI, USA, May 2003.
[12] Gregory R. Ganger, John D. Strunk, and Andrew J. Klosterman. Self-* storage: Brick-based storage with automated administration. Technical Report CMU-CS-03-178, Carnegie Mellon University, August 2003.
[13] Garth A. Gibson, David F. Nagle, Khalil Amiri, Jeff Butler, Fay W. Chang, Howard Gobioff, Charles Hardin, Erik Riedel, David Rochberg, and Jim Zelenka. A cost-effective, high-bandwidth storage architecture. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), pages 92–103, San Jose, CA, USA, October 2–7, 1998.
[15] Douglas Gilbert. The Linux SCSI generic HOWTO. http://www.torque.net/sg/p/sg_v3_ho.html, 2003.
[16] Garth R. Goodson, Jay J. Wylie, Gregory R. Ganger, and Michael K. Reiter. Efficient consistency for erasure-coded data via versioning servers. Technical Report CMU-CS-03-127, Carnegie Mellon University, April 2003.
[18] Andy Huang and Armando Fox. Dstore: self-managing, crash-only persistent hash table. http://swig.stanford.edu/public/projects/dstore/, 2004.
[19] IBM. IceCube: storage server for the Internet age. http://www.almaden.ibm.com/cs/storagesystems/IceCube/, 2003.
[21] Leslie Lamport. Paxos made simple. ACM SIGACT News, 32(4):18–25, December 2001.
[23] LeftHand Networks. IP-based storage area networks. http://www.lefthandnetworks.com/downloads/ip-san_wp.pdf, 2002.
[24] Benjamin C. Ling, Emre Kiciman, and Armando Fox. Session state: beyond soft state. In 1st Symp. on Network Sys. Design and Impl. (NSDI), pages 295–308, San Francisco, CA, USA, March 2004.
[26] Danny Dolev, Idit Keidar, and Esti Yeger Lotem. Dynamic voting for consistent primary components. In Proceedings of the Sixteenth Annual ACM Symposium on Principles of Distributed Computing (PODC), pages 63–71, Santa Barbara, CA, USA, August 21–24, 1997. DOI: 10.1145/259380.259424.
[31] Sean Rhea, Patrick Eaton, Dennis Geels, Hakim Weatherspoon, Ben Zhao, and John Kubiatowicz. Pond: the OceanStore prototype. In USENIX Conf. on File and Storage Technologies (FAST), pages 1–14, San Francisco, CA, March 2003.
[32] Julian Satran, Kalman Meth, Constantine Sapuntzakis, Mallikarjun Chadalapaka, and Efri Zeidner. RFC 3720: Internet small computer systems interface (iSCSI). http://www.faqs.org/rfcs/rfc3720.html, 2004.
[33] Josh Tseng, Kevin Gibbons, Franco Travostino, Curt Du Laney, and Joe Souza. Internet storage name service (iSNS), draft version 18. http://www.diskdrive.com/reading-room/standards.html, March 2003.
[34] Carl A. Waldspurger and William E. Weihl. Lottery scheduling: flexible proportional-share resource management. In 1st Symp. on Op. Sys. Design and Impl. (OSDI), pages 1–11, Monterey, CA, USA, November 1994.
[35] Avishai Wool. Quorum systems in replicated databases: science or fiction? Bull. IEEE Technical Committee on Data Engineering, 21(4):3–11, December 1998.
CITED BY 20
Patrick Reynolds, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, Charles Killian, Amin Vahdat, Experiences with Pip: finding unexpected behavior in distributed systems, Proceedings of the twentieth ACM symposium on Operating systems principles, October 23-26, 2005, Brighton, United Kingdom
W. W. Wilcke, R. B. Garner, C. Fleiner, R. F. Freitas, R. A. Golding, J. S. Glider, D. R. Kenchammana-Hosekote, J. L. Hafner, K. M. Mohiuddin, K. K. Rao, R. A. Becker-Szendy, T. M. Wong, O. A. Zaki, M. Hernandez, K. R. Fernandez, H. Huels, H. Lenk, K. Smolin, M. Ries, C. Goettert, T. Picunko, B. J. Rubin, H. Kahn, T. Loo, IBM Intelligent Bricks project: petabytes and beyond, IBM Journal of Research and Development, v.50 n.2/3, p.181-197, March 2006
C. Fleiner, R. B. Garner, J. L. Hafner, K. K. Rao, D. R. Kenchammana-Hosekote, W. W. Wilcke, J. S. Glider, Reliability of modular mesh-connected intelligent storage brick systems, IBM Journal of Research and Development, v.50 n.2/3, p.199-208, March 2006
Mark W. Storer, Kevin M. Greenan, Ethan L. Miller, Kaladhar Voruganti, Pergamum: replacing tape with energy efficient, reliable, disk-based archival storage, Proceedings of the 6th USENIX Conference on File and Storage Technologies, p.1-16, February 26-29, 2008, San Jose, California
John D. Strunk, Eno Thereska, Christos Faloutsos, Gregory R. Ganger, Using utility to provision storage systems, Proceedings of the 6th USENIX Conference on File and Storage Technologies, p.1-16, February 26-29, 2008, San Jose, California
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Carlos Maltzahn, CRUSH: controlled, scalable, decentralized placement of replicated data, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M. Musial, Alex A. Shvartsman, Reconfigurable distributed storage for dynamic networks, Journal of Parallel and Distributed Computing, v.69 n.1, p.100-116, January 2009
Sage A. Weil, Andrew W. Leung, Scott A. Brandt, Carlos Maltzahn, RADOS: a scalable, reliable storage service for petabyte-scale storage clusters, Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing '07, November 11, 2007, Reno, Nevada
Kevin M. Greenan, Ethan L. Miller, Thomas J. E. Schwarz, Darrell D. E. Long, Disaster recovery codes: increasing reliability with large-stripe erasure correcting codes, Proceedings of the 2007 ACM workshop on Storage security and survivability, October 29, 2007, Alexandria, Virginia, USA
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, Carlos Maltzahn, Ceph: a scalable, high-performance distributed file system, Proceedings of the 7th symposium on Operating systems design and implementation, November 06-08, 2006, Seattle, Washington
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels, Dynamo: Amazon's highly available key-value store, ACM SIGOPS Operating Systems Review, v.41 n.6, December 2007
REVIEW
Reviewer: Elliot Jaffe
In recent years, file system research has focused on using the massive quantities of commodity disk space now residing in each desktop personal computer (PC). Desktops are frequently shut down or rebooted, and, hence, significant research has focused…