ABSTRACT
The ideal storage system is globally accessible, always available, provides unlimited performance and capacity for a large number of clients, and requires no management. This paper describes the design, implementation, and performance of Petal, a system that attempts to approximate this ideal in practice through a novel combination of features. Petal consists of a collection of network-connected servers that cooperatively manage a pool of physical disks. To a Petal client, this collection appears as a highly available block-level storage system that provides large abstract containers called virtual disks. A virtual disk is globally accessible to all Petal clients on the network. A client can create a virtual disk on demand to tap the entire capacity and performance of the underlying physical resources. Furthermore, additional resources, such as servers and disks, can be automatically incorporated into Petal.

We have an initial Petal prototype consisting of four 225 MHz DEC 3000/700 workstations running Digital Unix and connected by a 155 Mbit/s ATM network. The prototype provides clients with virtual disks that tolerate and recover from disk, server, and network failures. Latency is comparable to a locally attached disk, and throughput scales with the number of servers. The prototype can achieve I/O rates of up to 3150 requests/sec and bandwidth up to 43.1 Mbytes/sec.