ABSTRACT
Data de-duplication has become a commodity component of data-intensive systems, and these systems are required to provide reliability comparable to that of other storage systems. Unfortunately, by storing duplicate data chunks only once, a de-duplicated system improves storage utilization at the cost of error resilience, or reliability. This paper proposes R-ADMAD, a mechanism for providing high reliability. It packs variable-length data chunks into fixed-size objects, encodes the objects with error-correcting codes (ECC), and distributes them among the storage nodes of a redundancy group, which is generated dynamically according to the current status and the actual failure domains. Upon failures, R-ADMAD carries out a distributed and dynamic recovery process. Experimental results show that R-ADMAD achieves the same storage utilization as RAID-like schemes while providing reliability comparable to that of replication-based schemes, which carry much more redundancy. The average recovery time of R-ADMAD-based configurations is about 2 to 6 times shorter than that of RAID-like schemes. Moreover, R-ADMAD can provide dynamic load balancing even without the involvement of the overloaded storage nodes.
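The pack, encode, and distribute pipeline summarized above can be illustrated with a minimal sketch. It is not the R-ADMAD implementation: the abstract does not specify the code, the object size, or how a redundancy group is chosen, so this sketch assumes a single XOR parity object as a stand-in for the ECC, a tiny hypothetical `OBJECT_SIZE`, and a caller-supplied list of node names. All function names are illustrative.

```python
from functools import reduce

OBJECT_SIZE = 16  # hypothetical fixed object size in bytes (tiny, for illustration)


def pack_chunks(chunks, object_size=OBJECT_SIZE):
    """Pack variable-length chunks back to back into fixed-size objects.

    Returns (objects, index), where index[i] is the (offset, length) of
    chunk i in the concatenated byte stream. The last object is zero-padded.
    """
    index, offset = [], 0
    for chunk in chunks:
        index.append((offset, len(chunk)))
        offset += len(chunk)
    stream = b"".join(chunks)
    padded_len = -(-len(stream) // object_size) * object_size  # round up
    stream = stream.ljust(padded_len, b"\0")
    objects = [stream[i:i + object_size] for i in range(0, padded_len, object_size)]
    return objects, index


def xor_parity(blocks):
    """XOR equal-length blocks byte by byte (assumed stand-in for the ECC)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def place(objects, parity, nodes):
    """Put each data object and the parity object on a distinct node."""
    blocks = objects + [parity]
    if len(blocks) > len(nodes):
        raise ValueError("redundancy group has too few nodes")
    return dict(zip(nodes, blocks))


def recover(placement, failed_node):
    """Rebuild the block of one failed node from all surviving blocks."""
    survivors = [blk for node, blk in placement.items() if node != failed_node]
    return xor_parity(survivors)
```

With XOR parity, any single lost block equals the XOR of the surviving blocks, so `recover` tolerates exactly one failure per group. A real erasure code would tolerate more failures at the cost of additional parity objects.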