ACM Home Page
Please provide us with feedback. Feedback
Improving duplicate elimination in storage systems
Full text PdfPdf (482 KB)
Source ACM Transactions on Storage (TOS) archive
Volume 2 ,  Issue 4  (November 2006) table of contents
Pages: 424 - 448  
Year of Publication: 2006
ISSN:1553-3077
Authors
Deepak R. Bobbarjung  Purdue University, West Lafayette, IN
Suresh Jagannathan  Purdue University, West Lafayette, IN
Cezary Dubnicki  NEC Laboratories America, Princeton, NJ
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 22,   Downloads (12 Months): 154,   Citation Count: 0
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1210596.1210599
What is a DOI?

ABSTRACT

Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source by not writing data which has already been stored not only reduces storage overheads, but can also improve bandwidth utilization. For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems.Intelligent object partitioning techniques identify data that is new when objects are updated, and transfer only these chunks to a storage server. In this article, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. Most notably, fingerdiff dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects in order to improve storage and bandwidth utilization. We present a detailed evaluation of fingerdiff, and other existing object partitioning schemes, using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by fingerdiff improve storage utilization on average by 25%, and bandwidth utilization on average by 40% over comparable techniques.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
Berlekamp, E. R. 1968. Algebraic Coding Theory. McGraw-Hill, New York.
 
3
Blomer, J., Kalfane, M., Karp, R., Karpinski, M., Luby, M., and Zuckerman, D. 1995. An xor-based erasure-resilient coding scheme. Tech. Rep., International Computer Science Institute, Berkeley, California.
 
4
 
5
 
6
 
7
Cederqvist, P. 1992. Version management with cvs. http://www.cvshome.org/docs/manual/.
8
 
9
Douglis, F. and Iyengar, A. 2003. Application-Specific deltaencoding via resemblance detection. In Usenix Annual Technical Conference. 59--72.
 
10
Douglis, P. K. F., LaVoie, J., and Tracey, J. M. 2004. Redundancy elimination within large collections of files. In Usenix Annual Technical Conference. 59--72.
 
11
 
12
Hong, B., Plantenberg, D., Long, D. D. E., and Sivan-Zimet, M. 2004. Duplicate data elimination in a san file system. In Proceedings of the 21st IEEE/12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST). 301--314.
13
 
14
Jain, N., Dahlin, M., and Tewari, R. 2005. Taper: Tiered approach for eliminating redundancy in replica sychronization. In Proceedings of the 4th Usenix Conference on File and Storage Technologies (FAST).
15
16
17
 
18
Manber, U. 1994. Finding similar files in a large file system. In Usenix Winter Conference. 1--10.
19
 
20
National Institute of Standards and Technology FIPS PUB 180-1. 1995. Secure hash standard.
 
21
 
22
Policroniades, C. and Pratt, I. 2004. Alternatives for detecting redundancy in storage systems data. In Usenix Annual Technical Conference. 73--86.
 
23
 
24
Rabin, M. 1981. Fingerprinting by Random Polynomials. Tech. Rep. TR-15-81, Center for Research in Computing Technology, Harvard University.
 
25
Rochkind, M. J. 1975. The source code control system. IEEE Trans. Softw. Eng. 1, 4, 364--370.
 
26
Shivakumar, N. and García-Molina, H. 1995. SCAM: A copy detection mechanism for digital documents. In Proceedings of the 2nd Annual Conference on the Theory and Practice of Digital Libraries.
27
 
28
 
29
W. J. Bolosky, S. Corbin, D. G. and Douceur, J. R. Single instance storage in windows 2000. In Usenix Annual Technical Conference.
 
30
 
31
You, L. L. and Karamanolis, C. 2004. Evaluation of efficient archival storage techniques. In Proceedings of the 21st IEEE Symposium on Mass Storage Systems and Technologies (MSST).
 
32
Ziv, J. and Lempel, A. 1977. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23, 3, 337--343.


Collaborative Colleagues:
Deepak R. Bobbarjung: colleagues
Suresh Jagannathan: colleagues
Cezary Dubnicki: colleagues