The effectiveness of deduplication on virtual machine disk images
Source: ACM International Conference Proceeding Series
Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Haifa, Israel
SESSION: Deduplication
Article No. 7
Year of Publication: 2009
ISBN: 978-1-60558-623-6
Authors
Keren Jin, University of California, Santa Cruz
Ethan L. Miller, University of California, Santa Cruz
Sponsors
Mellanox Technologies
Hebrew University of Jerusalem
IBM
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 36,   Downloads (12 Months): 110,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1534530.1534540

ABSTRACT

Virtualization is becoming widely deployed in servers to efficiently provide many logically separate execution environments while reducing the need for physical servers. While this approach saves physical CPU resources, it still consumes large amounts of storage because each virtual machine (VM) instance requires its own multi-gigabyte disk image. Moreover, existing systems do not support ad hoc block sharing between disk images, instead relying on techniques such as overlays to build multiple VMs from a single "base" image.

We propose deduplication as an alternative, both to reduce the total storage required for VM disk images and to increase the ability of VMs to share disk blocks. To test the effectiveness of deduplication, we conducted extensive evaluations on different sets of virtual machine disk images with different chunking strategies. Our experiments found that the amount of stored data grows very slowly after the first few virtual disk images if only the locale or software configuration is changed, but that the compression rate suffers when different versions of an operating system or different operating systems are included. We also show that fixed-length chunks work well, achieving nearly the same compression rate as variable-length chunks. Finally, we show that simply identifying zero-filled blocks, even in ready-to-use virtual machine disk images available online, can provide significant savings in storage.
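
The idea of fixed-length chunk deduplication with zero-block detection can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 4 KB chunk size, the use of SHA-256 as the chunk fingerprint, and the function name `dedup_stats` are all assumptions made for illustration, and the "disk images" here are small in-memory byte strings rather than real VM images.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size; the abstract does not state one


def dedup_stats(images, chunk_size=CHUNK_SIZE):
    """Count chunks across several images (hypothetical helper).

    Returns (total_chunks, unique_nonzero_chunks, zero_chunks).
    Zero-filled chunks are counted separately and never fingerprinted,
    since they can be represented without storing any data.
    """
    zero = bytes(chunk_size)
    seen = set()  # fingerprints of non-zero chunks stored so far
    total = 0
    zero_chunks = 0
    for image in images:
        for offset in range(0, len(image), chunk_size):
            chunk = image[offset:offset + chunk_size]
            total += 1
            if chunk == zero[:len(chunk)]:
                zero_chunks += 1
                continue
            # SHA-256 is an assumption; any collision-resistant hash works.
            seen.add(hashlib.sha256(chunk).digest())
    return total, len(seen), zero_chunks


# Two toy "images" that differ in one chunk and share a zero-filled tail.
base = b"A" * 4096 + b"B" * 4096 + bytes(8192)
variant = b"A" * 4096 + b"C" * 4096 + bytes(8192)

total, unique, zeros = dedup_stats([base, variant])
print(total, unique, zeros)  # 8 3 4
```

In this toy case, eight chunks reduce to three stored chunks, with half of the original chunks eliminated by zero detection alone. A variable-length (content-defined) chunker would replace only the loop that cuts `image` into chunks; the fingerprint set is unchanged.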


