ABSTRACT
Virtualization is becoming widely deployed in servers to efficiently provide many logically separate execution environments while reducing the need for physical servers. While this approach saves physical CPU resources, it still consumes large amounts of storage because each virtual machine (VM) instance requires its own multi-gigabyte disk image. Moreover, existing systems do not support ad hoc block sharing between disk images, relying instead on techniques such as overlays to build multiple VMs from a single "base" image. We propose the use of deduplication both to reduce the total storage required for VM disk images and to increase the ability of VMs to share disk blocks. To test the effectiveness of deduplication, we conducted extensive evaluations on different sets of VM disk images with different chunking strategies. Our experiments found that the amount of stored data grows very slowly after the first few virtual disk images if only the locale or software configuration is changed, while the compression rate suffers when different versions of an operating system or different operating systems are included. We also show that fixed-length chunks work well, achieving nearly the same compression rate as variable-length chunks. Finally, we show that simply identifying zero-filled blocks, even in ready-to-use VM disk images available online, can provide significant savings in storage.
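
The fixed-length chunking and zero-block detection described above can be illustrated with a minimal sketch. It is not the evaluation code used in this work: the 4 KiB chunk size, the use of SHA-1 as the chunk fingerprint, and the function name `dedup_stats` are all assumptions chosen for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size in bytes; chosen for illustration only


def dedup_stats(images, chunk_size=CHUNK_SIZE):
    """Count chunks across disk images given as byte strings.

    Returns (total_chunks, stored_chunks, zero_chunks). This is a
    hypothetical helper, not an interface from the paper.
    """
    zero_chunk = bytes(chunk_size)  # reference chunk of all zero bytes
    seen = set()                    # fingerprints of chunks already stored
    total = stored = zeros = 0
    for image in images:
        # Fixed-length chunking: cut the image at regular offsets.
        for offset in range(0, len(image), chunk_size):
            chunk = image[offset:offset + chunk_size]
            total += 1
            # Zero-filled chunks need no storage at all.
            if chunk == zero_chunk[:len(chunk)]:
                zeros += 1
                continue
            # Assumed fingerprint: SHA-1 of the chunk contents.
            digest = hashlib.sha1(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += 1
    return total, stored, zeros
```

For example, two three-chunk images that share one data chunk and one zero-filled chunk, `base = b"A" * 4096 + bytes(4096) + b"B" * 4096` and `variant = b"A" * 4096 + bytes(4096) + b"C" * 4096`, give `dedup_stats([base, variant]) == (6, 3, 2)`: six chunks in total, of which three are stored and two are skipped as zero-filled.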