| Transparent checkpoints of closed distributed systems in Emulab |
Source
|
European Conference on Computer Systems
Proceedings of the 4th ACM European conference on Computer systems
Nuremberg, Germany
SESSION: Real, running systems
Pages 173-186
Year of Publication: 2009
ISBN: 978-1-60558-482-9
Authors
Anton Burtsev, University of Utah, Salt Lake City, UT, USA
Prashanth Radhakrishnan, NetApp, Bangalore, India
Mike Hibler, University of Utah, Salt Lake City, UT, USA
Jay Lepreau, University of Utah, Salt Lake City, UT, USA
ABSTRACT
Emulab is a testbed for networked and distributed systems experimentation. Two guiding principles of its design are realism and control of experimentation. There is an inherent tension between these goals, however, and in some aspects of the testbed's design, Emulab's implementers favored realism over control. Thus, Emulab provides wide-ranging control over an experiment's environment and initial conditions, but relatively little control over its execution. In particular, it lacks the ability to suspend, preempt, or replay an experiment. We have extended Emulab with a new means of control over experiment execution: the ability to cleanly checkpoint the execution of the set of nodes and networks that comprise an experiment. Conventional checkpoint mechanisms can easily degrade the fidelity of experiment results as a consequence of checkpoint downtimes, overheads of background state saving, and unintended distributed checkpoint synchronization effects. In this paper, we demonstrate a checkpointing technique that is transparent with respect to the execution of the system under test, almost completely concealing the underlying checkpoint activity. Building on our checkpoint mechanism, we have implemented two powerful facilities for experiment execution control: the ability to preemptively swap out experiments without losing their run-time state, and the ability to time-travel through the run of a system.