ABSTRACT
Checkpoint-restart is one of the most widely used software approaches to achieving fault tolerance in high-end clusters. While standard techniques typically focus on user-level solutions, the advent of virtualization software has enabled efficient and transparent system-level approaches. In this paper, we present a scalable, transparent, system-level solution that addresses fault tolerance for applications based on global address space (GAS) programming models on InfiniBand clusters. In addition to handling communication, the solution addresses transparent checkpointing of user-generated files. We exploit the support for the InfiniBand network in the Xen virtual machine environment. We have developed a version of the Aggregate Remote Memory Copy Interface (ARMCI) one-sided communication library capable of suspending and resuming applications. We present efficient and scalable mechanisms to distribute checkpoint requests and to back up virtual machine memory images and file systems. We tested our approach in the context of NWChem, a popular computational chemistry suite. We demonstrated that NWChem can be executed, without any modification to the source code, on a virtualized 8-node cluster with overhead below 3%. We observed that the total checkpoint time is limited by disk I/O. Finally, we measured the system-size-dependent components of the checkpoint time on up to 1024 cores (128 nodes), demonstrating the scalability of our approach on medium- and large-scale systems.