Combining batch execution and leasing using virtual machines
Source
High Performance Distributed Computing
Proceedings of the 17th International Symposium on High Performance Distributed Computing
Boston, MA, USA
SESSION: Reservations, leasing, and scheduling
Pages 87-96  
Year of Publication: 2008
ISBN: 978-1-59593-997-5
Authors
Borja Sotomayor  University of Chicago, Chicago, IL, USA
Kate Keahey  Argonne National Laboratory, Argonne, IL, USA
Ian Foster  Argonne National Laboratory, Argonne, IL, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1383422.1383434

ABSTRACT

As cluster computers are used for a wider range of applications, we encounter the need to deliver resources at particular times, to meet particular deadlines, and/or at the same time as other resources are provided elsewhere. To address such requirements, we describe a scheduling approach in which users request resource leases, where leases can request either as-soon-as-possible ("best-effort") or reservation start times. We present the design of a lease management architecture, Haizea, that implements leases as virtual machines (VMs), leveraging their ability to suspend, migrate, and resume computations and to provide leased resources with customized application environments. We discuss methods to minimize the overhead introduced by having to deploy VM images before the start of a lease. We also present the results of simulation studies that compare alternative approaches. Using workloads with various mixes of best-effort and advance reservation requests, we compare the performance of our VM-based approach with that of non-VM-based schedulers. We find that a VM-based approach can provide better performance (measured in terms of both total execution time and average delay incurred by best-effort requests) than a scheduler that does not support task pre-emption, and only slightly worse performance than a scheduler that does support task pre-emption. We also compare the impact of different VM image popularity distributions and VM image caching strategies on performance. These results emphasize the importance of VM image caching for the workloads studied and quantify the sensitivity of scheduling performance to VM image popularity distribution.
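The effect of suspend/resume on best-effort delay can be illustrated with a toy model. The sketch below is not Haizea's scheduler: it assumes a single node, non-overlapping advance reservations, and zero suspend/resume and image-deployment overhead, and the names `Reservation` and `best_effort_finish` are hypothetical. It compares when a best-effort lease completes if it can be suspended at the start of each reservation versus if it must run uninterrupted.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    start: int      # fixed start time requested by the advance-reservation lease
    duration: int   # how long the reservation holds the node


def best_effort_finish(work, reservations, preemptible):
    """Completion time of a best-effort lease needing `work` time units.

    Assumes one node, non-overlapping reservations with start >= 0,
    and no suspend/resume overhead (all simplifying assumptions).
    """
    now = 0
    remaining = work
    for r in sorted(reservations, key=lambda r: r.start):
        gap = r.start - now
        if preemptible:
            # Run in the gap, then suspend when the reservation begins.
            run = min(gap, remaining)
            remaining -= run
            if remaining == 0:
                return now + run
        elif gap >= remaining:
            # Without suspension, the lease starts only if it fits entirely.
            return now + remaining
        # The node is busy until the reservation ends.
        now = r.start + r.duration
    # No reservations left: finish whatever work remains.
    return now + remaining


reservations = [Reservation(start=4, duration=3), Reservation(start=10, duration=2)]
print(best_effort_finish(6, reservations, preemptible=True))   # 9
print(best_effort_finish(6, reservations, preemptible=False))  # 18
```

In this example the suspendable lease uses both gaps and completes at time 9, while the non-suspendable one fits in neither gap and must wait until the last reservation ends. A realistic comparison would also have to charge for suspension, resumption, and VM image transfer, which are the overheads the abstract discusses.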



