Evaluation of a workflow scheduler using integrated performance modelling and batch queue wait time prediction
Source: Conference on High Performance Networking and Computing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Tampa, Florida
Session: Technical papers
Article No. 119
Year of Publication: 2006
ISBN: 0-7695-2700-0
Authors
Daniel Nurmi  University of California, Santa Barbara, Santa Barbara, California
Anirban Mandal  Rice University, Houston, Texas
John Brevik  University of California, Santa Barbara, Santa Barbara, California
Chuck Koelbel  Rice University, Houston, Texas
Rich Wolski  University of California, Santa Barbara, Santa Barbara, California
Ken Kennedy  Rice University, Houston, Texas
Sponsors
IEEE: Institute of Electrical and Electronics Engineers
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1188455.1188579

ABSTRACT

Large-scale distributed systems offer computational power at unprecedented levels. In the past, HPC users typically had access to relatively few individual supercomputers and, in general, mapped each application to a single machine. Modern HPC users have simultaneous access to a large number of individual machines and are beginning to use all of them within a single application execution cycle. One method that application developers have devised to take advantage of such systems is to organize an entire application execution cycle as a workflow. The scheduling of such workflows has been the topic of a great deal of research in the past few years. Although very sophisticated algorithms have been devised, one specific aspect of these distributed systems has so far been omitted from consideration, presumably because it is difficult to model accurately: most supercomputing resources employ batch queue scheduling software. In this work, we augment an existing workflow scheduler with methods that accurately predict both the performance of the application on specific hardware and the amount of time individual workflow tasks will spend waiting in batch queues. Our results show that although a workflow scheduler alone may choose correct task placement based on data locality or network connectivity, this benefit is often compromised because most jobs submitted to current systems must wait in overcommitted batch queues for a significant portion of time. Incorporating the enhancements we describe improves workflow execution time in settings where batch queues impose significant delays on constituent workflow tasks.
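
The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the scheduler evaluated in the paper. It is a toy model that assumes each candidate resource comes with three already-computed estimates: a predicted batch queue wait, a predicted run time from a performance model, and a data transfer time. The `Resource` class, the function names, the cluster names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical per-resource estimates for one workflow task (seconds)."""
    name: str
    predicted_wait_s: float  # assumed output of a queue wait time predictor
    predicted_run_s: float   # assumed output of a performance model
    transfer_s: float        # assumed time to stage the task's input data


def completion_estimate(r: Resource, include_queue_wait: bool = True) -> float:
    """Estimated time from submission to task completion on resource r."""
    wait = r.predicted_wait_s if include_queue_wait else 0.0
    return wait + r.transfer_s + r.predicted_run_s


def pick_resource(resources, include_queue_wait: bool = True) -> Resource:
    """Choose the resource with the smallest estimated completion time."""
    return min(resources, key=lambda r: completion_estimate(r, include_queue_wait))


# Illustrative numbers only: cluster_a is faster and closer to the data,
# but its batch queue is heavily loaded.
candidates = [
    Resource("cluster_a", predicted_wait_s=7200.0, predicted_run_s=600.0, transfer_s=30.0),
    Resource("cluster_b", predicted_wait_s=300.0, predicted_run_s=900.0, transfer_s=240.0),
]

naive = pick_resource(candidates, include_queue_wait=False)
aware = pick_resource(candidates, include_queue_wait=True)
print(naive.name, aware.name)
```

With these made-up inputs, the placement that ignores queue wait favors the resource with better run time and data locality, while the queue-aware placement switches to the less loaded resource. A real scheduler would also have to account for task dependencies and prediction uncertainty, which this sketch leaves out.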

