Evaluation of a workflow scheduler using integrated performance modelling and batch queue wait time prediction
Source
Conference on High Performance Networking and Computing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Tampa, Florida
SESSION: Technical papers
Article No. 119
Year of Publication: 2006
ISBN: 0-7695-2700-0
Authors
Daniel Nurmi, University of California, Santa Barbara, Santa Barbara, California
Anirban Mandal, Rice University, Houston, Texas
John Brevik, University of California, Santa Barbara, Santa Barbara, California
Chuck Koelbel, Rice University, Houston, Texas
Rich Wolski, University of California, Santa Barbara, Santa Barbara, California
Ken Kennedy, Rice University, Houston, Texas
ABSTRACT
Large-scale distributed systems offer computational power at unprecedented levels. In the past, HPC users typically had access to relatively few individual supercomputers and, in general, would assign a one-to-one mapping of applications to machines. Modern HPC users have simultaneous access to a large number of individual machines and are beginning to make use of all of them for single-application execution cycles. One method that application developers have devised in order to take advantage of such systems is to organize an entire application execution cycle as a workflow. The scheduling of such workflows has been the topic of a great deal of research in the past few years and, although very sophisticated algorithms have been devised, a very specific aspect of these distributed systems, namely that most supercomputing resources employ batch queue scheduling software, has heretofore been omitted from consideration, presumably because it is difficult to model accurately. In this work, we augment an existing workflow scheduler through the introduction of methods that make accurate predictions of both the performance of the application on specific hardware and the amount of time individual workflow tasks will spend waiting in batch queues. Our results show that although a workflow scheduler alone may choose correct task placement based on data locality or network connectivity, this benefit is often compromised by the fact that most jobs submitted to current systems must wait in overcommitted batch queues for a significant portion of time. However, incorporating the enhancements we describe improves workflow execution time in settings where batch queues impose significant delays on constituent workflow tasks.
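The abstract's central point, that a placement chosen only on data locality and predicted runtime can lose to one that also accounts for batch queue wait, can be illustrated with a minimal sketch. This is not the scheduler or the prediction method described in the paper. The `Resource` fields, the `completion_estimate` and `pick_resource` helpers, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A candidate machine for one workflow task (all values are made up)."""
    name: str
    predicted_queue_wait_s: float  # assumed output of a queue wait predictor
    predicted_runtime_s: float     # assumed output of a performance model
    data_transfer_s: float         # assumed time to stage the task's input data


def completion_estimate(r: Resource, use_queue_prediction: bool = True) -> float:
    """Estimated seconds until the task finishes on resource r."""
    wait = r.predicted_queue_wait_s if use_queue_prediction else 0.0
    return wait + r.data_transfer_s + r.predicted_runtime_s


def pick_resource(resources, use_queue_prediction: bool = True) -> Resource:
    """Greedy choice: the resource with the smallest estimated completion time."""
    return min(resources, key=lambda r: completion_estimate(r, use_queue_prediction))


resources = [
    # Fast machine close to the data, but with a long predicted queue wait.
    Resource("cluster_a", predicted_queue_wait_s=7200.0,
             predicted_runtime_s=600.0, data_transfer_s=30.0),
    # Slower machine farther from the data, but with a short predicted queue wait.
    Resource("cluster_b", predicted_queue_wait_s=300.0,
             predicted_runtime_s=900.0, data_transfer_s=120.0),
]

print(pick_resource(resources, use_queue_prediction=False).name)
print(pick_resource(resources, use_queue_prediction=True).name)
```

With these made-up numbers, ignoring queue wait favors `cluster_a` (630 s against 1020 s), while including the predicted wait favors `cluster_b` (1320 s against 7830 s). A real workflow scheduler would also have to handle task dependencies and prediction uncertainty, which this sketch leaves out.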