ABSTRACT
Javelin 3 is a software system for developing large-scale, fault-tolerant, adaptively parallel applications. When all or part of an application can be cast as a master-worker or branch-and-bound computation, Javelin 3 frees application developers from concerns about inter-processor communication and fault tolerance among networked hosts, allowing them to focus on the underlying application. This paper describes a fault-tolerant task scheduler and analyzes its performance. The task scheduler integrates work stealing with an advanced form of eager scheduling. It enables dynamic task decomposition, which improves load balancing across hosts when tasks have non-uniform computational loads that become evident only at execution time. We present speedup measurements of actual performance on up to 1,000 hosts. We also analyze the expected performance degradation due to unresponsive hosts and measure the actual degradation they cause.
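The abstract names three cooperating mechanisms (work stealing, eager scheduling, and dynamic task decomposition) without showing how they fit together. The following is a minimal single-process Python sketch under stated assumptions, not Javelin 3's actual scheduler, which is written in Java and runs across networked hosts. Each simulated host works LIFO on its own deque, steals FIFO from a random victim when idle, splits any task larger than an assumed grain size, and, when nothing is left to steal, eagerly re-executes a task that was handed out but never completed. The `Task` type (summing an integer range), `GRAIN`, and `simulate` are hypothetical. Unresponsive hosts are modelled as hosts that take work and never return a result.

```python
import random
from collections import deque
from dataclasses import dataclass


# Hypothetical task type for illustration: sum the integers in [lo, hi).
@dataclass(frozen=True)
class Task:
    lo: int
    hi: int


GRAIN = 8  # assumed grain size: tasks larger than this are split at run time


def simulate(n_hosts, root, faulty=frozenset(), seed=0, max_steps=10_000):
    """Round-based sketch of work stealing combined with eager scheduling.

    Returns (total, steps). Hosts in `faulty` steal work but never report
    results. At least one responsive host is needed for termination.
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(n_hosts)]
    deques[0].append(root)
    outstanding = set()  # tasks taken off a deque but not yet finished
    results = {}         # leaf task -> result; first result wins (idempotent)
    for step in range(1, max_steps + 1):
        for h in range(n_hosts):
            task = None
            if deques[h]:
                task = deques[h].pop()  # own work: newest first (LIFO)
            else:
                victims = [v for v in range(n_hosts) if v != h and deques[v]]
                if victims:
                    # steal the oldest (typically largest) task from a random victim
                    task = deques[rng.choice(victims)].popleft()
                elif h not in faulty and outstanding:
                    # eager scheduling: nothing left to steal, so redo a task
                    # that was handed out but whose result never arrived
                    task = min(outstanding, key=lambda t: (t.lo, t.hi))
            if task is None:
                continue
            outstanding.add(task)
            if h in faulty:
                continue  # unresponsive host: the task silently vanishes
            if task.hi - task.lo > GRAIN:
                # dynamic decomposition: split and keep both halves locally
                mid = (task.lo + task.hi) // 2
                deques[h].append(Task(task.lo, mid))
                deques[h].append(Task(mid, task.hi))
            else:
                results.setdefault(task, sum(range(task.lo, task.hi)))
            outstanding.discard(task)
        if not outstanding and not any(deques):
            return sum(results.values()), step
    raise RuntimeError("no responsive host finished the computation")
```

For example, `simulate(4, Task(0, 1000), faulty=frozenset({3}))` still returns `sum(range(1000))`: every task that host 3 swallows stays in `outstanding` until a responsive host redoes it. Keying results by task makes re-execution harmless, which is the property eager scheduling relies on to tolerate duplicate work.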