ABSTRACT
In this paper, we propose a decentralized group membership service that can be incorporated into existing grid middleware to make it more reliable. This service includes a flexible failure detector that adapts dynamically to changing network conditions and can be configured with a number of failure recovery strategies. Moreover, it disseminates information about membership changes (new processes, failures, etc.) in a scalable and efficient manner. We conducted a preliminary evaluation of the proposed service by simulating a grid with up to 140 nodes distributed across three domains separated by a wide-area network. This evaluation showed that the proposed service performs well both in the absence and in the presence of process failures.
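As an illustration of what a failure detector that adapts to changing network conditions might look like, the following is a minimal sketch and not the service described in this paper. It assumes a heartbeat-based design in which the timeout is derived from recently observed heartbeat inter-arrival times; the class name `AdaptiveFailureDetector`, its parameters (`window`, `safety_factor`, `initial_timeout`), and the mean-plus-deviation rule are all hypothetical choices made for the example.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveFailureDetector:
    """Hypothetical heartbeat detector whose timeout tracks observed delays."""

    def __init__(self, window=100, safety_factor=4.0, initial_timeout=1.0):
        # Sliding window of recent heartbeat inter-arrival times, in seconds.
        self.intervals = deque(maxlen=window)
        self.safety_factor = safety_factor      # assumed tuning knob
        self.initial_timeout = initial_timeout  # used until enough samples exist
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received from the monitored process at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def timeout(self):
        """Current timeout: mean inter-arrival time plus a multiple of its deviation."""
        if len(self.intervals) < 2:
            return self.initial_timeout
        return mean(self.intervals) + self.safety_factor * pstdev(self.intervals)

    def suspects(self, now):
        """Return True if the silence since the last heartbeat exceeds the timeout."""
        if self.last_arrival is None:
            return False
        return now - self.last_arrival > self.timeout()
```

With regular heartbeats the timeout stays close to the heartbeat period, while jittery arrivals (for example, across a wide-area link) widen it, which trades detection speed for fewer false suspicions. A real deployment would also need the recovery and dissemination parts of the service, which this sketch leaves out.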