A lightweight idempotent messaging protocol for faulty networks
Source: ACM Symposium on Parallel Algorithms and Architectures
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Winnipeg, Manitoba, Canada
Session: Session 9
Pages: 248-257
Year of Publication: 2002
ISBN: 1-58113-529-7
Authors
Jeremy Brown  M.I.T., Cambridge, MA
J. P. Grossman  M.I.T., Cambridge, MA
Tom Knight  M.I.T., Cambridge, MA
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/564870.564912

ABSTRACT

As parallel machines scale to one million nodes and beyond, it becomes increasingly difficult to build a reliable network that is able to guarantee packet delivery. Eventually large systems will need to employ fault-tolerant messaging protocols that afford correct execution in the presence of a lossy network. In this paper we present a lightweight protocol that preserves message idempotence and is easy to implement in hardware. We identify the requirements for a correct implementation of the protocol. Experiments are performed in simulation to determine implementation parameters that optimize performance. We find that an aggressive implementation on a fat tree network results in a slowdown of less than 2x compared to buffered wormhole routing on a fault-free network.
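The abstract does not spell out the protocol itself, so the following is only a generic illustration of why idempotence matters on a lossy network, not the authors' design. It is a minimal sketch under stated assumptions: a sender retransmits each message until it sees an acknowledgment, and a receiver remembers which message IDs it has already executed so that retransmissions caused by lost acks do not repeat the side effect. The names `Receiver`, `send_reliably`, and the `loss` rate are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical endpoint that executes each message at most once."""
    seen: set = field(default_factory=set)  # IDs of messages already executed
    counter: int = 0                        # state mutated by message handlers
    arrivals: int = 0                       # deliveries, including duplicates

    def deliver(self, msg_id: int, amount: int) -> int:
        self.arrivals += 1
        # A duplicate arrives when an earlier ack was lost; acknowledge it
        # again but do not re-apply the side effect.
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.counter += amount
        return msg_id  # the ack names the message it acknowledges


def send_reliably(rx: Receiver, msg_id: int, amount: int,
                  rng: random.Random, loss: float = 0.3,
                  max_tries: int = 1000) -> int:
    """Retransmit until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_tries + 1):
        if rng.random() < loss:  # request dropped by the network
            continue
        ack = rx.deliver(msg_id, amount)
        if rng.random() < loss:  # ack dropped: sender cannot tell, so it resends
            continue
        assert ack == msg_id
        return attempt
    raise RuntimeError(f"message {msg_id} not acknowledged")


rng = random.Random(42)
rx = Receiver()
for i in range(100):
    send_reliably(rx, i, 1, rng)
print(rx.counter)  # 100
```

Even though some messages reach the receiver more than once, each increments `counter` exactly once. This sketch keeps every executed ID forever; a practical hardware implementation would need to bound that state, which is not attempted here.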



