A lightweight idempotent messaging protocol for faulty networks
ACM Symposium on Parallel Algorithms and Architectures
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Winnipeg, Manitoba, Canada
SESSION: Session 9
Pages: 248 - 257
Year of Publication: 2002
ISBN: 1-58113-529-7
ABSTRACT
As parallel machines scale to one million nodes and beyond, it becomes increasingly difficult to build a reliable network that can guarantee packet delivery. Eventually, large systems will need to employ fault-tolerant messaging protocols that guarantee correct execution in the presence of a lossy network. In this paper we present a lightweight protocol that preserves message idempotence and is easy to implement in hardware. We identify the requirements for a correct implementation of the protocol. We use simulation experiments to determine the implementation parameters that optimize performance. We find that an aggressive implementation on a fat-tree network results in a slowdown of less than 2x compared to buffered wormhole routing on a fault-free network.
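To make concrete what preserving message idempotence means on a lossy network, here is a generic sketch. It is not the protocol presented in this paper. A stop-and-wait sender retransmits each message until its acknowledgment arrives, and the receiver suppresses duplicates by sequence number, so each message takes effect exactly once even when either the message or its acknowledgment is dropped. The names `LossyChannel`, `Receiver`, and `send_all`, the independent per-packet loss model, and the unbounded set of delivered sequence numbers are all illustrative assumptions.

```python
import random


class LossyChannel:
    """Hypothetical faulty network: drops each packet independently
    with probability `loss` (an assumed loss model)."""

    def __init__(self, loss: float, seed: int = 0):
        self.loss = loss
        self.rng = random.Random(seed)

    def transmit(self, packet):
        # Returns the packet unchanged, or None if the network dropped it.
        return None if self.rng.random() < self.loss else packet


class Receiver:
    def __init__(self):
        self.seen = set()      # sequence numbers already delivered (unbounded)
        self.delivered = []    # payloads handed to the application, in order

    def on_packet(self, seq, payload):
        # A retransmitted duplicate is acknowledged again but not
        # re-delivered, so the application sees each message exactly once.
        if seq not in self.seen:
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq  # the acknowledgment carries the sequence number


def send_all(payloads, channel, receiver):
    """Stop-and-wait sender: retransmit each message until its ack arrives.
    Returns the total number of transmission attempts."""
    attempts = 0
    for seq, payload in enumerate(payloads):
        while True:
            attempts += 1
            pkt = channel.transmit((seq, payload))
            if pkt is None:
                continue  # message lost: sender times out (modeled as an immediate retry)
            ack = channel.transmit(receiver.on_packet(*pkt))
            if ack == seq:
                break  # ack arrived: move on to the next message
            # ack lost: the retry will reach the receiver as a duplicate
    return attempts


receiver = Receiver()
send_all(["a", "b", "c", "d"], LossyChannel(loss=0.3, seed=1), receiver)
print(receiver.delivered)  # ['a', 'b', 'c', 'd']
```

Because the sender never advances past a message until that message's acknowledgment returns, delivery is both in order and duplicate-free regardless of which packets are lost. This sketch keeps an unbounded set of delivered sequence numbers and makes no attempt to match the bounded, hardware-friendly state that the abstract describes.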