ACM Home Page
Please provide us with feedback. Feedback
Fault-tolerant wormhole routing in tori
Full text PdfPdf (1.16 MB)
Source International Conference on Supercomputing archive
Proceedings of the 8th international conference on Supercomputing table of contents
Manchester, England
Pages: 146 - 155  
Year of Publication: 1994
ISBN:0-89791-665-4
Authors
Suresh Chalasani  Electrical & Comp. Engr. Dept., Univ. of Wisconsin-Madison, Madison, WI
Rajendra V. Boppana  Div. of Math. and Computer Science, The Univ. of Texas at San Antonio, San Antonio, TX
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 1,   Downloads (12 Months): 8,   Citation Count: 11
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/181181.181330
What is a DOI?

ABSTRACT

We present a method to enhance wormhole routing algorithms for deadlock-free fault-tolerant routing in tori. We consider arbitrarily-located faulty blocks and assume only local knowledge of faults. Messages are routed via shortest paths when there are no faults, and this constraint is only slightly relaxed to facilitate routing in the presence of faults. The key concept we use is that, for each fault region, a fault ring consisting of fault free nodes and physical channels can be formed around it. These fault rings can be used to route messages around fault regions. We prove that at most four additional virtual channels are sufficient to make any fully-adaptive algorithm tolerant to multiple faulty blocks in torus networks. As an example of this technique, we present simulation results for a fully-adaptive algorithm and show that good performance can be obtained with as many as 10% links faulty.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. AgarwM, et. al. The MIT A~wife machine: A largescMe di~fibuted multiprocessor. In Proc. of Workshop on Scalable Shared Memory Mult,processors. Kluwer Academic Publishers, 1991.
2
 
3
K. Bolding and L. Snyder. Overv~w of fault handling for the chaos router, in Proceedings o} the 1991 IEEE Internatzonal Workshop on De~ect and Fault Tolerance in VLSI Systems, pages 124-127, 1991.
4
 
5
R. V. Boppana and S. Chalasani. Faul~tolerant wormhole routing MgoMthms for mesh networks. Submitted for publication, Dec. 1993.
 
6
7
 
8
 
9
 
10
 
11
 
12
S. A. Felperin, L. Gravano, G. D. PKarr6, and 3. L. Sanz. Routing techniques for mas~vely parall~ communication. Procee&ngs o/ the IEEE, 79(4):488-503, 1991.
 
13
P. T. Gaughan and S. Yalamanchili. Pipelined circuitswitching: A fault-to,rant variant of wormhole routing. In Proc. Fourth IEEE Syrup. on Parallel and Distmbuted Processing, pages 148-155, 1992.
 
14
C. ~. Glass and L. M. Ni. Fault-to,rant wormhole routing in meshes. In Twenty-Third Annual Int. Symp. on Fault-Tolerant Computing, pages 240-249, 1993.
 
15
I. S. Gopal. Preven~on of store-and-forward deadlock in computer networks. IEEE Trans. on Communications, COM-33(12):1258-1264, Dec. 1985.
 
16
P. Kermani and L. Kleinrock. VirtuM CuUThrough: A New Computer Communication Switching Technique. Computer Networks, 3:267-286, 1979.
 
17
J. H. Kim and A. A. Chien. An evaluation of plana~ adaptive routing (PAR). In Proc. Fourth IEEE Syrup. on Parallel and Distributed Processing, 1992.
 
18
S. S. Lam and M. Reiser. Congestion control of storeand-forward networks by input buffer 5mits--an analys~. IEEE Trans. on Communications, com-27(1):127- 133, Jan. 1979.
 
19
 
20
S. L. Lillevik. The Touchstone 30 Gigaflop DELTA prototype. In S~xth Distributed Memory Computing Conference, pages 671-677, 1991.
 
21
22
 
23
A. L. Nara~mha Reddy and R. Frdtas. Fault torrance of adaptive routing algorithms in multicomputers. In Proc. Fourth IEEE Symp. on Parallel and D~stributed Processing, pages 156-161, 1992.
24
 
25
W. Oed. The cray research mas~vdy parallel processor system, CRAY T3D. Technical report, Cray Research Inc., Nov. 1993.
 
26
C. S. Raghavendra, P.-J. Yang, and S.-B. Tien. Free d~ men~ons- an effective approach to achieving fault to~ erance in hypercubes. In Twenty-Second Annual Int. Symp. on Faul~Tolerant Computing, pages 170-177, 1992.
 
27

CITED BY  11

Collaborative Colleagues:
Suresh Chalasani: colleagues
Rajendra V. Boppana: colleagues