|
ABSTRACT
We present a method to enhance wormhole routing algorithms for deadlock-free fault-tolerant routing in tori. We consider arbitrarily-located faulty blocks and assume only local knowledge of faults. Messages are routed via shortest paths when there are no faults, and this constraint is only slightly relaxed to facilitate routing in the presence of faults. The key concept we use is that, for each fault region, a fault ring consisting of fault free nodes and physical channels can be formed around it. These fault rings can be used to route messages around fault regions. We prove that at most four additional virtual channels are sufficient to make any fully-adaptive algorithm tolerant to multiple faulty blocks in torus networks. As an example of this technique, we present simulation results for a fully-adaptive algorithm and show that good performance can be obtained with as many as 10% links faulty.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. AgarwM, et. al. The MIT A~wife machine: A largescMe di~fibuted multiprocessor. In Proc. of Workshop on Scalable Shared Memory Mult,processors. Kluwer Academic Publishers, 1991.
|
 |
2
|
Robert Alverson , David Callahan , Daniel Cummings , Brian Koblenz , Allan Porterfield , Burton Smith, The Tera computer system, Proceedings of the 4th international conference on Supercomputing, p.1-6, June 11-15, 1990, Amsterdam, The Netherlands
|
| |
3
|
K. Bolding and L. Snyder. Overv~w of fault handling for the chaos router, in Proceedings o} the 1991 IEEE Internatzonal Workshop on De~ect and Fault Tolerance in VLSI Systems, pages 124-127, 1991.
|
 |
4
|
|
| |
5
|
R. V. Boppana and S. Chalasani. Faul~tolerant wormhole routing MgoMthms for mesh networks. Submitted for publication, Dec. 1993.
|
| |
6
|
S. Borkar , R. Cohn , G. Cox , S. Gleason , T. Gross, Warp: an integrated solution of high-speed parallel computing, Proceedings of the 1988 ACM/IEEE conference on Supercomputing, p.330-339, November 12-17, 1988, Orlando, Florida, United States
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
S. A. Felperin, L. Gravano, G. D. PKarr6, and 3. L. Sanz. Routing techniques for mas~vely parall~ communication. Procee&ngs o/ the IEEE, 79(4):488-503, 1991.
|
| |
13
|
P. T. Gaughan and S. Yalamanchili. Pipelined circuitswitching: A fault-to,rant variant of wormhole routing. In Proc. Fourth IEEE Syrup. on Parallel and Distmbuted Processing, pages 148-155, 1992.
|
| |
14
|
C. ~. Glass and L. M. Ni. Fault-to,rant wormhole routing in meshes. In Twenty-Third Annual Int. Symp. on Fault-Tolerant Computing, pages 240-249, 1993.
|
| |
15
|
I. S. Gopal. Preven~on of store-and-forward deadlock in computer networks. IEEE Trans. on Communications, COM-33(12):1258-1264, Dec. 1985.
|
| |
16
|
P. Kermani and L. Kleinrock. VirtuM CuUThrough: A New Computer Communication Switching Technique. Computer Networks, 3:267-286, 1979.
|
| |
17
|
J. H. Kim and A. A. Chien. An evaluation of plana~ adaptive routing (PAR). In Proc. Fourth IEEE Syrup. on Parallel and Distributed Processing, 1992.
|
| |
18
|
S. S. Lam and M. Reiser. Congestion control of storeand-forward networks by input buffer 5mits--an analys~. IEEE Trans. on Communications, com-27(1):127- 133, Jan. 1979.
|
| |
19
|
|
| |
20
|
S. L. Lillevik. The Touchstone 30 Gigaflop DELTA prototype. In S~xth Distributed Memory Computing Conference, pages 671-677, 1991.
|
| |
21
|
|
 |
22
|
Michael D. Noakes , Deborah A. Wallach , William J. Dally, The J-machine multicomputer: an architectural evaluation, Proceedings of the 20th annual international symposium on Computer architecture, p.224-235, May 16-19, 1993, San Diego, California, United States
|
| |
23
|
A. L. Nara~mha Reddy and R. Frdtas. Fault torrance of adaptive routing algorithms in multicomputers. In Proc. Fourth IEEE Symp. on Parallel and D~stributed Processing, pages 156-161, 1992.
|
 |
24
|
|
| |
25
|
W. Oed. The cray research mas~vdy parallel processor system, CRAY T3D. Technical report, Cray Research Inc., Nov. 1993.
|
| |
26
|
C. S. Raghavendra, P.-J. Yang, and S.-B. Tien. Free d~ men~ons- an effective approach to achieving fault to~ erance in hypercubes. In Twenty-Second Annual Int. Symp. on Faul~Tolerant Computing, pages 170-177, 1992.
|
| |
27
|
|
CITED BY 11
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Huaxi Gu , Jie Zhang , Kun Wang , Zengji Liu , Guochang Kang, Enhanced fault tolerant routing algorithms using a concept of "balanced ring", Journal of Systems Architecture: the EUROMICRO Journal, v.53 n.12, p.902-912, December, 2007
|
|
|
|
|
|
|
INDEX TERMS
Primary Classification:
F.
Theory of Computation
F.2
ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
F.2.2
Nonnumerical Algorithms and Problems
Subjects:
Routing and layout
Additional Classification:
C.
Computer Systems Organization
C.1
PROCESSOR ARCHITECTURES
General Terms:
Algorithms,
Measurement,
Performance,
Theory,
Verification
Keywords:
adaptive routing,
deadlocks,
fault-tolerant routing,
message routing,
multicomputer networks,
performance evaluation,
torus networks,
wormhole routing
|