Fault-tolerant wormhole routing in tori

DOI: 10.1145/181181.181330
Published: 16 July 1994

Abstract

We present a method to enhance wormhole routing algorithms for deadlock-free fault-tolerant routing in tori. We consider arbitrarily located faulty blocks and assume only local knowledge of faults. Messages are routed via shortest paths when there are no faults, and this constraint is only slightly relaxed to facilitate routing in the presence of faults. The key concept is that, for each fault region, a fault ring consisting of fault-free nodes and physical channels can be formed around it. These fault rings can be used to route messages around fault regions. We prove that at most four additional virtual channels are sufficient to make any fully-adaptive algorithm tolerant to multiple faulty blocks in torus networks. As an example of this technique, we present simulation results for a fully-adaptive algorithm and show that good performance can be obtained with as many as 10% of the links faulty.
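
To make the fault-ring idea concrete, the following is a minimal sketch, not the authors' construction. It assumes a two-dimensional k-ary torus and a single rectangular faulty block; the function names `minimal_offset` and `fault_ring` and their parameters are hypothetical, chosen only for illustration. It shows how shortest-path offsets wrap around a torus dimension and how the fault-free nodes bordering a block can be listed in ring order.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def minimal_offset(src: int, dst: int, k: int) -> int:
    """Signed hop count from src to dst along one dimension of a k-ary
    torus, taking the shorter way around the wraparound link."""
    d = (dst - src) % k
    return d if d <= k // 2 else d - k


def fault_ring(x0: int, y0: int, w: int, h: int, k: int) -> List[Node]:
    """List the nodes that border a w-by-h faulty block whose lower-left
    corner is (x0, y0), in ring order, with coordinates taken mod k.

    Assumption (not from the paper): the block is a single rectangle and
    is small enough that the ring's opposite sides are distinct nodes.
    """
    if w < 1 or h < 1:
        raise ValueError("block must contain at least one node")
    if w + 2 > k or h + 2 > k:
        raise ValueError("block too large for a ring to close in this torus")

    left, right = x0 - 1, x0 + w
    bottom, top = y0 - 1, y0 + h

    ring: List[Node] = []
    ring += [(x, bottom) for x in range(left, right)]    # bottom edge, left to right
    ring += [(right, y) for y in range(bottom, top)]     # right edge, upward
    ring += [(x, top) for x in range(right, left, -1)]   # top edge, right to left
    ring += [(left, y) for y in range(top, bottom, -1)]  # left edge, downward

    # Wrap every coordinate so that blocks touching the torus boundary work.
    return [(x % k, y % k) for x, y in ring]


if __name__ == "__main__":
    # A single faulty node at (0, 0) in a 5-ary 2-D torus.
    for node in fault_ring(0, 0, 1, 1, 5):
        print(node)
```

Under these assumptions the ring around a w-by-h block has 2w + 2h + 4 nodes, and consecutive nodes are always one hop apart, so a message blocked by the fault can in principle follow the ring in either direction. Overlapping rings, blocks that touch each other, and the virtual-channel assignment needed for deadlock freedom are outside the scope of this sketch.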

Published In

ICS '94: Proceedings of the 8th international conference on Supercomputing
July 1994
452 pages
ISBN: 0897916654
DOI: 10.1145/181181
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive routing
  2. deadlocks
  3. fault-tolerant routing
  4. message routing
  5. multicomputer networks
  6. performance evaluation
  7. torus networks
  8. wormhole routing

Qualifiers

  • Article

Conference

ICS94: International Conference on Supercomputing '94
July 11 - 15, 1994
Manchester, England

Acceptance Rates

ICS '94 Paper Acceptance Rate 45 of 114 submissions, 39%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%
