
A Deterministic-Path Routing Algorithm for Tolerating Many Faults on Very-Large-Scale Network-on-Chip

Published: 27 October 2020

Abstract

Very-large-scale network-on-chip (VLS-NoC) has become a promising fabric for supercomputers, but such a fabric is exposed to the many-fault problem. This article proposes a deterministic routing algorithm that tolerates the effects of many faults in VLS-NoCs. The approach generates routing tables offline with a breadth-first traversal algorithm and stores a routing table locally in each switch for online packet transmission. It applies the Tarjan algorithm to degrade the faulty NoC, maximizing the number of available nodes in the reconfigured NoC. In 2D NoCs, the approach updates the routing tables of some nodes using the deprecated channel/node rules and thereby avoids deadlocks. In 3D NoCs, it uses a forbidden-turn selection algorithm and detour rules to prevent faceted rings, which keeps the NoC deadlock-free. Experimental results demonstrate that the proposed approach maintains fault-free communication in 2D and 3D NoCs with 40 injected faulty links while maximizing the number of available nodes in the reconfigured NoC. The approach also outperforms existing algorithms in average latency, throughput, and energy consumption.
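The abstract does not specify the routing-table format or the traversal details. As a minimal sketch, assuming a 2D mesh whose faults are bidirectional link failures, offline breadth-first table generation could look like the following. The mesh model and the names `build_routing_tables` and `route` are hypothetical, and this plain shortest-path BFS omits the deadlock-avoidance rules the paper layers on top.

```python
from collections import deque


def neighbors(node, width, height, faulty_links):
    """Yield mesh neighbors of `node` reachable over non-faulty links.

    Assumption (hypothetical model): nodes are (x, y) tuples and a faulty
    link is a frozenset of its two endpoints, i.e. a bidirectional fault.
    """
    x, y = node
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < width and 0 <= ny < height:
            if frozenset((node, (nx, ny))) not in faulty_links:
                yield (nx, ny)


def build_routing_tables(width, height, faulty_links):
    """Return tables[switch][dest] = next-hop node, computed offline.

    One BFS is run from each destination over the fault-free links; every
    switch reached records the neighbor it was discovered from as its next
    hop toward that destination, which yields a shortest path.
    """
    nodes = [(x, y) for x in range(width) for y in range(height)]
    tables = {n: {} for n in nodes}
    for dest in nodes:
        visited = {dest}
        queue = deque([dest])
        while queue:
            cur = queue.popleft()
            for nb in neighbors(cur, width, height, faulty_links):
                if nb not in visited:
                    visited.add(nb)
                    tables[nb][dest] = cur  # nb forwards toward dest via cur
                    queue.append(nb)
    return tables


def route(tables, src, dest):
    """Follow table lookups hop by hop; return the path, or None if unreachable."""
    path = [src]
    while path[-1] != dest:
        nxt = tables[path[-1]].get(dest)
        if nxt is None:
            return None
        path.append(nxt)
    return path
```

With the link between (0, 0) and (1, 0) marked faulty in a 3x3 mesh, `route(tables, (0, 0), (1, 0))` detours through (0, 1) and (1, 1). Each switch only needs its own row of `tables`, which mirrors the idea of storing one table locally per switch.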
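Tarjan's depth-first search algorithm is best known for finding strongly connected components (SCCs). The sketch below assumes, as an illustration only, that degrading the faulty NoC means keeping the largest set of nodes that can all reach one another over the surviving directed links; the paper's exact degradation criterion may differ, and `largest_available_region` is a hypothetical name.

```python
def tarjan_scc(graph):
    """Tarjan's algorithm: return the strongly connected components of a
    directed graph given as {node: iterable_of_successors}."""
    index_of, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(v):
        # Discovery index = number of nodes visited so far.
        index_of[v] = lowlink[v] = len(index_of)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:  # v is the root of an SCC
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index_of:
            visit(v)
    return components


def largest_available_region(graph):
    """Hypothetical degradation step: keep the biggest SCC of the faulty NoC."""
    return max(tarjan_scc(graph), key=len, default=set())
```

For example, if node 3's only outgoing link is faulty in the chain `{0: [1], 1: [0, 2], 2: [1, 3], 3: []}`, node 3 can receive but not send, so the retained region is `{0, 1, 2}`. This recursive version is fine for small examples; a very large NoC would call for an iterative variant to stay within Python's recursion limit.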
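A common way to verify that deterministic routing tables are deadlock-free is to check that the channel dependency graph (CDG) induced by all routes is acyclic. The sketch below is a generic check of that kind, not the paper's deprecated channel/node rules or its forbidden-turn selection; the route format (a list of node labels per source/destination pair) is an assumption.

```python
def channel_dependency_graph(routes):
    """Build the CDG induced by a collection of routes.

    Each route is a list of nodes; each consecutive pair (u, v) is a directed
    channel. A packet holding channel (a, b) while requesting (b, c) creates
    the dependency edge (a, b) -> (b, c).
    """
    cdg = {}
    for path in routes:
        channels = list(zip(path, path[1:]))
        for ch in channels:
            cdg.setdefault(ch, set())
        for held, requested in zip(channels, channels[1:]):
            cdg[held].add(requested)
    return cdg


def has_cycle(graph):
    """Iterative three-color DFS; True if the directed graph has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph[start]))]
        while stack:
            v, succs = stack[-1]
            for w in succs:
                if color[w] == GRAY:
                    return True  # back edge: cyclic channel dependency
                if color[w] == WHITE:
                    color[w] = GRAY
                    stack.append((w, iter(graph[w])))
                    break
            else:
                color[v] = BLACK  # all successors fully explored
                stack.pop()
    return False
```

On a four-node ring A-B-C-D, the routes `["A","B","C"]`, `["B","C","D"]`, `["C","D","A"]`, and `["D","A","B"]` close a dependency cycle, while dropping the last route (one fewer turn) makes the CDG acyclic. Forbidding turns, as the turn model does, works by removing exactly such dependency edges.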


Information

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 26, Issue 1
January 2021
234 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/3422280
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2020
Accepted: 01 July 2020
Revised: 01 July 2020
Received: 01 March 2020
Published in TODAES Volume 26, Issue 1

Author Tags

  1. 3D NoC
  2. Routing algorithm
  3. avoiding deadlock
  4. fault-tolerant NoC
  5. turn model

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NSFC

Article Metrics

  • Downloads (last 12 months): 47
  • Downloads (last 6 weeks): 5
Reflects downloads up to 28 Dec 2024
