research-article

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability

Published: 19 March 2018

Abstract

Emerging chips with hundreds and thousands of cores require networks with unprecedented energy/area efficiency and scalability. To address this, we propose Slim NoC (SN): a new on-chip network design that delivers significant improvements in efficiency and scalability compared to the state-of-the-art. The key idea is to use two concepts from graph and number theory, degree-diameter graphs combined with non-prime finite fields, to enable the smallest number of ports for a given core count. SN is inspired by state-of-the-art off-chip topologies; it identifies and distills their advantages for NoC settings while solving several key issues that lead to significant overheads on-chip. SN provides NoC-specific layouts, which further enhance area/energy efficiency. We show how to augment SN with state-of-the-art router microarchitecture schemes such as Elastic Links, to make the network even more scalable and efficient. Our extensive experimental evaluations show that SN outperforms both traditional low-radix topologies (e.g., meshes and tori) and modern high-radix networks (e.g., various Flattened Butterflies) in area, latency, throughput, and static/dynamic power consumption for both synthetic and real workloads. SN provides a promising direction in scalable and energy-efficient NoC topologies.
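The port-count argument in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper. It assumes the McKay-Miller-Širáň (MMS) diameter-2 construction used by the off-chip Slim Fly topology, in which a prime power q = 4w + δ (δ in {-1, 0, 1}) yields 2q² routers with network radix (3q - δ)/2, and compares that router count with the Moore bound k² + 1 for diameter 2. The function names `moore_bound`, `is_prime_power`, and `mms_parameters` are hypothetical helpers introduced for this example.

```python
def moore_bound(degree: int, diameter: int) -> int:
    """Upper bound on the vertex count of a graph with the given max degree and diameter."""
    if degree < 1 or diameter < 0:
        raise ValueError("degree must be >= 1 and diameter >= 0")
    total = 1            # the root vertex
    frontier = degree    # vertices at distance 1
    for _ in range(diameter):
        total += frontier
        frontier *= degree - 1   # each vertex has at most degree-1 new neighbours
    return total


def is_prime_power(q: int) -> bool:
    """True if q = p**m for a prime p and m >= 1 (a finite field of order q exists)."""
    if q < 2:
        return False
    p = 2
    while p * p <= q:
        if q % p == 0:
            while q % p == 0:
                q //= p
            return q == 1        # only one prime factor allowed
        p += 1
    return True                  # q itself is prime


def mms_parameters(q: int) -> tuple[int, int]:
    """Return (router count, network radix) for an MMS graph over a field of order q.

    Assumption: q = 4w + delta with delta in {-1, 0, 1}, as in the Slim Fly construction.
    """
    if not is_prime_power(q):
        raise ValueError("q must be a prime power")
    delta = {0: 0, 1: 1, 3: -1}.get(q % 4)
    if delta is None:
        raise ValueError("q = 2 (mod 4) is not covered by this sketch")
    return 2 * q * q, (3 * q - delta) // 2


if __name__ == "__main__":
    # Non-prime orders such as 4, 8, and 9 add configurations between the prime ones.
    for q in (3, 4, 5, 7, 8, 9):
        routers, radix = mms_parameters(q)
        print(f"q={q}: routers={routers}, radix={radix}, "
              f"Moore bound={moore_bound(radix, 2)}")
```

Under these assumptions, allowing non-prime field orders (for example q = 8 or q = 9) fills in router counts that prime orders alone would skip, which is one way to read the abstract's claim about reaching a given core count with few ports.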


Published In

ACM SIGPLAN Notices, Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3296957
  • ACM Conferences
    ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
    March 2018
    827 pages
    ISBN:9781450349116
    DOI:10.1145/3173162
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. energy efficiency
  2. many-core systems
  3. on-chip-networks
  4. parallel processing
  5. scalability

Qualifiers

  • Research-article
