Design tradeoffs for tiled CMP on-chip networks

Authors:

William J. Dally

ACM International Conference on Supercomputing 25th Anniversary Volume

Pages 390 - 401

https://doi.org/10.1145/2591635.2667187

Published: 28 June 2006

Abstract

We develop detailed area and energy models for on-chip interconnection networks and describe tradeoffs in the design of efficient networks for tiled chip multiprocessors. Using these models, we investigate how aspects of the network architecture, including topology, channel width, routing strategy, and buffer size, affect performance, area efficiency, and energy efficiency. We simulate the performance of a variety of on-chip networks designed for tiled chip multiprocessors implemented in an advanced VLSI process and compare the area and energy efficiencies estimated from our models. We demonstrate that introducing a second parallel network can increase performance while improving efficiency, and we evaluate different strategies for distributing traffic over the subnetworks. Drawing on insights from our analysis, we present a concentrated mesh topology with replicated subnetworks and express channels that provides a 24% improvement in area efficiency and a 48% improvement in energy efficiency over the other networks evaluated in this study.

Published In

ACM International Conference on Supercomputing 25th Anniversary Volume

June 2014

94 pages

ISBN: 9781450328401

DOI: 10.1145/2591635

Editor:
Utpal Banerjee

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
