Energy Efficient Chip-to-Chip Wireless Interconnection for Heterogeneous Architectures

Authors:

Sri Harsha Gade,

M. Meraj Ahmed,

Amlan Ganguly

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 24, Issue 5

Article No.: 55, Pages 1 - 27

https://doi.org/10.1145/3340109

Published: 26 July 2019

Abstract

Heterogeneous multichip architectures have gained significant interest in high-performance computing clusters because they cater to a wide range of applications. In particular, heterogeneous systems with multiple multicore CPUs, GPUs, and memory have become common to meet application requirements. Shared resources such as the interconnection network pose significant challenges in these systems because of the diverse traffic requirements of CPUs and GPUs. The performance and energy consumption of inter-chip communication, in particular, remain a major bottleneck due to the limitations of off-chip wired links. To overcome these challenges, we propose a wireless interconnection network that provides energy-efficient, high-performance communication in heterogeneous multichip systems. Directional wireless links provide interference-free communication between GPUs and memory modules, while omnidirectional wireless interfaces connect the CPU cores with other components in the system. Besides providing low-energy, high-bandwidth inter-chip communication, the wireless interconnection scales efficiently with system size to sustain high performance across multiple chips. The proposed inter-chip wireless interconnection is evaluated on two system sizes, each with multiple CPU and GPU chips along with main memory modules. On a system with 4 CPU and 4 GPU chips, application runtime is sped up by 3.94×, packet energy is reduced by 94.4%, and packet latency is reduced by 58.34% compared to a baseline system with wired inter-chip interconnection.

Index Terms

Energy Efficient Chip-to-Chip Wireless Interconnection for Heterogeneous Architectures
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
    2. Parallel architectures
      1. Interconnection architectures
2. Networks
  1. Network types
    1. Network on chip

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 24, Issue 5

September 2019

282 pages

ISSN: 1084-4309

EISSN: 1557-7309

DOI: 10.1145/3339837

Editor:
Naehyuck Chang
Korea Advanced Institute of Science and Technology, Korea

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 26 July 2019

Accepted: 01 June 2019

Revised: 01 April 2019

Received: 01 December 2018

Published in TODAES Volume 24, Issue 5

Qualifiers

Research-article
Research
Refereed
