
Energy Efficient Chip-to-Chip Wireless Interconnection for Heterogeneous Architectures

Published: 26 July 2019

Abstract

Heterogeneous multichip architectures have gained significant interest in high-performance computing clusters because they cater to a wide range of applications. In particular, heterogeneous systems with multiple multicore CPUs, GPUs, and memory have become common to meet application requirements. Shared resources such as the interconnection network pose significant challenges in these systems because of the diverse traffic requirements of CPUs and GPUs. Notably, the performance and energy consumption of inter-chip communication remain a major bottleneck due to the limitations of off-chip wired links. To overcome these challenges, we propose a wireless interconnection network that provides energy-efficient, high-performance communication in heterogeneous multichip systems. Directional wireless links provide interference-free communication between GPUs and memory modules, while omnidirectional wireless interfaces connect the CPU cores with the other components in the system. Besides providing low-energy, high-bandwidth inter-chip communication, the wireless interconnection scales efficiently with system size and sustains high performance across multiple chips. The proposed inter-chip wireless interconnection is evaluated on two system sizes, each with multiple CPU chips, multiple GPU chips, and main memory modules. On a system with 4 CPU and 4 GPU chips, application runtime is sped up by 3.94×, packet energy is reduced by 94.4%, and packet latency is reduced by 58.34% compared to a baseline system with wired inter-chip interconnection.
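The three headline figures mix a speedup factor with percentage reductions, so a short conversion puts them on a common scale. The sketch below is illustrative arithmetic only, not part of the paper's evaluation. It assumes the usual definitions: a reduction of r% leaves (1 - r/100) of the baseline value, and a speedup of s× means the new runtime is 1/s of the baseline. The function and variable names are hypothetical.

```python
def remaining_fraction(reduction_percent: float) -> float:
    """Fraction of the baseline value left after a percentage reduction."""
    return 1.0 - reduction_percent / 100.0


def improvement_factor(reduction_percent: float) -> float:
    """How many times smaller the new value is than the baseline."""
    return 1.0 / remaining_fraction(reduction_percent)


# Figures reported in the abstract for the system with 4 CPU and 4 GPU chips.
runtime_speedup = 3.94       # application runtime speedup (x)
energy_reduction = 94.4      # packet energy reduction (%)
latency_reduction = 58.34    # packet latency reduction (%)

# Assumption: speedup = baseline runtime / wireless runtime.
runtime_fraction = 1.0 / runtime_speedup
energy_factor = improvement_factor(energy_reduction)
latency_factor = improvement_factor(latency_reduction)

print(f"runtime: {runtime_fraction:.1%} of baseline")
print(f"packet energy: {energy_factor:.1f}x lower than baseline")
print(f"packet latency: {latency_factor:.1f}x lower than baseline")
```

Under these assumptions, the reported numbers correspond to a runtime of roughly a quarter of the baseline, packet energy about 17.9 times lower, and packet latency about 2.4 times lower.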

Published In

ACM Transactions on Design Automation of Electronic Systems  Volume 24, Issue 5
September 2019
282 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/3339837
  • Editor: Naehyuck Chang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 July 2019
Accepted: 01 June 2019
Revised: 01 April 2019
Received: 01 December 2018
Published in TODAES Volume 24, Issue 5

Author Tags

  1. Inter-chip wireless
  2. heterogeneous architectures
  3. multi-chip system

Qualifiers

  • Research-article
  • Research
  • Refereed
