Leveraging Silicon-Photonic NoC for Designing Scalable GPUs

Authors:

Amir Kavyan Ziabari,

Jose L. Abellán,

David Kaeli

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing

Pages 273 - 282

https://doi.org/10.1145/2751205.2751229

Published: 08 June 2015

Abstract

Silicon-photonic link technology promises to satisfy the growing need for high-bandwidth, low-latency, and energy-efficient network-on-chip (NoC) architectures. While silicon-photonic NoC designs have been extensively studied for future many-core systems, their use in massively threaded GPUs has received little attention to date. In this paper, we first analyze an electrical NoC that connects the L1 and L2 cache levels in a contemporary GPU memory hierarchy. Our evaluation of workloads from the AMD SDK on the Multi2Sim GPU simulator shows that, in addition to limits in memory bandwidth, an electrical NoC can significantly hamper performance and impede scalability, especially as the number of compute units grows in future GPU systems.

To address this issue, we advocate using silicon-photonic link technology for on-chip communication in GPUs, and we present the first GPU-specific analysis of a cost-effective hybrid photonic crossbar NoC. Our baseline is based on an AMD Southern Islands GPU with 32 compute units (CUs), and we compare this design to our proposed hybrid silicon-photonic NoC. The proposed hybrid NoC increases performance by up to 6× (2.7× on average) and reduces the energy-delay² product (ED²P) by up to 99% (79% on average) compared to conventional electrical crossbars. For future GPU systems, we also study an electrical 2D-mesh topology, since it scales better than an electrical crossbar. For a 128-CU GPU, the proposed hybrid silicon-photonic NoC improves performance by up to 1.9× (43% on average) and reduces ED²P by up to 62% (3% on average) compared to the best-performing mesh design.
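The ED²P metric quoted above weights delay quadratically, so a speedup affects the metric far more than an equal change in energy. The following is a minimal sketch of how such a reduction could be computed, assuming the usual definition ED²P = E × D². The function names and the input numbers are hypothetical and are not taken from the paper.

```python
def ed2p(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product, E * D^2 (standard definition, assumed here)."""
    return energy_joules * delay_seconds ** 2


def ed2p_reduction(base_energy: float, base_delay: float,
                   new_energy: float, new_delay: float) -> float:
    """Fractional ED2P reduction of a new design relative to a baseline."""
    baseline = ed2p(base_energy, base_delay)
    return 1.0 - ed2p(new_energy, new_delay) / baseline


if __name__ == "__main__":
    # Hypothetical inputs: same energy, half the execution time (2x speedup).
    reduction = ed2p_reduction(1.0, 1.0, 1.0, 0.5)
    print(f"{reduction:.0%}")  # prints "75%"
```

In this illustrative case a 2× speedup at constant energy already cuts ED²P by 75%, which shows why the metric is sensitive to performance gains.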

Index Terms

Leveraging Silicon-Photonic NoC for Designing Scalable GPUs
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Buses and high-speed links
  2. Integrated circuits
    1. Interconnect
      1. Metallic interconnect
      2. Photonic and optical interconnect

Published In

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing

June 2015

446 pages

ISBN: 9781450335591

DOI: 10.1145/2751205

General Chair: Laxmi N. Bhuyan (University of California, Riverside)

Program Chairs: Fred Chong (University of California, Santa Barbara), Vivek Sarkar (Rice University)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS'15: 2015 International Conference on Supercomputing

Sponsor: SIGARCH

June 8 - 11, 2015

Newport Beach, California, USA

Acceptance Rates

ICS '15 Paper Acceptance Rate 40 of 160 submissions, 25%;

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Total Citations: 24
Total Downloads: 280
Downloads (last 12 months): 26
Downloads (last 6 weeks): 2

Reflects downloads up to 01 Jan 2025

Cited By

Li Y, Louri A, Karanth A (2023). A Silicon Photonic Multi-DNN Accelerator. 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 238-249. Online publication date: 21-Oct-2023.
https://doi.org/10.1109/PACT58117.2023.00028
Li C, Jiang F, Chen S, Lil X, Liu Y, Chen L, Li X, Xu J (2023). RONet: Scaling GPU System with Silicon Photonic Chiplet. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pages 1-9. Online publication date: 28-Oct-2023.
https://doi.org/10.1109/ICCAD57390.2023.10323762
Li Y, Wang K, Zheng H, Louri A, Karanth A (2022). Ascend: A Scalable and Energy-Efficient Deep Neural Network Accelerator With Photonic Interconnects. IEEE Transactions on Circuits and Systems I: Regular Papers, 69:7, pages 2730-2741. Online publication date: Jul-2022.
https://doi.org/10.1109/TCSI.2022.3169953
Li Y, Louri A, Karanth A (2022). SPACX: Silicon Photonics-based Scalable Chiplet Accelerator for DNN Inference. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 831-845. Online publication date: Apr-2022.
https://doi.org/10.1109/HPCA53966.2022.00066
Dai F, Chen Y, Huang Z, Zhang H, Zhang H, Xia C (2022). Comparing the performance of multi-layer perceptron training on electrical and optical network-on-chips. The Journal of Supercomputing, 79:10, pages 10725-10746. Online publication date: 23-Nov-2022.
https://doi.org/10.1007/s11227-022-04945-y
Sunny F, Mirza A, Nikdast M, Pasricha S (2021). ROBIN: A Robust Optical Binary Neural Network Accelerator. ACM Transactions on Embedded Computing Systems, 20:5s, pages 1-24. Online publication date: 17-Sep-2021.
https://dl.acm.org/doi/10.1145/3476988
Zhang J, Jung M (2021). Ohm-GPU: Integrating New Optical Network and Heterogeneous Memory into GPU Multi-Processors. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 695-708. Online publication date: 18-Oct-2021.
https://dl.acm.org/doi/10.1145/3466752.3480107
Sunny F, Mirza A, Nikdast M, Pasricha S (2021). CrossLight: A Cross-Layer Optimized Silicon Photonic Neural Network Accelerator. 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 1069-1074. Online publication date: 5-Dec-2021.
https://doi.org/10.1109/DAC18074.2021.9586161
Bashir J, Sarangi S (2020). GPUOPT. ACM Journal on Emerging Technologies in Computing Systems, 17:1, pages 1-26. Online publication date: 22-Sep-2020.
https://dl.acm.org/doi/10.1145/3416850
Li Y, Chen L (2020). Accelerated Reply Injection for Removing NoC Bottleneck in GPGPUs. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 22-31. Online publication date: May-2020.
https://doi.org/10.1109/IPDPS47924.2020.00013
