DOI: 10.1145/2751205.2751229
research-article

Leveraging Silicon-Photonic NoC for Designing Scalable GPUs

Published: 08 June 2015

Abstract

Silicon-photonic link technology promises to satisfy the growing need for high-bandwidth, low-latency, and energy-efficient network-on-chip (NoC) architectures. While silicon-photonic NoC designs have been extensively studied for future many-core systems, their use in massively-threaded GPUs has received little attention to date. In this paper, we first analyze an electrical NoC which connects different cache levels (L1 to L2) in a contemporary GPU memory hierarchy. Evaluating workloads from the AMD SDK on the Multi2Sim GPU simulator, we find that, apart from limits in memory bandwidth, an electrical NoC can significantly hamper performance and impede scalability, especially as the number of compute units grows in future GPU systems.
To address this issue, we advocate using silicon-photonic link technology for on-chip communication in GPUs, and we present the first GPU-specific analysis of a cost-effective hybrid photonic crossbar NoC. Our baseline is an AMD Southern Islands GPU with 32 compute units (CUs), and we compare this design to our proposed hybrid silicon-photonic NoC. The proposed hybrid NoC increases performance by up to 6× (2.7× on average) and reduces the energy-delay² product (ED²P) by up to 99% (79% on average) compared to conventional electrical crossbars. For future GPU systems, we study an electrical 2D-mesh topology, since it scales better than an electrical crossbar. For a 128-CU GPU, the proposed hybrid silicon-photonic NoC can improve performance by up to 1.9× (43% on average) and reduce ED²P by up to 62% (3% on average) compared to the best-performing mesh design.
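
For readers unfamiliar with the metric, the energy-delay-squared product (ED2P) cited above is conventionally computed as energy multiplied by the square of execution time, so it weights delay more heavily than energy. The sketch below is only an illustration of that standard definition. The function names and all input numbers are hypothetical and are not taken from the paper or its evaluation.

```python
def ed2p(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product: E * D^2 (standard definition)."""
    return energy_joules * delay_seconds ** 2


def ed2p_reduction(base_energy: float, base_delay: float,
                   new_energy: float, new_delay: float) -> float:
    """Fractional ED2P reduction of a new design relative to a baseline."""
    return 1.0 - ed2p(new_energy, new_delay) / ed2p(base_energy, base_delay)


# Hypothetical inputs (NOT results from the paper):
# the new design uses the same energy but halves the delay.
reduction = ed2p_reduction(base_energy=1.0, base_delay=2.0,
                           new_energy=1.0, new_delay=1.0)
print(f"ED2P reduction: {reduction:.0%}")
```

Because delay enters squared, a design that only halves execution time at equal energy already cuts ED2P by three quarters in this toy case, which is why large speedups can translate into very large ED2P reductions.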

Published In

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
June 2015
446 pages
ISBN:9781450335591
DOI:10.1145/2751205

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. gpus
  2. network-on-chip
  3. photonics technology

Qualifiers

  • Research-article

Conference

ICS'15
ICS'15: 2015 International Conference on Supercomputing
June 8 - 11, 2015
Newport Beach, California, USA

Acceptance Rates

ICS '15 Paper Acceptance Rate 40 of 160 submissions, 25%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%
