Extreme Datacenter Specialization for Planet-Scale Computing: ASIC Clouds

Published: 28 August 2018

Abstract

Planet-scale applications are driving the exponential growth of the cloud, and datacenter specialization is the key enabler of this trend, providing order-of-magnitude improvements in cost-effectiveness and energy-efficiency. While exascale computing remains a goal for supercomputing, specialized datacenters have already emerged and have demonstrated beyond-exascale performance and efficiency in specific domains. This paper generalizes the applications, design methodology, and deployment challenges of the most extreme form of specialized datacenter: ASIC Clouds. It analyzes two game-changing, real-world ASIC Clouds, Bitcoin Cryptocurrency Clouds and Tensor Processing Clouds, discussing their incentives, the enabling technologies, and how they benefit from specialized ASICs. Their business models, architectures, and deployment methods are useful for envisioning future ASIC Clouds and forecasting how they will transform computing, the economy, and society.
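The Bitcoin Cryptocurrency Clouds mentioned above accelerate one fixed computation: Bitcoin's double-SHA-256 proof-of-work, which is why single-purpose ASICs outperform CPUs and GPUs on it by orders of magnitude. A minimal sketch of that computation follows; the header bytes, nonce width, and toy difficulty target here are illustrative, not the real Bitcoin block format.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    # Bitcoin's proof-of-work hash: SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, target: int, max_nonce: int = 1_000_000):
    # Scan nonces until the double hash, read as an integer, falls below target.
    # Mining ASICs do exactly this search, in parallel, billions of times per second.
    for nonce in range(max_nonce):
        h = double_sha256(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(h, "big") < target:
            return nonce, h
    return None, None

# Toy difficulty: 16 leading zero bits (real network targets are vastly stricter).
target = 1 << (256 - 16)
nonce, h = mine(b"example-block-header", target)
```

Because the inner loop is a fixed-function bit-level kernel with no memory traffic to speak of, it maps almost entirely into hashing logic on silicon, which is the property that made Bitcoin mining the first large-scale ASIC Cloud.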



Published In

ACM SIGOPS Operating Systems Review, Volume 52, Issue 1
Special Topics
July 2018
133 pages
ISSN: 0163-5980
DOI: 10.1145/3273982

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ASIC
  2. Accelerator
  3. Datacenter

Qualifiers

  • Research-article
