Extreme Datacenter Specialization for Planet-Scale Computing: ASIC Clouds

Published: 28 August 2018

Abstract

Planet-scale applications are driving the exponential growth of the cloud, and datacenter specialization is the key enabler of this trend, providing order-of-magnitude improvements in cost-effectiveness and energy-efficiency. While exascale computing remains a goal for supercomputing, specialized datacenters have already emerged and have demonstrated beyond-exascale performance and efficiency in specific domains. This paper generalizes the applications, design methodology, and deployment challenges of the most extreme form of specialized datacenter: ASIC Clouds. It analyzes two game-changing, real-world ASIC Clouds, Bitcoin Cryptocurrency Clouds and Tensor Processing Clouds, discussing their incentives, the enabling technologies, and how they benefit from specialized ASICs. Their business models, architectures, and deployment methods are useful for envisioning future ASIC Clouds and forecasting how they will transform computing, the economy, and society.
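The Bitcoin Cryptocurrency Clouds mentioned above accelerate one fixed computation: Bitcoin's double-SHA-256 proof-of-work, which is why single-purpose ASICs outperform CPUs and GPUs on it by orders of magnitude. A minimal sketch of that computation follows; the header bytes, nonce width, and toy difficulty target here are illustrative, not the real Bitcoin block format.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    # Bitcoin's proof-of-work hash: SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, target: int, max_nonce: int = 1_000_000):
    # Scan nonces until the double hash, read as an integer, falls below target.
    # Mining ASICs do exactly this search, in parallel, billions of times per second.
    for nonce in range(max_nonce):
        h = double_sha256(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(h, "big") < target:
            return nonce, h
    return None, None

# Toy difficulty: 16 leading zero bits (real network targets are vastly stricter).
target = 1 << (256 - 16)
nonce, h = mine(b"example-block-header", target)
```

Because the inner loop is a fixed-function bit-level kernel with no memory traffic to speak of, it maps almost entirely into hashing logic on silicon, which is the property that made Bitcoin mining the first large-scale ASIC Cloud.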



Published In

ACM SIGOPS Operating Systems Review, Volume 52, Issue 1
Special Topics
July 2018
133 pages
ISSN: 0163-5980
DOI: 10.1145/3273982

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ASIC
  2. Accelerator
  3. Datacenter

Qualifiers

  • Research-article
