A many-core architecture for in-memory data processing

Authors:

Sandeep R Agrawal,

Evangelos Vlachos,

Venkatraman Govindaraju,

Venkatanathan Varadarajan,

Cagri Balkesen,

Georgios Giannikis,

Eric Sedlar

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 245 - 258

https://doi.org/10.1145/3123939.3123985

Published: 14 October 2017

Abstract

For many years, the highest energy cost in processing has been data movement rather than computation, and energy is the limiting factor in processor design [21]. As the data needed for a single application grows to exabytes [56], there is clearly an opportunity to design a bandwidth-optimized architecture for big data computation by specializing hardware for data movement. We present the Data Processing Unit or DPU, a shared memory many-core that is specifically designed for high bandwidth analytics workloads. The DPU contains a unique Data Movement System (DMS), which provides hardware acceleration for data movement and partitioning operations at the memory controller that is sufficient to keep up with DDR bandwidth. The DPU also provides acceleration for core to core communication via a unique hardware RPC mechanism called the Atomic Transaction Engine. Comparison of a DPU chip fabricated in 40nm with a Xeon processor on a variety of data processing applications shows a 3× - 15× performance per watt advantage.
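
The abstract names partitioning as one of the operations the DMS accelerates in hardware. As a rough software illustration of what a partitioning pass does, here is a minimal sketch of radix partitioning in Python. It is not the DMS design: the function name `radix_partition`, the `radix_bits` parameter, and the choice of low-order key bits as the partitioning function are all assumptions made for this example.

```python
from typing import List, Sequence


def radix_partition(keys: Sequence[int], radix_bits: int) -> List[List[int]]:
    """Split keys into 2**radix_bits buckets using their low-order bits.

    Hypothetical software analogue of a partitioning pass; the name and
    parameters are illustrative and do not come from the paper.
    """
    if radix_bits < 0:
        raise ValueError("radix_bits must be non-negative")
    num_partitions = 1 << radix_bits
    mask = num_partitions - 1
    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    for key in keys:
        # The low radix_bits bits of the key select the destination bucket.
        partitions[key & mask].append(key)
    return partitions
```

For example, `radix_partition([5, 8, 13], 2)` returns `[[8], [5, 13], [], []]`. Every key is read once and written once, so a pass like this is dominated by data movement rather than arithmetic, which is the kind of work the abstract argues is worth specializing hardware for.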

References

[1]

Daniel J. Abadi, Peter A. Boncz, and Stavros Harizopoulos. 2009. Column-oriented Database Systems. Proc. VLDB Endow. 2, 2 (Aug. 2009), 1664--1665.

[2]

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-scale Machine Learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265--283. http://dl.acm.org/citation.cfm?id=3026877.3026899

[3]

Sandeep R. Agrawal, Christopher M. Dee, and Alvin R. Lebeck. 2016. Exploiting Accelerators for Efficient High Dimensional Similarity Search. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16). ACM, New York, NY, USA, Article 3, 12 pages.

[4]

Sandeep R. Agrawal, Valentin Pistol, Jun Pang, John Tran, David Tarjan, and Alvin R. Lebeck. 2014. Rhythm: Harnessing Data Parallel Hardware for Server Workloads. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14). ACM, New York, NY, USA, 19--34.

[5]

David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. 2009. FAWN: A Fast Array of Wimpy Nodes. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). ACM, New York, NY, USA, 1--14.

[6]

Chad Austin. 2013. SAJSON: Single-Allocation JSON Parser. (2013). https://chadaustin.me/2013/01/single-allocation-json-parser

[7]

C. Bahlmann, B. Haasdonk, and H. Burkhardt. 2002. Online handwriting recognition with support vector machines - a kernel approach. In Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition. 49--54.

[8]

Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M. Tamer Özsu. 2013. Multi-core, Main-memory Joins: Sort vs. Hash Revisited. Proc. VLDB Endow. 7, 1 (Sept. 2013), 85--96.

[9]

Arkaprava Basu, Jayneel Gandhi, Jichuan Chang, Mark D. Hill, and Michael M. Swift. 2013. Efficient Virtual Memory for Big Memory Servers. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 237--248.

[10]

Luca Becchetti, Carlos Castillo, Debora Donato, Stefano Leonardi, and Ricardo Baeza-Yates. 2006. Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection. In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD).

[11]

Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IX). ACM, New York, NY, USA, 117--128.

[12]

Christian Böhm, Stefan Berchtold, and Daniel A. Keim. 2001. Searching in High-dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. ACM Comput. Surv. 33, 3 (Sept. 2001), 322--373.

[13]

Sergey Brin and Lawrence Page. 1998. The Anatomy of a Large-scale Hypertextual Web Search Engine. In Proceedings of the Seventh International Conference on World Wide Web 7 (WWW7). Elsevier Science Publishers B. V., Amsterdam, The Netherlands, 107--117. http://dl.acm.org/citation.cfm?id=297805.297827

[14]

L. J. Cao, S. S. Keerthi, Chong-Jin Ong, J. Q. Zhang, U. Periyathamby, Xiu Ju Fu, and H. P. Lee. 2006. Parallel Sequential Minimal Optimization for the Training of Support Vector Machines. Trans. Neur. Netw. 17, 4 (July 2006), 1039--1049.

[15]

A. M. Caulfield, E. S. Chung, A. Putnam, H. Angepat, J. Fowers, M. Haselman, S. Heil, M. Humphrey, P. Kaur, J. Y. Kim, D. Lo, T. Massengill, K. Ovtcharov, M. Papamichael, L. Woods, S. Lanka, D. Chiou, and D. Burger. 2016. A cloud-scale acceleration architecture. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--13.

[16]

Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2 (2011), 27:1--27:27. Issue 3. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[17]

Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47). IEEE Computer Society, Washington, DC, USA, 609--622.

[18]

Eric S. Chung, John D. Davis, and Jaewon Lee. 2013. LINQits: Big Data on Little Clients. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 261--272.

[19]

Eric S. Chung, James C. Hoe, and Ken Mai. 2011. CoRAM: An In-fabric Memory Architecture for FPGA-based Computing. In Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '11). ACM, New York, NY, USA, 97--106.

[20]

John Cieslewicz and Kenneth A. Ross. 2007. Adaptive Aggregation on Chip Multiprocessors. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07). VLDB Endowment, 339--350. http://dl.acm.org/citation.cfm?id=1325851.1325893

[21]

William J. Dally. n.d. GPU Computing to Exascale and Beyond. In Plenary keynote, SC '10.

[22]

Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi. 2012. Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In ACM SIGPLAN Notices, Vol. 47. ACM, 37--48.

[23]

Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. 2007. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In AofA: Analysis of Algorithms (DMTCS Proceedings), Philippe Jacquet (Ed.), Vol. AH. Discrete Mathematics and Theoretical Computer Science, Juan les Pins, France, 137--156. https://hal.inria.fr/hal-00406166

[24]

Sanjay Ghemawat and Paul Menage. n.d. TCMalloc: Thread-Caching Malloc. http://goog-perftools.sourceforge.net/doc/tcmalloc.html

[25]

Frédéric Giroire. 2009. Order Statistics and Estimating Cardinalities of Massive Data Sets. Discrete Appl. Math. 157, 2 (Jan. 2009), 406--427.

[26]

G. Grubb, A. Zelinsky, L. Nilsson, and M. Rilbe. 2004. 3D vision sensing for improved pedestrian safety. In IEEE Intelligent Vehicles Symposium, 2004. 19--24.

[27]

Tae Jun Ham, Juan L. Aragón, and Margaret Martonosi. 2015. DeSC: Decoupled Supply-compute Communication Management for Heterogeneous Architectures. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO-48). ACM, New York, NY, USA, 191--203.

[28]

Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA '16). IEEE Press, Piscataway, NJ, USA, 243--254.

[29]

Tayler H. Hetherington, Mike O'Connor, and Tor M. Aamodt. 2015. MemcachedGPU: Scaling-up Scale-out Key-value Stores. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SoCC '15). ACM, New York, NY, USA, 43--57.

[30]

Tayler H. Hetherington, Timothy G. Rogers, Lisa Hsu, Mike O'Connor, and Tor M. Aamodt. 2012. Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems. In Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS '12). IEEE Computer Society, Washington, DC, USA, 88--98.

[31]

Vijay Janapa Reddi, Benjamin C. Lee, Trishul Chilimbi, and Kushagra Vaid. 2010. Web Search Using Mobile Cores: Quantifying and Mitigating the Price of Efficiency. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). ACM, New York, NY, USA, 314--325.

[32]

Thorsten Joachims. 1998. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Proceedings of the 10th European Conference on Machine Learning (ECML '98). Springer-Verlag, London, UK, 137--142. http://dl.acm.org/citation.cfm?id=645326.649721

[33]

James A Kahle, Michael N Day, H Peter Hofstee, Charles R Johns, Theodore R Maeurer, and David Shippy. 2005. Introduction to the Cell multiprocessor. IBM Journal of Research and Development 49, 4.5 (2005), 589--604.

[34]

G. Kestor, R. Gioiosa, D. J. Kerbyson, and A. Hoisie. 2013. Quantifying the energy cost of data movement in scientific applications. In 2013 IEEE International Symposium on Workload Characterization (IISWC). 56--65.

[35]

Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-core CPUs. Proc. VLDB Endow. 2, 2 (Aug. 2009), 1378--1389.

[36]

Onur Kocberber, Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, and Parthasarathy Ranganathan. 2013. Meet the Walkers: Accelerating Index Traversals for In-memory Databases. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). ACM, New York, NY, USA, 468--479.

[37]

Sanjeev Kumar, Christopher J. Hughes, and Anthony Nguyen. 2007. Carbon: Architectural Support for Fine-grained Parallelism on Chip Multiprocessors. In Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA '07). ACM, New York, NY, USA, 162--173.

[38]

Willis Lang, Jignesh M. Patel, and Srinath Shankar. 2010. Wimpy Node Clusters: What About Non-wimpy Workloads?. In Proceedings of the Sixth International Workshop on Data Management on New Hardware (DaMoN '10). ACM, New York, NY, USA, 47--55.

[39]

M. Lichman. 2013. UCI Machine Learning Repository. (2013). http://archive.ics.uci.edu/ml

[40]

Kevin Lim, David Meisner, Ali G. Saidi, Parthasarathy Ranganathan, and Thomas F. Wenisch. 2013. Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 36--47.

[41]

Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J Dally, and Mark Horowitz. 2000. Smart memories: A modular reconfigurable architecture. In Computer Architecture, 2000. Proceedings of the 27th International Symposium on. IEEE, 161--171.

[42]

David Marr and Tomaso Poggio. 1976. Cooperative computation of stereo disparity. In From the Retina to the Neocortex. Springer, 239--243.

[43]

Rene Mueller and Jens Teubner. 2010. FPGAs: A New Point in the Database Design Space. In Proceedings of the 13th International Conference on Extending Database Technology (EDBT '10). ACM, New York, NY, USA, 721--723.

[44]

Don Murray and James J. Little. 2000. Using Real-Time Stereo Vision for Mobile Robot Navigation. Auton. Robots 8, 2 (April 2000), 161--171.

[45]

NVIDIA. n.d. NVIDIA TESLA P100 GPU ACCELERATOR. http://images.nvidia.com/content/tesla/pdf/nvidia-tesla-p100-PCIe-datasheet.pdf

[46]

Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jongsoo Park, Michael J Anderson, Satya Gautam Vadlamudi, Dipankar Das, Sergey G Pudov, Vadim O Pirogov, and Pradeep Dubey. 2015. Parallel efficient sparse matrix-matrix multiplication on multicore platforms. In International Conference on High Performance Computing. Springer, 48--57.

[47]

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J Abadi, David J DeWitt, Samuel Madden, and Michael Stonebraker. 2009. A comparison of approaches to large-scale data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 165--178.

[48]

Orestis Polychroniou, Arun Raghavan, and Kenneth A. Ross. 2015. Rethinking SIMD Vectorization for In-Memory Databases. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 1493--1508.

[49]

Orestis Polychroniou and Kenneth A. Ross. 2014. A Comprehensive Study of Main-memory Partitioning and Its Application to Large-scale Comparison- and Radix-sort. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 755--766.

[50]

Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, and Doug Burger. 2014. A Reconfigurable Fabric for Accelerating Large-scale Datacenter Services. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA '14). IEEE Press, Piscataway, NJ, USA, 13--24. http://dl.acm.org/citation.cfm?id=2665671.2665678

[51]

Wajahat Qadeer, Rehan Hameed, Ofer Shacham, Preethi Venkatesan, Christos Kozyrakis, and Mark A. Horowitz. 2013. Convolution Engine: Balancing Efficiency & Flexibility in Specialized Computing. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 24--35.

[52]

Sridhar Ramaswamy, Pablo Tamayo, Ryan Rifkin, Sayan Mukherjee, Chen-Hsiang Yeang, Michael Angelo, Christine Ladd, Michael Reich, Eva Latulippe, Jill P Mesirov, et al. 2001. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences 98, 26 (2001), 15149--15154.

[53]

Valentina Salapura, Tejas Karkhanis, Priya Nagpurkar, and Jose Moreira. 2012. Accelerating Business Analytics Applications. In Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture (HPCA '12). IEEE Computer Society, Washington, DC, USA, 1--10.

[54]

Daniel Sanchez, Richard M. Yoo, and Christos Kozyrakis. 2010. Flexible Architectural Support for Fine-grain Scheduling. In Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). ACM, New York, NY, USA, 311--322.

[55]

A W M Smeulders, M. Worring, S. Santini, A Gupta, and R. Jain. 2000. Content-based image retrieval at the end of the early years. Pattern Analysis and Machine Intelligence, IEEE Transactions on 22, 12 (Dec. 2000), 1349--1380.

[56]

Zachary D Stephens, Skylar Y. Lee, Faraz Faghri, Roy H Campbell, Chengxiang Zhai, Miles J Efron, Ravishankar Iyer, Michael C Schatz, Saurabh Sinha, and Gene E Robinson. 2015. Big data: astronomical or genomical? PLoS Biol 13, 7 (2015), e1002195.

[57]

Sravanthi Kota Venkata, Ikkjin Ahn, Donghwan Jeon, Anshuman Gupta, Christopher Louie, Saturnino Garcia, Serge Belongie, and Michael Bedford Taylor. 2009. SD-VBS: The San Diego vision benchmark suite. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on. IEEE, 55--64.

[58]

Lei Wang, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, et al. 2014. Bigdatabench: A big data benchmark suite from internet services. In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on. IEEE, 488--499.

[59]

Yipeng Wang, Ren Wang, Andrew Herdrich, James Tsai, and Yan Solihin. 2016. CAF: Core to Core Communication Acceleration Framework. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT '16). ACM, New York, NY, USA, 351--362.

[60]

Wikipedia. n.d. Ajax (programming). https://en.wikipedia.org/w/index.php?title=Ajax_(programming)&oldid=770489771

[61]

Lisa Wu, Raymond J. Barker, Martha A. Kim, and Kenneth A. Ross. 2013. Navigating Big Data with High-throughput, Energy-efficient Data Partitioning. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 249--260.

[62]

Lisa Wu, Andrea Lottarini, Timothy K. Paine, Martha A. Kim, and Kenneth A. Ross. 2014. Q100: The Architecture and Design of a Database Processing Unit. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14). ACM, New York, NY, USA, 255--268.

[63]

Shen Yin and Okyay Kaynak. 2015. Big data for modern industry: challenges and trends [point of view]. Proc. IEEE 103, 2 (2015), 143--146.

[64]

Wei Yu, Tiebin Liu, Rodolfo Valdez, Marta Gwinn, and Muin J Khoury. 2010. Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. BMC Medical Informatics and Decision Making 10, 1 (2010), 16.

Index Terms

A many-core architecture for in-memory data processing
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems
    2. Parallel architectures
      1. Multicore architectures

Information & Contributors

Information

Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 2017

850 pages

ISBN:9781450349529

DOI:10.1145/3123939

General Chairs: Hillery Hunter (IBM Research), Jaime Moreno (IBM Research)
Program Chairs: Joel Emer (NVIDIA and MIT), Daniel Sanchez (MIT)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MICRO-50

Sponsor:

SIGMICRO
IEEE-CS\DATC

MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 14 - 18, 2017

Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 22
Total Downloads: 1,500
Downloads (Last 12 months): 82
Downloads (Last 6 weeks): 9

Reflects downloads up to 16 Oct 2024

Citations

Cited By

Shahroodi T, Cardoso R, Wong S, Bosio A, O'Connor I, Hamdioui S (2024) High-Performance Data Mapping for BNNs on PCM-Based Integrated Photonics. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6. DOI: 10.23919/DATE58400.2024.10546687. Online publication date: 25-Mar-2024
https://doi.org/10.23919/DATE58400.2024.10546687
Shahroodi T, Cardoso R, Zahedi M, Wong S, Bosio A, O'Connor I, Hamdioui S (2023) Lightspeed Binary Neural Networks using Optical Phase-Change Materials. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-2. DOI: 10.23919/DATE56975.2023.10137229. Online publication date: Apr-2023
https://doi.org/10.23919/DATE56975.2023.10137229
Gonzalez A, Kolli A, Khan S, Liu S, Dadu V, Karandikar S, Chang J, Asanovic K, Ranganathan P, Solihin Y, Heinrich M (2023) Profiling Hyperscale Big Data Processing. Proceedings of the 50th Annual International Symposium on Computer Architecture, 1-16. DOI: 10.1145/3579371.3589082. Online publication date: 17-Jun-2023
https://dl.acm.org/doi/10.1145/3579371.3589082
Caminal H, Chronis Y, Wu T, Patel J, Martínez J, Salapura V, Zahran M, Chong F, Tang L (2022) Accelerating database analytic query workloads using an associative processor. Proceedings of the 49th Annual International Symposium on Computer Architecture, 623-637. DOI: 10.1145/3470496.3527435. Online publication date: 18-Jun-2022
https://dl.acm.org/doi/10.1145/3470496.3527435
Koutsoukos D, Müller I, Marroquín R, Klimovic A, Alonso G (2021) Modularis. Proceedings of the VLDB Endowment 14:13, 3308-3321. DOI: 10.14778/3484224.3484229. Online publication date: 1-Sep-2021
https://dl.acm.org/doi/10.14778/3484224.3484229
Li M, Lin Z, Meng X (2021) Temporal Optical Neurons For Serial Deep Learning. 2021 IEEE Photonics Society Summer Topicals Meeting Series (SUM), 1-2. DOI: 10.1109/SUM48717.2021.9505925. Online publication date: Jul-2021
https://doi.org/10.1109/SUM48717.2021.9505925
Dominico S, de Almeida E, Alves M, Meira J (2021) Performance Analysis of Array Database Systems in Non-Uniform Memory Architecture. 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 169-176. DOI: 10.1109/PDP52278.2021.00034. Online publication date: Mar-2021
https://doi.org/10.1109/PDP52278.2021.00034
Feldmann J, Youngblood N, Karpov M, Gehring H, Li X, Stappers M, Le Gallo M, Fu X, Lukashchuk A, Raja A, Liu J, Wright C, Sebastian A, Kippenberg T, Pernice W, Bhaskaran H (2021) Parallel convolutional processing using an integrated photonic tensor core. Nature 589:7840, 52-58. DOI: 10.1038/s41586-020-03070-1. Online publication date: 6-Jan-2021
https://doi.org/10.1038/s41586-020-03070-1
Cafarella M, DeWitt D, Gadepally V, Kepner J, Kozyrakis C, Kraska T, Stonebraker M, Zaharia M (2021) A Polystore Based Database Operating System (DBOS). Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 3-24. DOI: 10.1007/978-3-030-71055-2_1. Online publication date: 4-Mar-2021
https://doi.org/10.1007/978-3-030-71055-2_1
Marquez B, Huang C, Prucnal P, Shastri B (2021) Neuromorphic Silicon Photonics for Artificial Intelligence. Silicon Photonics IV, 417-447. DOI: 10.1007/978-3-030-68222-4_10. Online publication date: 9-Jun-2021
https://doi.org/10.1007/978-3-030-68222-4_10
