Keyword: accelerator architectures

research-article

Not all GPUs are created equal: characterizing variability in large-scale, accelerator-rich systems

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 65, Pages 1–15

Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters, hyperscalers, national computing centers, and supercomputers have procured ...

research-article

A submatrix-based method for approximate matrix function evaluation in the quantum chemistry code CP2K

SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 80, Pages 1–14

Electronic structure calculations based on density-functional theory (DFT) represent a significant part of today's HPC workloads and pose high demands on high-performance computing resources. To perform these quantum-mechanical DFT calculations on ...

research-article

TOSS-2020: a commodity software stack for HPC

SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 40, Pages 1–15

The simulation environment of any HPC platform is key to the performance, portability, and productivity of scientific applications. This environment has traditionally been provided by platform vendors, presenting challenges for HPC centers and users ...

research-article

Creating an agile hardware design flow

DAC '20: Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference, Article No.: 142, Pages 1–6

Although an agile approach is standard for software design, how to properly adapt this method to hardware is still an open question. This work addresses this question while building a system on chip (SoC) with specialized accelerators. Rather than using ...

research-article

SOFF: an OpenCL high-level synthesis framework for FPGAs

ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, Pages 295–308, https://doi.org/10.1109/ISCA45697.2020.00034

Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA accelerators. However, the state-of-the-art OpenCL frameworks for FPGAs suffer from poor performance and usability. This paper proposes a high-level synthesis framework ...

research-article

Reducing communication in parallel graph search algorithms with software caches

International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 2, Pages 384–396, https://doi.org/10.1177/1094342018762510

In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: few high-degree vertexes connect many low-degree vertexes. Despite the randomness in a ...

research-article

Enabling scientific computing on memristive accelerators

ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture, Pages 367–382, https://doi.org/10.1109/ISCA.2018.00039

Linear algebra is ubiquitous across virtually every field of science and engineering, from climate modeling to macroeconomics. This ubiquity makes linear algebra a prime candidate for hardware acceleration, which can improve both the run time and the ...

research-article

A configurable cloud-scale DNN processor for real-time AI

ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture, Pages 1–14, https://doi.org/10.1109/ISCA.2018.00012

Interactive AI-powered services require low-latency evaluation of deep neural network (DNN) models, aka "realtime AI". The growing demand for computationally expensive, state-of-the-art DNNs, coupled with diminishing performance gains of general-...

research-article

Lightweight SIMT core designs for intelligent 3D stacked DRAM

MEMSYS '17: Proceedings of the International Symposium on Memory Systems, Pages 49–59, https://doi.org/10.1145/3132402.3132426

In this work we present an analysis of the Harmonica stream multiprocessor, a light-weight, parameterized, open-source single-instruction-multiple-thread (SIMT) core designed for integration within 3D-stacked DRAM. We evaluate the range of Harmonica ...

research-article

ePython: an implementation of Python for the many-core Epiphany coprocessor

Nick Brown

PyHPC '16: Proceedings of the 6th Workshop on Python for High-Performance and Scientific Computing, Pages 59–66

The Epiphany is a many-core, low power, low on-chip memory architecture and one can very cheaply gain access to a number of parallel cores which is beneficial for HPC education and prototyping. The very low power nature of these architectures also means ...

research-article

Runtime coordinated heterogeneous tasks in charm++

ESPM2: Proceedings of the Second International Workshop on Extreme Scale Programming Models and Middleware, Pages 40–43

Effective utilization of the increasingly heterogeneous hardware in modern supercomputers is a significant challenge. Many applications have seen performance gains by using GPUs, but many implementations leave CPUs sitting idle. In this paper, we ...

research-article

Extended task queuing: active messages for heterogeneous systems

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 80, Pages 1–12

Accelerators have emerged as an important component of modern cloud, datacenter, and HPC computing environments. However, launching tasks on remote accelerators across a network remains unwieldy, forcing programmers to send data in large chunks to ...

research-article

Evaluation of an analog accelerator for linear algebra

ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture, Pages 570–582, https://doi.org/10.1109/ISCA.2016.56

Due to the end of supply voltage scaling and the increasing percentage of dark silicon in modern integrated circuits, researchers are looking for new scalable ways to get useful computation from existing silicon technology. In this paper we present a ...

Also Published in:

ACM SIGARCH Computer Architecture News: Volume 44 Issue 3

Article

Fast and Flexible Conversion of Geohash Codes to and from Latitude/Longitude Coordinates

FCCM '15: Proceedings of the 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, Pages 179–186, https://doi.org/10.1109/FCCM.2015.18

Insights extracted from spatial queries in geodatabase systems introduce significant opportunities for business intelligence. However, geodatabases are unable to keep up with the required performance due to the massive (and sky-rocketing) amounts of ...

research-article

A caching approach to reduce communication in graph search algorithms

DISCS '14: Proceedings of the 2014 International Workshop on Data Intensive Scalable Computing Systems, Pages 65–72, https://doi.org/10.1109/DISCS.2014.8

In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: few high-degree vertexes connect many low-degree vertexes. Despite the randomness in a ...

Article

Exploring SIMD for Molecular Dynamics, Using Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessors

IPDPS '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, Pages 1085–1097, https://doi.org/10.1109/IPDPS.2013.44

We analyse gather-scatter performance bottlenecks in molecular dynamics codes and the challenges that they pose for obtaining benefits from SIMD execution. This analysis informs a number of novel code-level and algorithmic improvements to Sandia's ...

Article

Developing Performance-Portable Molecular Dynamics Kernels in OpenCL

SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, Pages 386–395, https://doi.org/10.1109/SC.Companion.2012.58

This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia's miniMD benchmark that achieves good levels of performance across a wide range of ...

research-article

Probabilistic auto-tuning for architectures with complex constraints

EXADAPT '11: Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, Pages 22–33, https://doi.org/10.1145/2000417.2000420

It is hard to optimize applications for coprocessor accelerator architectures, like FPGAs and GPUs, because application parameters must be tuned carefully to the size of the target architecture. Moreover, some combinations of parameters simply do not ...
