Search results for keyword: stencil computation

research-article

Open Access

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores

PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 333–347, https://doi.org/10.1145/3627535.3638476

Tensor Core Unit (TCU) is increasingly integrated into modern high-performance processors to enhance matrix multiplication performance. However, constrained to its over-specification, its potential for improving other critical scientific operations like ...

research-article

SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation

ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 463–476, https://doi.org/10.1145/3577193.3593719

Fast and accurate climate simulations and weather predictions are critical for understanding and preparing for the impact of climate change. Real-world climate and weather simulations involve the use of complex compound stencil kernels, which are ...

research-article

Open Access

A Scalable Many-core Overlay Architecture on an HBM2-enabled Multi-Die FPGA

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 16, Issue 1, Article No.: 15, Pages 1–33, https://doi.org/10.1145/3547657

The overlay architecture enables to raise the abstraction level of hardware design and enhances hardware-accelerated applications’ portability. In FPGAs, there is a growing awareness of the overlay structure as typified by many-core architecture. It works ...

research-article

Scalable distributed high-order stencil computations

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 30, Pages 1–13

Stencil computations lie at the heart of many scientific and industrial applications. Stencil algorithms pose several challenges on machines with cache based memory hierarchy, due to low re-use of memory accesses if special care is not taken to optimize ...

research-article

Toward accelerated stencil computation by adapting tensor core unit on GPU

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 28, Pages 1–12, https://doi.org/10.1145/3524059.3532392

The Tensor Core Unit (TCU) has been increasingly adopted on modern high performance processors, specialized in boosting the performance of general matrix multiplication (GEMM). Due to its highly optimized hardware design, TCU can significantly ...

research-article

Open Access

Temporal vectorization for stencils

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 82, Pages 1–13, https://doi.org/10.1145/3458817.3476149

Stencil computations represent a very common class of nested loops in scientific and engineering applications. Exploiting vector units in modern CPUs is crucial to achieving peak performance. Previous vectorization approaches often consider the data ...

research-article

Enhancing the Scalability of Multi-FPGA Stencil Computations via Highly Optimized HDL Components

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 14, Issue 3, Article No.: 15, Pages 1–33, https://doi.org/10.1145/3461478

Stencil-based algorithms are a relevant class of computational kernels in high-performance systems, as they appear in a plethora of fields, from image processing to seismic simulations, from numerical methods to physical modeling. Among the various ...

research-article

Public Access

Fast Stencil Computations using Fast Fourier Transforms

SPAA '21: Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, Pages 8–21, https://doi.org/10.1145/3409964.3461803

Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping algorithms, ...

research-article

High-level hardware feature extraction for GPU performance prediction of stencils

GPGPU '20: Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit, Pages 21–30, https://doi.org/10.1145/3366428.3380769

High-level functional programming abstractions have started to show promising results for HPC (High-Performance Computing). Approaches such as Lift, Futhark or Delite have shown that it is possible to have both, high-level abstractions and performance, ...

research-article

Public Access

Automatically translating image processing libraries to halide

ACM Transactions on Graphics (TOG), Volume 38, Issue 6, Article No.: 204, Pages 1–13, https://doi.org/10.1145/3355089.3356549

This paper presents Dexter, a new tool that automatically translates image processing functions from a low-level general-purpose language to a high-level domain-specific language (DSL), allowing them to leverage cross-platform optimizations enabled by ...

research-article

PIMS: a lightweight processing-in-memory accelerator for stencil computations

MEMSYS '19: Proceedings of the International Symposium on Memory Systems, Pages 41–52, https://doi.org/10.1145/3357526.3357550

Stencil computation is a classic computational kernel present in many high-performance scientific applications, like image processing and partial differential equation solvers (PDE). A stencil computation sweeps over a multi-dimensional grid and ...

research-article

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing, Article No.: 12, Pages 1–10, https://doi.org/10.1145/3225058.3225140

Dynamical core is one of the most time-consuming parts in the global atmospheric general circulation model, which is widely used for the numerical simulation of the dynamic evolution process of global atmosphere. Due to its complicated calculation ...

research-article

Extreme-scale realistic stencil computations on sunway taihulight with ten million cores

CCGrid '18: Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Pages 566–571, https://doi.org/10.1109/CCGRID.2018.00086

Stencil computation arises from a large variety of scientific and engineering applications and often plays a critical role in the performance of extreme-scale simulations. Due to the memory bound nature, it is a challenging task to optimize stencil ...

poster

Public Access

An Optimal Microarchitecture for Stencil Computation with Data Reuse and Fine-Grained Parallelism: (Abstract Only)

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Page 286, https://doi.org/10.1145/3174243.3174964

Stencil computation is one of the most important kernels for many applications such as image processing, solving partial differential equations, and cellular automata. Nevertheless, implementing a high throughput stencil kernel is not trivial due to its ...

research-article

Tessellating stencils

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 49, Pages 1–13, https://doi.org/10.1145/3126908.3126920

Stencil computations represent a very common class of nested loops in scientific and engineering applications. The exhaustively studied tiling is one of the most powerful transformation techniques to explore the data locality and parallelism. Unlike ...

article

TOAST: Automatic tiling for iterative stencil computations on GPUs

Concurrency and Computation: Practice & Experience (CCOMP), Volume 29, Issue 8, Page n/a, https://doi.org/10.1002/cpe.4053

The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on graphics ...

research-article

Scalable Failure Masking for Stencil Computations using Ghost Region Expansion and Cell to Rank Remapping

SIAM Journal on Scientific Computing (SISC), Volume 39, Issue 5, Pages S347–S378, https://doi.org/10.1137/16M1081610

In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple ...

research-article

Reducing overhead in the Uintah framework to support short-lived tasks on GPU-heterogeneous architectures

WOLFHPC '15: Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, Article No.: 4, Pages 1–8, https://doi.org/10.1145/2830018.2830023

The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. The ...

research-article

Open Access

Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Pages 391–402, https://doi.org/10.1145/2737924.2737974

Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 ...

Also Published in:

ACM SIGPLAN Notices: Volume 50 Issue 6

research-article

Trace-Driven Memory Access Pattern Recognition in Computational Kernels

WOSC '14: Proceedings of the Second Workshop on Optimizing Stencil Computations, Pages 25–32, https://doi.org/10.1145/2686745.2686748

Classifying memory access patterns is paramount to the selection of the right set of optimizations and determination of the parallelization strategy. Static analyses suffer from ambiguities present in source code, which modern compilation techniques, ...
