- poster, January 2021
Toward Data-Adaptable TinyML using Model Partial Replacement for Resource Frugal Edge Device
HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region, Pages 133–135, https://doi.org/10.1145/3432261.3439865
Demand to perform machine learning (ML) tasks on resource-limited microcontroller unit (MCU)-based edge devices, instead of on the server, is gradually increasing. The TinyML framework makes it possible to create ML firmware in a language that can ...
- poster, January 2021
HPC LINPACK Parameter Optimization on Homo-/Heterogeneous System of ARM Neoverse N1SDP
HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region, Pages 139–143, https://doi.org/10.1145/3432261.3439864
HPL (High Performance Linpack) is the standard benchmark used to evaluate supercomputers (high-performance computing systems) around the world. HPL solves a linear system of equations, Ax=b, through a series of mathematical processes such as 2D Block-...
- poster, January 2021
GPU Optimizations for Atmospheric Chemical Kinetics
Theodoros Christoudias, Timo Kirfel, Astrid Kerkweg, Domenico Taraborrelli, Georges-Emmanuel Moulard, Erwan Raffin, Victor Azizi, Gijs van den Oord, Ben van Werkhoven
HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region, Pages 136–138, https://doi.org/10.1145/3432261.3439863
We present a series of optimizations to alleviate stack memory overflow issues and improve overall performance of GPU computational kernels in atmospheric chemical kinetics model simulations. We use heap memory in numerical solvers for stiff ODEs, move ...
- research-article, January 2021
SeisSol on Distributed Multi-GPU Systems: CUDA Code Generation for the Modal Discontinuous Galerkin Method
HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region, Pages 69–82, https://doi.org/10.1145/3432261.3436753
We present a GPU implementation of the high order Discontinuous Galerkin (DG) scheme in SeisSol, a software package for simulating seismic waves and earthquake dynamics. Our particular focus is on providing a performance portable solution for ...
- research-article, January 2021
CSPACER: A Reduced API Set Runtime for the Space Consistency Model
HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region, Pages 58–68, https://doi.org/10.1145/3432261.3432272
We present our design and implementation of a runtime for the Space Consistency model. The Space Consistency model is a generalized form of the full-empty bit synchronization for distributed memory programming, where a memory region is associated with a ...
- research-article, January 2021
A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication
HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region, Pages 110–119, https://doi.org/10.1145/3432261.3432271
Matrix-matrix multiplication (GEMM) is a widely used linear algebra primitive common in scientific computing and data sciences. While several highly-tuned libraries and implementations exist, these typically target either sparse or dense matrices. The ...
- research-article, January 2021
Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme
HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region, Pages 100–109, https://doi.org/10.1145/3432261.3432270
In Krylov subspace methods such as the Conjugate Gradient (CG) method, the number of iterations until convergence may increase due to the loss of computational accuracy caused by rounding errors in floating-point computations. At the same time, because ...