Search Results
-
GPU-based butterfly counting
When dealing with large bipartite graphs, butterfly counting is a crucial and time-consuming operation. Graphics processing units (GPUs) are widely...
-
GPU cluster dynamics: insights from Alibaba’s 2023 trace release
In this paper, we present a comprehensive analysis of GPU cluster traces from Alibaba, released in 2023, focusing on understanding the detailed...
-
OpBench: an operator-level GPU benchmark for deep learning
Operators (such as Conv and ReLU) play an important role in deep neural networks. Every neural network is composed of a series of differentiable...
-
GPU Side-Channel Attack Classification for Targeted Secure Shader Mitigation
Graphics processing units (GPUs) provide massively parallel processing capabilities, enabling accelerated computation across diverse applications....
-
Hybridhadoop: CPU-GPU hybrid scheduling in hadoop
As a GPU has become an essential component in high performance computing, it has been attempted by many works to leverage GPU computing in Hadoop....
-
A graph pattern mining framework for large graphs on GPU
Graph pattern mining (GPM) is an important problem in graph processing. There are many parallel frameworks for GPM, many of which suffer from low...
-
MuxFlow: efficient GPU sharing in production-level clusters with more than 10000 GPUs
Large-scale GPU clusters are widely used to speed up both latency-critical (online) and best-effort (offline) deep learning (DL) workloads. However,...
-
Distributed data processing and task scheduling based on GPU parallel computing
Distributed data parallel (DDP) computing ensures data parallelism, enabling execution across several computers. A separate distributed data parallel...
-
Implementation and analysis of GPU algorithms for Vecchia Approximation
Gaussian Processes have become an indispensable part of the spatial statistician’s toolbox but are unsuitable for analyzing large datasets because of...
-
Utilization-prediction-aware energy optimization approach for heterogeneous GPU clusters
Optimizing energy consumption in heterogeneous GPU clusters is of paramount importance to enhance overall system efficiency and reduce operational...
-
Towards GPU-enabled serverless cloud edge platforms for accelerating HEVC video coding
Multimedia streaming has become integral to modern living, reshaping entertainment consumption, information access, and global engagement. The ascent...
-
High throughput acceleration of NIST lightweight authenticated encryption schemes on GPU platform
Authenticated encryption with associated data (AEAD) has become prominent over time because it offers authenticity and confidentiality...
-
GPU-accelerated relaxed graph pattern matching algorithms
Graph pattern matching is widely used in real-world applications, such as social network analysis. Since the traditional subgraph isomorphism is...
-
A fine-grained GPU sharing and job scheduling for deep learning jobs on the cloud
This paper introduces an innovative GPU sharing and scheduling method to tackle resource wastage and underutilization in deep learning training jobs....
-
An autotuning approach to select the inter-GPU communication library on heterogeneous systems
In this work, an automatic optimisation approach for parallel routines on multi-GPU systems is presented. Several inter-GPU communication libraries...
-
GPU thread throttling for page-level thrashing reduction via static analysis
Unified virtual memory was introduced in modern GPUs to enable a new programming model for programmers. This method manages memory pages between the...
-
CPU-GPU co-execution through the exploitation of hybrid technologies via SYCL
The performance and energy efficiency offered by heterogeneous systems are highly useful for modern C++ applications, but the technological variety...
-
A high-performance dynamic scheduling for sparse matrix-based applications on heterogeneous CPU–GPU environment
Efficient utilization of processors in heterogeneous CPU–GPU systems is crucial for improving overall application performance by reducing workload...
-
Accelerating BERT inference with GPU-efficient exit prediction
BERT is a representative pre-trained language model that has drawn extensive attention for significant improvements in downstream Natural Language...
-
GAPS: GPU-accelerated processing service for SM9
SM9 was established in 2016 as a Chinese official identity-based cryptographic (IBC) standard, and became an ISO standard in 2021. It is well-known...