Parallel computing methodologies

17 Results for: Book/Issue: PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques

Searched The ACM Guide to Computing Literature (3,815,652 records)

Showing 1-17 of 17 Results

poster
November 2024
Quantifying the Direct Overhead of Virtual Function Calls on Massively Parallel Architectures
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 496–497, https://doi.org/10.1109/PACT.2019.00063

Programmable accelerators aim to provide the flexibility of traditional CPUs, with greatly improved performance and energy-efficiency. Arguably, the greatest impediment to the widespread adoption of programmable accelerators, like GPUs, is the software ...
Total Citations: 0; Total Downloads: 2; Last 12 Months: 2; Last 6 weeks: 2

poster
November 2024
Exploiting Multi-Level Task Dependencies to Prune Redundant Work in Relax-Ordered Task-Parallel Algorithms
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 494–495, https://doi.org/10.1109/PACT.2019.00062

Work-efficient task-parallel algorithms enforce ordering between tasks using queuing primitives. Such algorithms offer limited parallelism due to queuing constraints that result in data movement and synchronization bottlenecks. Speculatively relaxing ...
Total Citations: 0; Total Downloads: 2; Last 12 Months: 2; Last 6 weeks: 2

poster
November 2024
A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 486–487, https://doi.org/10.1109/PACT.2019.00058

Asymmetric multicore processors (AMP) are necessary for extracting performance in an era of limited power budget and dark silicon. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient ...
Total Citations: 0; Total Downloads: 2; Last 12 Months: 2; Last 6 weeks: 2

poster
November 2024
CogR: Exploiting Program Structures for Machine-Learning Based Runtime Solutions
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 484–485, https://doi.org/10.1109/PACT.2019.00057

We propose CogR, a machine-learning based runtime solution, that enables efficient and dynamic resource scheduling and performance optimization for high-level programming interfaces on heterogeneous systems. CogR tightly combines the structural ...
Total Citations: 0; Total Downloads: 2; Last 12 Months: 2; Last 6 weeks: 2

poster
November 2024
Automatic Parallelization Targeting Asynchronous Task-Based Runtimes
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 464–465, https://doi.org/10.1109/PACT.2019.00047

In a post-Moore world, asynchronous task-based parallelism has become a popular paradigm for parallel programming. Auto-parallelizing compilers are also an active area of research, promising improved developer productivity and application performance. ...
Total Citations: 0; Total Downloads: 2; Last 12 Months: 2; Last 6 weeks: 2

research-article
November 2024
A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 444–455, https://doi.org/10.1109/PACT.2019.00042

Irregular computations are commonly seen in many scientific and engineering domains that use unstructured meshes or sparse matrices. The performance of an irregular application is very dependent upon the dataset. This paper poses the following question: "...
Total Citations: 0; Total Downloads: 4; Last 12 Months: 4; Last 6 weeks: 4

research-article
November 2024
Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 432–443, https://doi.org/10.1109/PACT.2019.00041

Optimizing scientific applications on today's accelerator-based high performance computing systems can be challenging, especially when multiple GPUs and CPUs with heterogeneous memories and persistent non-volatile memories are present. An example is ...
Total Citations: 0; Total Downloads: 3; Last 12 Months: 3; Last 6 weeks: 3

research-article
November 2024
Artifacts Available
Artifacts Evaluated & Functional
Artifacts Evaluated & Reusable
Results Replicated
Generating Portable High-Performance Code via Multi-Dimensional Homomorphisms
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 353–368, https://doi.org/10.1109/PACT.2019.00035

We address a key challenge in programming high-performance applications - achieving portable performance, i.e., the same source code achieves a consistent, high level of performance over the variety of modern parallel processors, including multi-core CPU ...
Total Citations: 0; Total Downloads: 3; Last 12 Months: 3; Last 6 weeks: 3

research-article
November 2024
Artifacts Available
Artifacts Evaluated & Functional
Results Replicated
EDGE: Event-Driven GPU Execution
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 336–352, https://doi.org/10.1109/PACT.2019.00034

GPUs are known to benefit structured applications with ample parallelism, such as deep learning in a datacenter. Recently, GPUs have shown promise for irregular streaming network tasks. However, the GPU's co-processor dependence on a CPU for task ...
Total Citations: 0; Total Downloads: 3; Last 12 Months: 3; Last 6 weeks: 3

research-article
November 2024
Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 323–335, https://doi.org/10.1109/PACT.2019.00033

Sparse solvers are heavily used in computational fluid dynamics (CFD), computer-aided design (CAD), and other important application domains. These solvers remain challenging to execute on massively parallel architectures, due to the sequential ...
Total Citations: 0; Total Downloads: 2; Last 12 Months: 2; Last 6 weeks: 2

research-article
November 2024
Analyzing and Leveraging Remote-core Bandwidth for Enhanced Performance in GPUs
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 257–270, https://doi.org/10.1109/PACT.2019.00028

Bandwidth achieved from local/shared caches and memory is a major performance determinant in Graphics Processing Units (GPUs). These existing sources of bandwidth are often not enough for optimal GPU performance. Therefore, to enhance the performance ...
Total Citations: 0; Total Downloads: 3; Last 12 Months: 3; Last 6 weeks: 3

research-article
November 2024
Achieving scalability in a k-NN multi-GPU network service with Centaur
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 244–256, https://doi.org/10.1109/PACT.2019.00027

Centaur is a GPU-centric architecture for building a low-latency approximate k-Nearest-Neighbors network server. We implement a multi-GPU distributed data flow runtime which enables efficient and scalable network request processing on GPUs. The runtime ...
Total Citations: 0; Total Downloads: 4; Last 12 Months: 4; Last 6 weeks: 4

research-article
November 2024
Artifacts Available
Artifacts Evaluated & Functional
Artifacts Evaluated & Reusable
Results Replicated
Unfair Scheduling Patterns in NUMA Architectures
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 205–218, https://doi.org/10.1109/PACT.2019.00024

Lock-free algorithms are typically designed and analyzed with adversarial scheduling in mind. However, on real hardware, lock-free algorithms perform much better than the adversarial assumption predicts, suggesting that adversarial scheduling is ...
Total Citations: 0; Total Downloads: 3; Last 12 Months: 3; Last 6 weeks: 3

research-article
November 2024
Forgive-TM: Supporting Lazy Conflict Detection In Eager Hardware Transactional Memory
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 192–204, https://doi.org/10.1109/PACT.2019.00023

Commercial hardware transactional memory (TM) systems commonly use coherence messages to detect data conflicts. When a core inside a transaction receives a coherence request for data, it uses this information to determine whether there was a data ...
Total Citations: 0; Total Downloads: 3; Last 12 Months: 3; Last 6 weeks: 3

research-article
November 2024
Artifacts Available
Artifacts Evaluated & Functional
Artifacts Evaluated & Reusable
Results Replicated
Fast Parallel Equivalence Relations in a Datalog Compiler
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 82–96, https://doi.org/10.1109/PACT.2019.00015

Modern parallelizing Datalog compilers are employed in industrial applications such as networking and static program analysis. These applications regularly reason about equivalences, e.g., computing bitcoin user groups, fast points-to analyses, and ...
Total Citations: 1; Total Downloads: 2; Last 12 Months: 2; Last 6 weeks: 2

research-article
November 2024
Artifacts Available
Artifacts Evaluated & Functional
Results Replicated
BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 29–42, https://doi.org/10.1109/PACT.2019.00011

OpenMP is widely used by a number of applications, computational libraries, and runtime systems. As a result, multiple levels of the software stack use OpenMP independently of one another, often leading to nested parallel regions. Although exploiting ...
Total Citations: 0; Total Downloads: 4; Last 12 Months: 4; Last 6 weeks: 4

research-article
November 2024
Gluon-Async: A Bulk-Asynchronous System for Distributed and Heterogeneous Graph Analytics
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 15–28, https://doi.org/10.1109/PACT.2019.00010

Distributed graph analytics systems for CPUs, like D-Galois and Gemini, and for GPUs, like D-IrGL and Lux, use a bulk-synchronous parallel (BSP) programming and execution model. BSP permits bulk-communication and uses large messages which are supported ...
Total Citations: 0; Total Downloads: 4; Last 12 Months: 4; Last 6 weeks: 4
