General programming languages

17 Results for: Book/Issue: PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques

Searched The ACM Guide to Computing Literature (3,815,653 records)

Showing 1-17 of 17 Results

poster
November 2024
A Polyhedral+Dataflow Intermediate Language for Performance Exploration
- Eddie C. Davis,
- Catherine R.M. Olschanowsky
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 498–499, https://doi.org/10.1109/PACT.2019.00064

This poster introduces a compiler intermediate language designed for dataflow optimizations within a polyhedral framework. This intermediate representation describes computations at a high level, defines a set of loop and data transformations that can be ...
Total Citations: 0
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
poster
November 2024
Quantifying the Direct Overhead of Virtual Function Calls on Massively Parallel Architectures
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 496–497, https://doi.org/10.1109/PACT.2019.00063

Programmable accelerators aim to provide the flexibility of traditional CPUs, with greatly improved performance and energy-efficiency. Arguably, the greatest impediment to the widespread adoption of programmable accelerators, like GPUs, is the software ...
Total Citations: 0
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
poster
November 2024
Exploiting Multi-Level Task Dependencies to Prune Redundant Work in Relax-Ordered Task-Parallel Algorithms
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 494–495, https://doi.org/10.1109/PACT.2019.00062

Work-efficient task-parallel algorithms enforce ordering between tasks using queuing primitives. Such algorithms offer limited parallelism due to queuing constraints that result in data movement and synchronization bottlenecks. Speculatively relaxing ...
Total Citations: 0
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
poster
November 2024
A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 486–487, https://doi.org/10.1109/PACT.2019.00058

Asymmetric multicore processors (AMP) are necessary for extracting performance in an era of limited power budget and dark silicon. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient ...
Total Citations: 0
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
poster
November 2024
CogR: Exploiting Program Structures for Machine-Learning Based Runtime Solutions
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 484–485, https://doi.org/10.1109/PACT.2019.00057

We propose CogR, a machine-learning based runtime solution, that enables efficient and dynamic resource scheduling and performance optimization for high-level programming interfaces on heterogeneous systems. CogR tightly combines the structural ...
Total Citations: 0
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
poster
November 2024
Automatic Parallelization Targeting Asynchronous Task-Based Runtimes
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 464–465, https://doi.org/10.1109/PACT.2019.00047

In a post-Moore world, asynchronous task-based parallelism has become a popular paradigm for parallel programming. Auto-parallelizing compilers are also an active area of research, promising improved developer productivity and application performance. ...
Total Citations: 0
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
poster
November 2024
The Performance Impact of Thread Packing on Synchronization-Intensive Applications
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 460–461, https://doi.org/10.1109/PACT.2019.00045

Thread packing (TP) is a widely-used technique to improve the efficiency of parallel systems. Despite extensive prior works, relatively little work has been done to investigate its performance inefficiencies. To bridge this gap, we quantify its ...
Total Citations: 0
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
research-article
November 2024
A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 444–455, https://doi.org/10.1109/PACT.2019.00042

Irregular computations are commonly seen in many scientific and engineering domains that use unstructured meshes or sparse matrices. The performance of an irregular application is very dependent upon the dataset. This paper poses the following question: "...
Total Citations: 0
Total Downloads: 4 (last 12 months: 4; last 6 weeks: 4)
research-article
November 2024
Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 432–443, https://doi.org/10.1109/PACT.2019.00041

Optimizing scientific applications on today's accelerator-based high performance computing systems can be challenging, especially when multiple GPUs and CPUs with heterogeneous memories and persistent non-volatile memories are present. An example is ...
Total Citations: 0
Total Downloads: 3 (last 12 months: 3; last 6 weeks: 3)
research-article
November 2024
Artifacts Available
Artifacts Evaluated & Functional
Artifacts Evaluated & Reusable
Results Replicated
Generating Portable High-Performance Code via Multi-Dimensional Homomorphisms
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 353–368, https://doi.org/10.1109/PACT.2019.00035

We address a key challenge in programming high-performance applications - achieving portable performance, i.e., the same source code achieves a consistent, high level of performance over the variety of modern parallel processors, including multi-core CPU ...
Total Citations: 0
Total Downloads: 3 (last 12 months: 3; last 6 weeks: 3)
research-article
November 2024
Specialization Opportunities in Graphical Workloads
- Lewis Crawford,
- Michael O'Boyle
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 271–282, https://doi.org/10.1109/PACT.2019.00029

Computer games are complex performance-critical graphical applications which require specialized GPU hardware. For this reason, GPU drivers often include many heuristics to help optimize throughput. Recently however, new APIs are emerging which sacrifice ...
Total Citations: 0
Total Downloads: 4 (last 12 months: 4; last 6 weeks: 4)
research-article
November 2024
HeTM: Transactional Memory for Heterogeneous Systems
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 231–243, https://doi.org/10.1109/PACT.2019.00026

Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. However, developing applications ...
Total Citations: 0
Total Downloads: 3 (last 12 months: 3; last 6 weeks: 3)
research-article
November 2024
Forgive-TM: Supporting Lazy Conflict Detection In Eager Hardware Transactional Memory
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 192–204, https://doi.org/10.1109/PACT.2019.00023

Commercial hardware transactional memory (TM) systems commonly use coherence messages to detect data conflicts. When a core inside a transaction receives a coherence request for data, it uses this information to determine whether there was a data ...
Total Citations: 0
Total Downloads: 3 (last 12 months: 3; last 6 weeks: 3)
research-article
November 2024
Artifacts Available
Artifacts Evaluated & Functional
Artifacts Evaluated & Reusable
Results Replicated
Fast Parallel Equivalence Relations in a Datalog Compiler
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 82–96, https://doi.org/10.1109/PACT.2019.00015

Modern parallelizing Datalog compilers are employed in industrial applications such as networking and static program analysis. These applications regularly reason about equivalences, e.g., computing bitcoin user groups, fast points-to analyses, and ...
Total Citations: 1
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
research-article
November 2024
Type-Directed Program Synthesis and Constraint Generation for Library Portability
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 55–67, https://doi.org/10.1109/PACT.2019.00013

Fast numerical libraries have been a cornerstone of scientific computing for decades, but this comes at a price. Programs may be tied to vendor specific software ecosystems resulting in polluted, non-portable code. As we enter an era of heterogeneous ...
Total Citations: 2
Total Downloads: 2 (last 12 months: 2; last 6 weeks: 2)
research-article
November 2024
Artifacts Available
Artifacts Evaluated & Functional
Results Replicated
BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 29–42, https://doi.org/10.1109/PACT.2019.00011

OpenMP is widely used by a number of applications, computational libraries, and runtime systems. As a result, multiple levels of the software stack use OpenMP independently of one another, often leading to nested parallel regions. Although exploiting ...
Total Citations: 0
Total Downloads: 4 (last 12 months: 4; last 6 weeks: 4)
research-article
November 2024
Gluon-Async: A Bulk-Asynchronous System for Distributed and Heterogeneous Graph Analytics
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 15–28, https://doi.org/10.1109/PACT.2019.00010

Distributed graph analytics systems for CPUs, like D-Galois and Gemini, and for GPUs, like D-IrGL and Lux, use a bulk-synchronous parallel (BSP) programming and execution model. BSP permits bulk-communication and uses large messages which are supported ...
Total Citations: 0
Total Downloads: 4 (last 12 months: 4; last 6 weeks: 4)
