- poster, November 2024
A Polyhedral+Dataflow Intermediate Language for Performance Exploration
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 498–499, https://doi.org/10.1109/PACT.2019.00064
This poster introduces a compiler intermediate language designed for dataflow optimizations within a polyhedral framework. This intermediate representation describes computations at a high level, defines a set of loop and data transformations that can be ...
- poster, November 2024
Quantifying the Direct Overhead of Virtual Function Calls on Massively Parallel Architectures
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 496–497, https://doi.org/10.1109/PACT.2019.00063
Programmable accelerators aim to provide the flexibility of traditional CPUs, with greatly improved performance and energy-efficiency. Arguably, the greatest impediment to the widespread adoption of programmable accelerators, like GPUs, is the software ...
- poster, November 2024
Exploiting Multi-Level Task Dependencies to Prune Redundant Work in Relax-Ordered Task-Parallel Algorithms
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 494–495, https://doi.org/10.1109/PACT.2019.00062
Work-efficient task-parallel algorithms enforce ordering between tasks using queuing primitives. Such algorithms offer limited parallelism due to queuing constraints that result in data movement and synchronization bottlenecks. Speculatively relaxing ...
- poster, November 2024
A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 486–487, https://doi.org/10.1109/PACT.2019.00058
Asymmetric multicore processors (AMP) are necessary for extracting performance in an era of limited power budget and dark silicon. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient ...
- poster, November 2024
CogR: Exploiting Program Structures for Machine-Learning Based Runtime Solutions
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 484–485, https://doi.org/10.1109/PACT.2019.00057
We propose CogR, a machine-learning based runtime solution, that enables efficient and dynamic resource scheduling and performance optimization for high-level programming interfaces on heterogeneous systems. CogR tightly combines the structural ...
- poster, November 2024
Automatic Parallelization Targeting Asynchronous Task-Based Runtimes
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 464–465, https://doi.org/10.1109/PACT.2019.00047
In a post-Moore world, asynchronous task-based parallelism has become a popular paradigm for parallel programming. Auto-parallelizing compilers are also an active area of research, promising improved developer productivity and application performance. ...
- poster, November 2024
The Performance Impact of Thread Packing on Synchronization-Intensive Applications
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 460–461, https://doi.org/10.1109/PACT.2019.00045
Thread packing (TP) is a widely-used technique to improve the efficiency of parallel systems. Despite extensive prior works, relatively little work has been done to investigate its performance inefficiencies. To bridge this gap, we quantify its ...
- research-article, November 2024
A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 444–455, https://doi.org/10.1109/PACT.2019.00042
Irregular computations are commonly seen in many scientific and engineering domains that use unstructured meshes or sparse matrices. The performance of an irregular application is very dependent upon the dataset. This paper poses the following question: "...
- research-article, November 2024
Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer
- Giovanni Balduzzi,
- Arghya Chatterjee,
- Ying Wai Li,
- Peter W. Doak,
- Urs Haehner,
- Ed F. D'Azevedo,
- Thomas A. Maier,
- Thomas Schulthess
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 432–443, https://doi.org/10.1109/PACT.2019.00041
Optimizing scientific applications on today's accelerator-based high performance computing systems can be challenging, especially when multiple GPUs and CPUs with heterogeneous memories and persistent non-volatile memories are present. An example is ...
- research-article, November 2024
Generating Portable High-Performance Code via Multi-Dimensional Homomorphisms
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 353–368, https://doi.org/10.1109/PACT.2019.00035
We address a key challenge in programming high-performance applications - achieving portable performance, i.e., the same source code achieves a consistent, high level of performance over the variety of modern parallel processors, including multi-core CPU ...
- research-article, November 2024
Specialization Opportunities in Graphical Workloads
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 271–282, https://doi.org/10.1109/PACT.2019.00029
Computer games are complex performance-critical graphical applications which require specialized GPU hardware. For this reason, GPU drivers often include many heuristics to help optimize throughput. Recently however, new APIs are emerging which sacrifice ...
- research-article, November 2024
HeTM: Transactional Memory for Heterogeneous Systems
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 231–243, https://doi.org/10.1109/PACT.2019.00026
Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. However, developing applications ...
- research-article, November 2024
Forgive-TM: Supporting Lazy Conflict Detection In Eager Hardware Transactional Memory
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 192–204, https://doi.org/10.1109/PACT.2019.00023
Commercial hardware transactional memory (TM) systems commonly use coherence messages to detect data conflicts. When a core inside a transaction receives a coherence request for data, it uses this information to determine whether there was a data ...
- research-article, November 2024
Fast Parallel Equivalence Relations in a Datalog Compiler
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 82–96, https://doi.org/10.1109/PACT.2019.00015
Modern parallelizing Datalog compilers are employed in industrial applications such as networking and static program analysis. These applications regularly reason about equivalences, e.g., computing bitcoin user groups, fast points-to analyses, and ...
- research-article, November 2024
Type-Directed Program Synthesis and Constraint Generation for Library Portability
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 55–67, https://doi.org/10.1109/PACT.2019.00013
Fast numerical libraries have been a cornerstone of scientific computing for decades, but this comes at a price. Programs may be tied to vendor specific software ecosystems resulting in polluted, non-portable code. As we enter an era of heterogeneous ...
BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 29–42, https://doi.org/10.1109/PACT.2019.00011
OpenMP is widely used by a number of applications, computational libraries, and runtime systems. As a result, multiple levels of the software stack use OpenMP independently of one another, often leading to nested parallel regions. Although exploiting ...
- research-article, November 2024
Gluon-Async: A Bulk-Asynchronous System for Distributed and Heterogeneous Graph Analytics
- Roshan Dathathri,
- Gurbinder Gill,
- Loc Hoang,
- Vishwesh Jatala,
- Keshav Pingali,
- V. Krishna Nandivada,
- Hoang-Vu Dang,
- Marc Snir
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 15–28, https://doi.org/10.1109/PACT.2019.00010
Distributed graph analytics systems for CPUs, like D-Galois and Gemini, and for GPUs, like D-IrGL and Lux, use a bulk-synchronous parallel (BSP) programming and execution model. BSP permits bulk-communication and uses large messages which are supported ...