- poster, February 2011
Active pebbles: a programming model for highly parallel fine-grained data-driven computations
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 305–306
https://doi.org/10.1145/1941553.1941601
A variety of programming models exist to support large-scale, distributed memory, parallel computation. These programming models have historically targeted coarse-grained applications with natural locality such as those found in a variety of scientific ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- poster, February 2011
A wait-free NCAS library for parallel applications with timing constraints
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 301–302
https://doi.org/10.1145/1941553.1941599
We introduce our major ideas of a wait-free, linearizable, and disjoint access parallel NCAS library, called rtNCAS. It focuses on the construction of wait-free data structure operations (DSO) in real-time circumstances. rtNCAS is able to conditionally ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- poster, February 2011
Two examples of parallel programming without concurrency constructs (PP-CC)
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 299–300
https://doi.org/10.1145/1941553.1941598
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- poster, February 2011
Evaluating graph coloring on GPUs
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 297–298
https://doi.org/10.1145/1941553.1941597
This paper evaluates features of graph coloring algorithms implemented on graphics processing units (GPUs), comparing coloring heuristics and thread decompositions. As compared to prior work on graph coloring for other parallel architectures, we find ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- poster, February 2011
Time skewing made simple
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 295–296
https://doi.org/10.1145/1941553.1941596
Time skewing and loop tiling have been known for a long time to be a highly beneficial acceleration technique for nested loops, especially on bandwidth-hungry multi-core processors, but it is little used in practice because efficient implementations ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- poster, February 2011
Kremlin: like gprof, but for parallelization
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 293–294
https://doi.org/10.1145/1941553.1941595
This paper overviews Kremlin, a software profiling tool designed to assist the parallelization of serial programs. Kremlin accepts a serial source code, profiles it, and provides a list of regions that should be considered in parallelization. Unlike a ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- poster, February 2011
Weak atomicity under the x86 memory consistency model
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 291–292
https://doi.org/10.1145/1941553.1941594
We consider the problem of building a weakly atomic Software Transactional Memory (STM) that provides Single (Global) Lock Atomicity (SLA) while adhering to the x86 memory consistency model (x86-MM).
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
Achieving a single compute device image in OpenCL for multiple GPUs
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 277–288
https://doi.org/10.1145/1941553.1941591
In this paper, we propose an OpenCL framework that combines multiple GPUs and treats them as a single compute device. Providing a single virtual compute device image to the user makes an OpenCL application written for a single GPU portable to the ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
Accelerating CUDA graph algorithms at maximum warp
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 267–276
https://doi.org/10.1145/1941553.1941590
Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems, but their performance suffers heavily when the graph structure ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
The STAPL parallel container framework
- Gabriel Tanase,
- Antal Buss,
- Adam Fidel,
- Harshvardhan,
- Ioannis Papadopoulos,
- Olga Pearce,
- Timmie Smith,
- Nathan Thomas,
- Xiabing Xu,
- Nedal Mourad,
- Jeremy Vu,
- Mauro Bianco,
- Nancy M. Amato,
- Lawrence Rauchwerger
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 235–246
https://doi.org/10.1145/1941553.1941586
The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. It includes a collection of distributed data structures called pContainers that are thread-safe, concurrent ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
Lifeline-based global load balancing
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 201–212
https://doi.org/10.1145/1941553.1941582
On shared-memory systems, Cilk-style work-stealing has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS). There are two main difficulties in extending this approach to distributed memory. ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
Lock-free and scalable multi-version software transactional memory
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 179–188
https://doi.org/10.1145/1941553.1941579
Software Transactional Memory (STM) was initially proposed as a lock-free mechanism for concurrency control. Early implementations had efficiency limitations, and obstruction-free proposals soon appeared to tackle this problem, often simplifying STM ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
Transaction communicators: enabling cooperation among concurrent transactions
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 169–178
https://doi.org/10.1145/1941553.1941578
In this paper, we propose to extend transactional memory with transaction communicators, special objects through which concurrent transactions can communicate: changes by one transaction to a communicator can be seen by concurrent transactions before ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
Communicating memory transactions
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 157–168
https://doi.org/10.1145/1941553.1941577
Many concurrent programming models enable both transactional memory and message passing. For such models, researchers have built increasingly efficient implementations and defined reasonable correctness criteria, while it remains an open problem to ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
Thread contracts for safe parallelism
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 125–134
https://doi.org/10.1145/1941553.1941573
We build a framework of thread contracts, called Accord, that allows programmers to annotate their concurrency co-ordination strategies. Accord annotations allow programmers to declaratively specify the parts of memory that a thread may read or write ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
ScalaExtrap: trace-based communication extrapolation for spmd programs
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 113–122
https://doi.org/10.1145/1941553.1941569
Performance modeling for scientific applications is important for assessing potential application performance and systems procurement in high-performance computing (HPC). Recent progress on communication tracing opens up novel opportunities for ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
ULCC: a user-level facility for optimizing shared cache performance on multicores
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 103–112
https://doi.org/10.1145/1941553.1941568
Scientific applications face serious performance challenges on multicore processors, one of which is caused by access contention in last level shared caches from multiple running threads. The contention increases the number of long latency memory ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
OoOJava: software out-of-order execution
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 57–68
https://doi.org/10.1145/1941553.1941563
Developing parallel software using current tools can be challenging. Even experts find it difficult to reason about the use of locks and often accidentally introduce race conditions and deadlocks into parallel software. OoOJava is a compiler-assisted ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
Copperhead: compiling an embedded data parallel language
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 47–56
https://doi.org/10.1145/1941553.1941562
Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8
- research-article, February 2011
A domain-specific approach to heterogeneous parallelism
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, Pages 35–46
https://doi.org/10.1145/1941553.1941561
Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, general-purpose programming models available today can yield high performance but are too low-level to be ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8