Software notations and tools

30 Results for: Book/Issue: PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques


Showing 1 - 20 of 30 Results

poster
September 2010
Automatic vector instruction selection for dynamic compilation
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 573–574, https://doi.org/10.1145/1854273.1854358

Accelerating program performance via short SIMD vector units is very common in modern processors, as evidenced by the use of SSE, MMX, and AltiVec SIMD instructions in multimedia, scientific, and embedded applications. To take full advantage of the ...
Total Citations: 3
Total Downloads: 279
Last 12 Months: 4
Last 6 weeks: 0
poster
September 2010
A software-SVM-based transactional memory for multicore accelerator architectures with local memory
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 567–568, https://doi.org/10.1145/1854273.1854355

We propose a software transactional memory (STM) for heterogeneous multicores with small local memory. The heterogeneous multicore architecture consists of a general-purpose processor element (GPE) and multiple accelerator processor elements (APEs). The ...
Total Citations: 1
Total Downloads: 279
Last 12 Months: 4
Last 6 weeks: 0
poster
September 2010
DMATiler: revisiting loop tiling for direct memory access
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 559–560, https://doi.org/10.1145/1854273.1854351

In this paper we present the design and implementation of a DMATiler which combines compiler analysis and runtime management to optimize local memory performance. In traditional cache model based loop tiling optimizations, the compiler approximates ...
Total Citations: 5
Total Downloads: 241
Last 12 Months: 4
Last 6 weeks: 0
poster
September 2010
An integer programming framework for optimizing shared memory use on GPUs
- Wenjing Ma,
- Gagan Agrawal
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 553–554, https://doi.org/10.1145/1854273.1854348

General purpose computing using GPUs is becoming increasingly popular, because of GPU's extremely favorable performance/price ratio. Like standard processors, GPUs also have a memory hierarchy, which must be carefully optimized for in order to achieve ...
Total Citations: 12
Total Downloads: 380
Last 12 Months: 5
Last 6 weeks: 1
poster
September 2010
Analyzing cache performance bottlenecks of STM applications and addressing them with compiler's help
- Sandya S. Mannarswamy,
- R. Govindarajan
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 547–548, https://doi.org/10.1145/1854273.1854345

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs as an alternative to traditional lock based synchronization. However adoption of STM in mainstream software has been quite low due to its ...
Total Citations: 0
Total Downloads: 159
Last 12 Months: 4
Last 6 weeks: 0
poster
September 2010
Improving speculative loop parallelization via selective squash and speculation reuse
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 543–544, https://doi.org/10.1145/1854273.1854343

Speculative parallelization is a powerful technique to parallelize loops with irregular data dependencies. In this poster, we present a value-based selective squash protocol and an optimistic speculation reuse technique that leverages an extended notion ...
Total Citations: 0
Total Downloads: 194
Last 12 Months: 5
Last 6 weeks: 0
poster
September 2010
Ordered and unordered algorithms for parallel breadth first search
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 539–540, https://doi.org/10.1145/1854273.1854341

We describe and evaluate ordered and unordered algorithms for shared-memory parallel breadth-first search. The unordered algorithm is based on viewing breadth-first search as a fixpoint computation, and in general, it may perform more work than the ...
Total Citations: 5
Total Downloads: 324
Last 12 Months: 18
Last 6 weeks: 0
poster
September 2010
Believe it or not!: mult-core CPUs can match GPU performance for a FLOP-intensive application!
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 537–538, https://doi.org/10.1145/1854273.1854340

In this paper, we evaluate performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. We implement this algorithm on a nVidia GTX 285 GPU using CUDA, and also ...
Total Citations: 10
Total Downloads: 598
Last 12 Months: 7
Last 6 weeks: 1
research-article
September 2010
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 523–534, https://doi.org/10.1145/1854273.1854337

The prevalence of chip multiprocessor opens opportunities of running data-parallel applications originally in clusters on a single machine with many cores. MapReduce, a simple and elegant programming model to program large scale clusters, has recently ...
Total Citations: 86
Total Downloads: 1,304
Last 12 Months: 18
Last 6 weeks: 1
research-article
September 2010
Compiler-assisted data distribution for chip multiprocessors
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 501–512, https://doi.org/10.1145/1854273.1854335

Data access latency, a limiting factor in the performance of chip multiprocessors, grows significantly with the number of cores in non-uniform cache architectures with distributed cache banks. To mitigate this effect, it is necessary to leverage the ...
Total Citations: 33
Total Downloads: 408
Last 12 Months: 6
Last 6 weeks: 1
research-article
September 2010
Using memory mapping to support cactus stacks in work-stealing runtime systems
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 411–420, https://doi.org/10.1145/1854273.1854324

Many multithreaded concurrency platforms that use a work-stealing runtime system incorporate a "cactus stack," wherein a function's accesses to stack variables properly respect the function's calling ancestry, even when many of the functions operate in ...
Total Citations: 33
Total Downloads: 367
Last 12 Months: 14
Last 6 weeks: 0
research-article
September 2010
AM++: a generalized active message framework
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 401–410, https://doi.org/10.1145/1854273.1854323

Active messages have proven to be an effective approach for certain communication problems in high performance computing. Many MPI implementations, as well as runtimes for Partitioned Global Address Space languages, use active messages in their low-...
Total Citations: 55
Total Downloads: 417
Last 12 Months: 14
Last 6 weeks: 1
research-article
September 2010
The Paralax infrastructure: automatic parallelization with a helping hand
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 389–400, https://doi.org/10.1145/1854273.1854322

Speeding up sequential programs on multicores is a challenging problem that is in urgent need of a solution. Automatic parallelization of irregular pointer-intensive codes, exemplified by the SPECint codes, is a very hard problem. This paper shows that, ...
Total Citations: 67
Total Downloads: 642
Last 12 Months: 17
Last 6 weeks: 2
research-article
September 2010
Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information
- Georgios Tournavitis,
- Björn Franke
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 377–388, https://doi.org/10.1145/1854273.1854321

In recent years multi-core computer systems have left the realm of high-performance computing and virtually all of today's desktop computers and embedded computing systems are equipped with several processing cores. Still, no single parallel programming ...
Total Citations: 39
Total Downloads: 805
Last 12 Months: 13
Last 6 weeks: 0
research-article
September 2010
An empirical characterization of stream programs and its implications for language and compiler design
- William Thies,
- Saman Amarasinghe
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 365–376, https://doi.org/10.1145/1854273.1854319

Stream programs represent an important class of high-performance computations. Defined by their regular processing of sequences of data, stream programs appear most commonly in the context of audio, video, and digital signal processing, though also in ...
Total Citations: 131
Total Downloads: 545
Last 12 Months: 9
Last 6 weeks: 1
research-article
September 2010
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 353–364, https://doi.org/10.1145/1854273.1854318

Ocelot is a dynamic compilation framework designed to map the explicitly data parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (...
Total Citations: 167
Total Downloads: 1,076
Last 12 Months: 38
Last 6 weeks: 4
research-article
September 2010
A model for fusion and code motion in an automatic parallelizing compiler
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 343–352, https://doi.org/10.1145/1854273.1854317

Loop fusion has been studied extensively, but in a manner isolated from other transformations. This was mainly due to the lack of a powerful intermediate representation for application of compositions of high-level transformations. Fusion presents ...
Total Citations: 43
Total Downloads: 420
Last 12 Months: 22
Last 6 weeks: 2
research-article
September 2010
Partitioning streaming parallelism for multi-cores: a machine learning based approach
- Zheng Wang,
- Michael F.P. O'Boyle
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 307–318, https://doi.org/10.1145/1854273.1854313

Stream based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and underlying architecture. We address ...
Total Citations: 78
Total Downloads: 1,390
Last 12 Months: 19
Last 6 weeks: 2
research-article
September 2010
Efficient sequential consistency using conditional fences
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 295–306, https://doi.org/10.1145/1854273.1854312

Among the various memory consistency models, the sequential consistency (SC) model, in which memory operations appear to take place in the order specified by the program, is the most intuitive and enables programmers to reason about their parallel ...
Total Citations: 54
Total Downloads: 626
Last 12 Months: 14
Last 6 weeks: 0
research-article
September 2010
Discovering and understanding performance bottlenecks in transactional applications
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 285–294, https://doi.org/10.1145/1854273.1854311

Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations, and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose ...
Total Citations: 28
Total Downloads: 332
Last 12 Months: 11
Last 6 weeks: 3
