- abstract, September 2012
Hardware prefetchers for emerging parallel applications
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 485–486, https://doi.org/10.1145/2370816.2370909
Hardware prefetching has been studied in the past for multiprogrammed workloads as well as GPUs. Efficient hardware prefetchers like stream-based or GHB-based ones work well for multiprogrammed workloads because different programs get mapped to ...
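As a concrete illustration of the kind of hardware prefetcher this entry contrasts with stream- and GHB-based designs, here is a minimal sketch of a per-PC stride prefetcher. It is not the mechanism proposed in the paper; the table organization, training rule, and prefetch degree are illustrative assumptions only.

```cpp
// Hedged sketch of a per-PC stride prefetcher (not the paper's proposal).
// Each load PC gets a table entry tracking its last address and stride;
// once the same stride is observed on consecutive accesses, the prefetcher
// issues `degree` prefetches further down the stream.
#include <cstdint>
#include <unordered_map>
#include <vector>

class StridePrefetcher {
    struct Entry { uint64_t last_addr = 0; int64_t stride = 0; bool trained = false; };
    std::unordered_map<uint64_t, Entry> table_;  // indexed by load PC
    int degree_;                                 // how many blocks to prefetch ahead

public:
    explicit StridePrefetcher(int degree = 2) : degree_(degree) {}

    // Called on every demand access; returns the addresses to prefetch.
    std::vector<uint64_t> access(uint64_t pc, uint64_t addr) {
        std::vector<uint64_t> prefetches;
        Entry& e = table_[pc];
        int64_t stride = static_cast<int64_t>(addr) - static_cast<int64_t>(e.last_addr);
        if (e.trained && stride == e.stride && stride != 0) {
            // Stride confirmed on consecutive accesses: prefetch ahead.
            for (int i = 1; i <= degree_; ++i)
                prefetches.push_back(
                    static_cast<uint64_t>(static_cast<int64_t>(addr) + i * e.stride));
        }
        e.trained = (stride == e.stride);
        e.stride = stride;
        e.last_addr = addr;
        return prefetches;
    }
};
```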
- abstract, September 2012
SkipCache: miss-rate aware cache management
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 481–482, https://doi.org/10.1145/2370816.2370907
It is common for computers to have multi-level caches. This work revolves around one question: are all levels needed by all applications during all phases of their execution? This question is especially pertinent in the multiprogrammed scenario where giving the entire ...
- abstract, September 2012
Design of a storage processing unit
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 479–480, https://doi.org/10.1145/2370816.2370906
Researchers showed that performing computation directly on storage devices improves system performance in terms of energy consumption and processing time. For example, Riedel et al. [2] proposed an active disk which performs computation using the ...
- poster, September 2012
Energy-efficient cache partitioning for future CMPs
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 465–466, https://doi.org/10.1145/2370816.2370898
The demand for high-performance computing systems requires processor vendors to increase the number of cores per chip multiprocessor (CMP). However, as their number grows, the core-to-way ratio in the last-level cache (LLC) increases, presenting ...
- poster, September 2012
PS-Dir: a scalable two-level directory cache
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 451–452, https://doi.org/10.1145/2370816.2370891
As the number of cores increases in both incoming and future chip multiprocessors, coherence protocols must address novel hardware structures in order to scale in terms of performance, power, and area. It is well known that most blocks accessed by ...
- poster, September 2012
Off-chip access localization for NoC-based multicores
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 447–448, https://doi.org/10.1145/2370816.2370889
In a network-on-chip based multicore, an off-chip data access needs to travel through the on-chip network, spending a considerable amount of time within the chip (in addition to the memory access itself). Further, it also causes additional delays for on-...
- research-article, September 2012
Base-delta-immediate compression: practical data compression for on-chip caches
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 377–388, https://doi.org/10.1145/2370816.2370870
Cache compression is a promising technique to increase on-chip cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high ...
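The entry above names base-delta-immediate compression; the sketch below illustrates the base-plus-narrow-delta idea in a simplified form. It views a 64-byte line as sixteen 32-bit words with a single base and signed 8-bit deltas, which is only one encoding in the family the technique covers (the real scheme also tries a zero "immediate" base and several base/delta widths), and it is not the paper's exact algorithm.

```cpp
// Hedged sketch of a base+delta compressibility check in the spirit of
// base-delta-immediate (BDI) cache compression. Illustrative only: it tests
// whether every 32-bit word of a line can be expressed as the first word
// plus a signed 8-bit delta.
#include <cstdint>
#include <optional>
#include <vector>

struct CompressedLine {
    uint32_t base;               // one full-width base value
    std::vector<int8_t> deltas;  // one narrow delta per word
};

std::optional<CompressedLine> try_base_delta(const uint32_t (&line)[16]) {
    CompressedLine out;
    out.base = line[0];
    for (uint32_t w : line) {
        int64_t delta = static_cast<int64_t>(w) - static_cast<int64_t>(out.base);
        if (delta < INT8_MIN || delta > INT8_MAX)
            return std::nullopt;  // line is not compressible in this format
        out.deltas.push_back(static_cast<int8_t>(delta));
    }
    // Compressed size: 4-byte base + 16 one-byte deltas = 20 bytes,
    // versus the 64-byte uncompressed line.
    return out;
}
```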
- research-article, September 2012
A software memory partition approach for eliminating bank-level interference in multicore systems
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 367–376, https://doi.org/10.1145/2370816.2370869
The main memory system is a shared resource in modern multicore machines, resulting in serious interference, which causes performance degradation in terms of throughput slowdown and unfairness. Numerous new memory scheduling algorithms have been proposed to ...
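A common way to realize software memory partitioning of the kind this entry describes is OS-level page coloring on the DRAM bank bits, so that each application is served only from its own banks. The sketch below shows that general idea only; the bank-bit positions, bank count, and allocator interface are assumptions for illustration, not the paper's design.

```cpp
// Hedged sketch of bank-aware page coloring: the OS allocator hands an
// application only physical frames whose DRAM bank falls inside that
// application's private bank partition, so applications do not contend on
// the same banks. Bank-index bit positions below are made up; real ones
// depend on the memory controller's address mapping.
#include <cstdint>

constexpr unsigned kBankShift = 13;  // assumed: bank bits start at physical-address bit 13
constexpr unsigned kBankBits  = 3;   // assumed: 8 banks

unsigned bank_of_frame(uint64_t phys_addr) {
    return (phys_addr >> kBankShift) & ((1u << kBankBits) - 1);
}

// Allocator filter: accept a candidate free frame only if its bank belongs
// to the partition assigned to this application (a bitmask of allowed banks).
bool frame_allowed(uint64_t phys_addr, uint32_t allowed_bank_mask) {
    return (allowed_bank_mask >> bank_of_frame(phys_addr)) & 1u;
}
```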
- research-article, September 2012
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 355–366, https://doi.org/10.1145/2370816.2370868
Off-chip main memory has long been a bottleneck for system performance. With increasing memory pressure due to multiple on-chip cores, effective cache utilization is important. In a system with limited cache space, we would ideally like to prevent 1) ...
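The sketch below illustrates the insertion-priority idea behind an evicted-address filter: a missing block that was evicted recently (and so has demonstrated reuse) is inserted with high priority, while other missing blocks are inserted with low priority to limit pollution. The paper uses a Bloom filter with periodic reset; a plain hash set with an arbitrary clearing threshold stands in for it here.

```cpp
// Hedged sketch of an evicted-address filter (EAF) used to pick a cache
// insertion priority. Not the paper's exact structure: a hash set replaces
// the Bloom filter, and the clearing threshold is arbitrary.
#include <cstddef>
#include <cstdint>
#include <unordered_set>

class EvictedAddressFilter {
    std::unordered_set<uint64_t> evicted_;
    std::size_t capacity_;

public:
    explicit EvictedAddressFilter(std::size_t capacity) : capacity_(capacity) {}

    // Call when the cache evicts a block.
    void on_eviction(uint64_t block_addr) {
        evicted_.insert(block_addr);
        if (evicted_.size() > capacity_)  // crude stand-in for periodic Bloom-filter reset
            evicted_.clear();
    }

    // Call on a cache miss to pick an insertion priority.
    // true  -> insert near MRU (block showed reuse after a premature eviction)
    // false -> insert near LRU (likely pollution or streaming block)
    bool insert_with_high_priority(uint64_t block_addr) {
        auto it = evicted_.find(block_addr);
        if (it != evicted_.end()) {
            evicted_.erase(it);
            return true;
        }
        return false;
    }
};
```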
- research-article, September 2012
Optimal bypass monitor for high performance last-level caches
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 315–324, https://doi.org/10.1145/2370816.2370862
In the last-level cache, a large number of blocks have reuse distances greater than the available cache capacity. Cache performance and efficiency can be improved if some subset of these distant reuse blocks can reside in the cache longer. The bypass ...
- research-article, September 2012
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 293–304, https://doi.org/10.1145/2370816.2370860
The replacement policies for the last-level caches (LLCs) are usually designed based on the access information available locally at the LLC. These policies are inherently sub-optimal due to the lack of information about the activities in the inner levels of ...
- research-article, September 2012
Shared memory multiplexing: a novel way to improve GPGPU throughput
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 283–292, https://doi.org/10.1145/2370816.2370858
On-chip shared memory (a.k.a. local data share) is a critical resource to many GPGPU applications. In current GPUs, the shared memory is allocated when a thread block (also called a workgroup) is dispatched to a streaming multiprocessor (SM) and is ...
- research-article, September 2012
Complexity-effective multicore coherence
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 241–252, https://doi.org/10.1145/2370816.2370853
Much of the complexity and overhead (directory, state bits, invalidations) of a typical directory coherence implementation stems from the effort to make it "invisible" even to the strongest memory consistency model. In this paper, we show that a much ...
- research-article, September 2012
RISE: improving the streaming processors' reliability against soft errors in GPGPUs
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 191–200, https://doi.org/10.1145/2370816.2370846
With hundreds of cores integrated into a single chip, general-purpose computing on graphics processing units (GPGPUs) provides high computing power to accelerate parallel applications. However, GPGPUs exhibit high soft-error vulnerability ...
- research-article, September 2012
Making data prefetch smarter: adaptive prefetching on POWER7
- Victor Jiménez,
- Roberto Gioiosa,
- Francisco J. Cazorla,
- Alper Buyuktosunoglu,
- Pradip Bose,
- Francis P. O'Connell
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 137–146, https://doi.org/10.1145/2370816.2370837
Hardware data prefetch engines are integral parts of many general-purpose server-class microprocessors in the field today. Some prefetch engines allow the user to change some of their parameters. The prefetcher, however, is usually enabled in a default ...
- research-article, September 2012
Optimizing datacenter power with memory system levers for guaranteed quality-of-service
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 117–126, https://doi.org/10.1145/2370816.2370834
Co-location of applications is a proven technique to improve hardware utilization. Recent advances in virtualization have made co-location of independent applications on shared hardware a common scenario in datacenters. Co-location, while maintaining ...
- research-article, September 2012
XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems
- Ronald G. Dreslinski,
- Thomas Manville,
- Korey Sewell,
- Reetuparna Das,
- Nathaniel Pinckney,
- Sudhir Satpathy,
- David Blaauw,
- Dennis Sylvester,
- Trevor Mudge
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 75–86, https://doi.org/10.1145/2370816.2370829
With multi-core processors now mainstream, the shift to many-core processors poses a new set of design challenges. In particular, the scalability of coherence protocols remains a significant challenge. While complex Network-on-Chip interconnect fabrics ...