Volume 18, Issue 2SI, June 1990. Special Issue: Proceedings of the 17th Annual International Symposium on Computer Architecture

Weak ordering—a new definition

The memory model for shared-memory multiprocessors that programmers commonly, and often implicitly, assume is sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An ...
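
The guarantee can be made concrete by brute force. Under sequential consistency, every execution is some interleaving of the processors' operations that respects each processor's program order, with each access taking effect atomically. The sketch below uses the standard store-buffering litmus test (an illustrative choice, not taken from the paper). It enumerates every such interleaving and shows that the outcome r0 = r1 = 0 never occurs under SC. Hardware that buffers stores, or a weaker model, can produce that outcome.

```python
def interleavings(programs):
    """Yield every merge of the per-processor op lists that keeps each list's order."""
    if all(not ops for ops in programs):
        yield []
        return
    for p, ops in enumerate(programs):
        if ops:
            rest = programs[:p] + [ops[1:]] + programs[p + 1:]
            for tail in interleavings(rest):
                yield [ops[0]] + tail


def run(schedule):
    """Execute one interleaving against a single shared memory; each op is atomic."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            mem[addr] = arg
        else:  # load: arg names the destination register
            regs[arg] = mem[addr]
    return regs["r0"], regs["r1"]


# Store-buffering litmus test, initially x = y = 0:
#   P0: x = 1; r0 = y        P1: y = 1; r1 = x
PROGRAMS = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]


def sc_outcomes(programs):
    """Set of (r0, r1) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(programs)}


print(sorted(sc_outcomes(PROGRAMS)))  # [(0, 1), (1, 0), (1, 1)]
```

Exhaustive enumeration is only feasible for tiny tests like this one (six interleavings here), but it matches the definition directly.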

Memory consistency and event ordering in scalable shared-memory multiprocessors

Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge ...

Synchronization with multiprocessor caches

Introducing private caches in bus-based shared memory multiprocessors leads to the cache consistency problem since there may be multiple copies of shared data. However, the ability to snoop on the bus coupled with the fast broadcast capability allows ...

Dynamic processor allocation in hypercube computers

Fully recognizing various subcubes in a hypercube computer efficiently is nontrivial due to the specific structure of the hypercube. We propose a method with much less complexity than the multiple-GC strategy in generating the search space, while ...
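
To illustrate what "fully recognizing" subcubes means: a k-dimensional subcube of an n-cube is fixed by choosing which k address bits vary and the values of the other n − k bits. That gives C(n, k)·2^(n−k) candidates. The brute-force sketch below enumerates all of them and contrasts the result with the buddy strategy, which only considers subcubes spanning the lowest k dimensions. It is a reference baseline only. It is neither the paper's method nor the multiple-GC strategy, and the function names are hypothetical.

```python
from itertools import combinations


def subcube_nodes(free_dims, base):
    """Nodes of the subcube that varies `free_dims` and fixes all other bits to `base`."""
    nodes = [base]
    for d in free_dims:
        nodes += [x | (1 << d) for x in nodes]
    return nodes


def find_free_subcube(n, k, busy):
    """Exhaustively search all C(n, k) * 2**(n - k) k-subcubes of an n-cube."""
    for free_dims in combinations(range(n), k):
        mask = sum(1 << d for d in free_dims)
        for base in range(1 << n):
            if base & mask:
                continue  # count each subcube once: its free bits are 0 in base
            nodes = subcube_nodes(free_dims, base)
            if not busy.intersection(nodes):
                return sorted(nodes)
    return None


def buddy_find(n, k, busy):
    """Buddy strategy: only aligned blocks whose free dimensions are 0..k-1."""
    for base in range(0, 1 << n, 1 << k):
        nodes = list(range(base, base + (1 << k)))
        if not busy.intersection(nodes):
            return nodes
    return None


# In a 3-cube with every even-addressed node busy, no buddy pair is free,
# but nodes 1 and 3 (differing in dimension 1) form a free 1-subcube.
busy = {0, 2, 4, 6}
print(buddy_find(3, 1, busy), find_free_subcube(3, 1, busy))
```

The gap between the two searches is the recognition loss that the paper's strategies aim to reduce at lower cost than exhaustive search.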

A new approach to fast control of r² × r² 3-stage Benes networks of r × r crossbar switches

The routing control of Benes networks has proven to be costly. This paper introduces a new approach to fast control of N × N 3-stage Benes networks of r × r crossbar switches as building blocks, where N = r² and r ≥ 2. The new approach consists of ...
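
For orientation, here is the classical and comparatively slow way to set up such a network. The network has r input switches, r middle switches, and r output switches, each r × r. The permutation is treated as an r-regular bipartite multigraph between input and output switches. That multigraph is then split into r perfect matchings, one per middle switch. The split always succeeds, because a regular bipartite multigraph decomposes into perfect matchings (König's theorem). The sketch below implements this baseline using Kuhn's augmenting-path matching. It is not the paper's fast-control approach, whose details are elided above, and all function names are hypothetical.

```python
def perfect_matching(remaining, r):
    """Kuhn's augmenting-path algorithm; returns match[a] = b for input switch a."""
    owner = [None] * r  # owner[b] = input switch currently matched to output switch b

    def try_assign(a, seen):
        for b in range(r):
            if remaining[a][b] and not seen[b]:
                seen[b] = True
                if owner[b] is None or try_assign(owner[b], seen):
                    owner[b] = a
                    return True
        return False

    for a in range(r):
        if not try_assign(a, [False] * r):
            raise RuntimeError("no perfect matching; demand graph is not regular")
    match = [0] * r
    for b, a in enumerate(owner):
        match[a] = b
    return match


def route_three_stage(perm, r):
    """Return mid[i], the middle switch carrying input port i to output port perm[i]."""
    n = r * r
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of 0..N-1 with N = r*r")
    # remaining[a][b]: input ports still needing a path from input switch a to output switch b
    remaining = [[[] for _ in range(r)] for _ in range(r)]
    for i, o in enumerate(perm):
        remaining[i // r][o // r].append(i)
    mid = [None] * n
    for m in range(r):  # each matching becomes the setting of one middle switch
        for a, b in enumerate(perfect_matching(remaining, r)):
            mid[remaining[a][b].pop()] = m
    return mid


def is_valid_routing(perm, r, mid):
    """Each middle switch must link every input switch to exactly one output switch."""
    for m in range(r):
        ins = sorted(i // r for i in range(r * r) if mid[i] == m)
        outs = sorted(perm[i] // r for i in range(r * r) if mid[i] == m)
        if ins != list(range(r)) or outs != list(range(r)):
            return False
    return True


perm = [(7 * i + 3) % 9 for i in range(9)]  # a permutation of N = 9 ports (r = 3)
print(is_valid_routing(perm, 3, route_three_stage(perm, 3)))  # True
```

Each removed matching leaves an (r − 1)-regular multigraph, so the loop never gets stuck. The cost of computing r matchings sequentially is the kind of control overhead that the abstract calls costly.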

Virtual-channel flow control

Network throughput can be increased by dividing the buffer storage associated with each network channel into several virtual channels [DalSei]. Each physical channel is associated with several small queues, virtual channels, rather than a single deep ...
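
A toy model of the idea: the same total buffer (four flits) is organized either as one deep queue or as two shallow virtual channels sharing one physical link. When packet A is blocked downstream, the single queue stalls packet B behind it. With two virtual channels, B uses the otherwise idle link. The class and function names, the round-robin link arbitration, and the (packet, sequence) flit format are illustrative assumptions. The sketch ignores routing, credit return from downstream, and virtual-channel allocation.

```python
from collections import deque


class PhysicalChannel:
    """One physical link whose buffer is split into `num_vcs` queues of `depth` flits."""

    def __init__(self, num_vcs, depth):
        self.vcs = [deque() for _ in range(num_vcs)]
        self.depth = depth
        self.rr = 0  # round-robin pointer for link arbitration

    def enqueue(self, vc, flit):
        """Accept a flit into virtual channel `vc` only if that queue has space."""
        if len(self.vcs[vc]) >= self.depth:
            return False
        self.vcs[vc].append(flit)
        return True

    def transmit(self, blocked):
        """Send at most one flit per cycle, skipping VCs whose head packet is blocked."""
        n = len(self.vcs)
        for k in range(n):
            vc = (self.rr + k) % n
            q = self.vcs[vc]
            if q and q[0][0] not in blocked:
                self.rr = (vc + 1) % n
                return q.popleft()
        return None


def delivered_flits(num_vcs, depth, cycles=4):
    """Packet A (blocked downstream) enters VC 0; packet B enters the last VC."""
    ch = PhysicalChannel(num_vcs, depth)
    for i in range(2):
        ch.enqueue(0, ("A", i))
    for i in range(2):
        ch.enqueue(num_vcs - 1, ("B", i))  # with one VC, B queues behind A
    sent = []
    for _ in range(cycles):
        flit = ch.transmit(blocked={"A"})
        if flit is not None:
            sent.append(flit)
    return sent


print(delivered_flits(num_vcs=1, depth=4))  # []
print(delivered_flits(num_vcs=2, depth=2))  # [('B', 0), ('B', 1)]
```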

Supporting systolic and memory communication in iWarp

iWarp is a parallel architecture developed jointly by Carnegie Mellon University and Intel Corporation. The iWarp communication system supports two widely used interprocessor communication styles: memory communication and systolic communication. This ...

Monsoon: an explicit token-store architecture

Dataflow architectures tolerate long unpredictable communication delays and support generation and coordination of parallel activities directly in hardware, rather than assuming that program mapping will cause these issues to disappear. However, the ...

The K2 parallel processor: architecture and hardware implementation

K2 is a distributed-memory parallel processor designed to support a multi-user, multi-tasking, time-sharing operating system and an automatically parallelizing FORTRAN compiler. This paper presents the architecture and the hardware implementation of K2, ...

APRIL: a processor architecture for multiprocessing

Processors in large-scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching processor called APRIL with support for fine-grain threads ...

PLUS: a distributed shared-memory system

PLUS is a multiprocessor architecture tailored to the fast execution of a single multithreaded process; its goal is to accelerate the execution of CPU-bound applications. PLUS supports shared memory and efficient synchronization. Memory access latency ...

Adaptive software cache management for distributed shared memory architectures

An adaptive cache coherence mechanism exploits semantic information about the expected or observed access behavior of particular data objects. We contend that, in distributed shared memory systems, adaptive cache coherence mechanisms will outperform ...

Big science versus little science—do you have to build it? (panel session)

Research can be called big science if projects have numerous researchers, large funding, significant infrastructure, and plans to build complex tools or prototypes. Most experimental physicists practice big science, as do computer architects who build ...

An empirical evaluation of two memory-efficient directory methods

This paper presents an empirical evaluation of two memory-efficient directory methods for maintaining coherent caches in large shared memory multiprocessors. Both directory methods are modifications of a scheme proposed by Censier and Feautrier [5] that ...
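
For reference, the base scheme that both methods modify is the full-map directory of Censier and Feautrier. It keeps one presence bit per cache plus a dirty bit for every memory block, so its storage grows linearly with the processor count P. Below is a simplified state-level sketch that assumes a write-invalidate policy and omits the message protocol. The two memory-efficient variants evaluated in the paper are elided in the abstract and are not shown. The class and function names are hypothetical.

```python
class FullMapEntry:
    """Directory state for one memory block: P presence bits and a dirty bit."""

    def __init__(self, num_procs):
        self.present = [False] * num_procs
        self.dirty = False


class FullMapDirectory:
    def __init__(self, num_procs):
        self.p = num_procs
        self.entries = {}

    def _entry(self, block):
        return self.entries.setdefault(block, FullMapEntry(self.p))

    def read_miss(self, block, proc):
        """Return the processor that must write back a dirty copy, or None."""
        e = self._entry(block)
        owner = None
        if e.dirty:
            owner = e.present.index(True)  # a dirty block has exactly one holder
            e.dirty = False  # owner writes back and keeps a clean shared copy
        e.present[proc] = True
        return owner

    def write_miss(self, block, proc):
        """Return the other processors whose copies must be invalidated."""
        e = self._entry(block)
        victims = [q for q, bit in enumerate(e.present) if bit and q != proc]
        e.present = [False] * self.p
        e.present[proc] = True
        e.dirty = True
        return victims


def overhead_fraction(num_procs, block_bytes):
    """Directory bits per block (P presence bits + 1 dirty bit) relative to data bits."""
    return (num_procs + 1) / (8 * block_bytes)


# With 16-byte blocks, 64 processors need 65 directory bits per 128 data bits
# (about 51% overhead); 256 processors need more directory bits than data bits.
print(overhead_fraction(64, 16), overhead_fraction(256, 16))
```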

The directory-based cache coherence protocol for the DASH multiprocessor

DASH is a scalable shared-memory multiprocessor currently being developed at Stanford's Computer Systems Laboratory. The architecture consists of powerful processing nodes, each with a portion of the shared-memory, connected to a scalable ...

The performance impact of block sizes and fetch strategies

This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance. It has been previously noted that given a simple fetch strategy the performance optimal block ...

Performance comparison of load/store and symmetric instruction set architectures

Is it true that a Load/Store architecture is both simpler and faster than a Symmetric architecture, or does the Symmetric architecture offer a potential performance advantage that can be realized by the use of additional hardware?

In order to answer it ...

Reducing the cost of branches by using registers

In an attempt to reduce the number of operand memory references, many RISC machines have thirty-two or more general-purpose registers (e.g., MIPS, ARM, Spectrum, 88K). Without special compiler optimizations, such as inlining or interprocedural register ...

VAX vector architecture

The VAX Architecture has been extended to include an integrated, register-based vector processor. This extension allows both high-end and low-end implementations and can be supported with only small changes by VAX/VMS and VAX/ULTRIX operating systems. ...

Multiple instruction issue in the NonStop Cyclone processor

This paper describes the architecture for issuing multiple instructions per clock in the NonStop Cyclone Processor. Pairs of instructions are fetched and decoded by a dual two-stage prefetch pipeline and passed to a dual six-stage pipeline for ...

Performance of an OLTP application on Symmetry multiprocessor system

Sequent's Symmetry Series is a bus-based shared-memory multiprocessor. System performance in an OLTP relational database application was investigated using the TP1 benchmark. System performance was tested with fully-cached benchmarks and with scaled ...

The impact of synchronization and granularity on parallel systems

In this paper, we study the impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique. We find that even though there can be a lot of parallelism at the fine grain level, ...

Trace-driven simulations for a two-level cache design in open bus systems

Two-level cache hierarchies will be a design issue in future high-performance CPUs. In this paper we evaluate various metrics for data cache designs. We discuss both one- and two-level cache hierarchies. Our target is a new 100+ MIPS CPU, but the ...

Performance measurement and trace driven simulation of parallel CAD and numeric applications on a hypercube multicomputer

This paper presents the performance evaluation, workload characterization and trace-driven simulation of a hypercube multicomputer running realistic workloads. Six representative parallel applications were selected as benchmarks. Software monitoring ...

Generation and analysis of very long address traces

Existing methods of generating and analyzing traces suffer from a variety of limitations including complexity, inaccuracy, short length, inflexibility, or applicability only to CISC machines. We use a trace generation mechanism based on link-time code ...

Fast Prolog with an extended general purpose architecture

Most Prolog machines have been based on specialized architectures. Our goal is to start with a general purpose architecture and determine a minimal set of extensions for high performance Prolog execution. We have developed both the architecture and ...

Architectural support for the management of tightly-coupled fine-grain goals in flat concurrent Prolog

We propose architectural support for goal management as part of a special-purpose processor architecture for the efficient execution of Flat Concurrent Prolog. Goal management operations: halt, spawn, suspend and commit are decoupled from goal reduction,...

Balance in architectural design

We introduce a performance metric, normalized time, which is closely related to such measures as the area-time product of VLSI theory and the price/performance ratio of advertising literature. This metric captures the idea of a piece of hardware “...

A study of I/O behavior of Perfect benchmarks on a multiprocessor

The I/O behavior of some scientific applications, a subset of the Perfect benchmarks, executing on a multiprocessor is studied. The aim of this study is to explore the various patterns of I/O access of large scientific applications and to understand the ...
