SIGOPS: Vol 32, No 5

Volume 32, Issue 5, Dec. 1998

Editor:

William M. Waite
Univ. of Colorado, Boulder

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 0163-5980

article

Free

Compiler-controlled memory

Pages 2–11 https://doi.org/10.1145/384265.291010

Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level of success. The primary limit on the compiler's ability to improve memory ...

article

Free

Segregating heap objects by reference behavior and lifetime

Pages 12–23 https://doi.org/10.1145/384265.291012

Dynamic storage allocation has become increasingly important in many applications, in part due to the use of the object-oriented paradigm. At the same time, processor speeds are increasing faster than memory speeds and programs are increasing in size ...

article

Free

Schedule-independent storage mapping for loops

Pages 24–33 https://doi.org/10.1145/384265.291015

This paper studies the relationship between storage requirements and performance. Storage-related dependences inhibit optimizations for locality and parallelism. Techniques such as renaming and array expansion can eliminate all storage-related ...

article

Free

An empirical analysis of instruction repetition

Pages 35–45 https://doi.org/10.1145/384265.291016

We study the phenomenon of instruction repetition, where the inputs and outputs of multiple dynamic instances of a static instruction are repeated. We observe that over 80% of the dynamic instructions executed in several programs are repeated and most ...

article

Free

Space-time scheduling of instruction-level parallelism on a raw machine

Pages 46–57 https://doi.org/10.1145/384265.291018

Increasing demand for both greater parallelism and faster clocks dictates that future generation architectures will need to decentralize their resources and eliminate primitives that require single cycle global communication. A Raw microprocessor ...

article

Free

Data speculation support for a chip multiprocessor

Pages 58–69 https://doi.org/10.1145/384265.291020

Thread-level speculation is a technique that enables parallel execution of sequential applications on a multiprocessor. This paper describes the complete implementation of the support for thread-level speculation on the Hydra chip multiprocessor (CMP). ...

article

Free

VISA: Netstation's virtual Internet SCSI adapter

Pages 71–80 https://doi.org/10.1145/384265.291023

In this paper we describe the implementation of VISA, our Virtual Internet SCSI Adapter. VISA was built to evaluate the performance impact on the host operating system of using IP to communicate with peripherals, especially storage devices. We have ...

article

Free

Active disks: programming model, algorithms and evaluation

Pages 81–91 https://doi.org/10.1145/384265.291026

Several application and technology trends indicate that it might be both profitable and feasible to move computation closer to the data that it processes. In this paper, we evaluate Active Disk architectures which integrate significant processing power ...

article

Free

A cost-effective, high-bandwidth storage architecture

Pages 92–103 https://doi.org/10.1145/384265.291029

This paper describes the Network-Attached Secure Disk (NASD) storage architecture, prototype implementations of NASD drives, array management for our architecture, and three filesystems built on our prototype. NASD provides scalable storage bandwidth ...

article

Free

Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy

Pages 105–114 https://doi.org/10.1145/384265.291032

The RAMpage memory hierarchy is an alternative to the traditional division between cache and main memory: main memory is moved up a level and DRAM is used as a paging device. The idea behind RAMpage is to reduce hardware complexity, if at the cost of ...

article

Free

Dependence based prefetching for linked data structures

Pages 115–126 https://doi.org/10.1145/384265.291034

We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy. Our technique exploits the dependence relationships that exist between loads that produce addresses ...

article

Free

Performance counters and state sharing annotations: a unified approach to thread locality

Boris Weissman

Pages 127–138 https://doi.org/10.1145/384265.291035

This paper describes a combined approach for improving thread locality that uses the hardware performance monitors of modern processors and program-centric code annotations to guide thread scheduling on SMPs. The approach relies on a shared state cache ...

article

Free

Cache-conscious data placement

Pages 139–149 https://doi.org/10.1145/384265.291036

As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache performance by mapping code with temporal ...

article

Free

An out-of-order execution technique for runtime binary translators

Bich C. Le

Pages 151–158 https://doi.org/10.1145/384265.291039

A dynamic translator emulates an instruction set architecture by translating source instructions to native code during execution. On statically-scheduled hardware, higher performance can potentially be achieved by reordering the translated instructions; ...

article

Free

Overlapping execution with transfer using non-strict execution for mobile programs

Pages 159–169 https://doi.org/10.1145/384265.291040

In order to execute a program on a remote computer, it must first be transferred over a network. This transmission incurs the overhead of network latency before execution can begin. This latency can vary greatly depending upon the size of the program, ...

article

Free

Variable length path branch prediction

Pages 170–179 https://doi.org/10.1145/384265.291042

Accurate branch prediction is required to achieve high performance in deeply pipelined, wide-issue processors. Recent studies have shown that conditional and indirect (or computed) branch targets can be accurately predicted by recording the path, which ...

article

Free

Performance isolation: sharing and isolation in shared-memory multiprocessors

Pages 181–192 https://doi.org/10.1145/384265.291044

Shared-memory multiprocessors (SMPs) are being extensively used as general-purpose servers. The tight coupling of multiple processors, memory, and I/O provides enormous computing power in a single system, and enables the efficient sharing of these ...

article

Free

UTLB: a mechanism for address translation on network interfaces

Pages 193–204 https://doi.org/10.1145/384265.291046

An important aspect of a high-speed network system is the ability to transfer data directly between the network interface and application buffers. Such a direct data path requires the network interface to "know" the virtual-to-physical address ...

article

Free

Locality-aware request distribution in cluster-based network servers

Pages 205–216 https://doi.org/10.1145/384265.291048

We consider cluster-based network servers in which a front-end directs incoming requests to one of a number of back-ends. Specifically, we consider content-based request distribution: the front-end uses the content requested, in addition to information ...

article

Free

Investigating optimal local memory performance

Olivier Temam

Pages 218–227 https://doi.org/10.1145/384265.291050

Recent work has demonstrated that cache space is often poorly utilized. However, no previous work has yet demonstrated upper bounds on what a cache or local memory could achieve when exploiting both spatial and temporal locality. Belady's MIN algorithm ...

article

Free

Precise miss analysis for program transformations with caches of arbitrary associativity

Pages 228–239 https://doi.org/10.1145/384265.291051

Analyzing and optimizing program memory performance is a pressing problem in high-performance computer architectures. Currently, software solutions addressing the processor-memory performance gap include compiler- or programmer-applied optimizations like ...

article

Free

Capturing dynamic memory reference behavior with adaptive cache topology

Pages 240–250 https://doi.org/10.1145/384265.291053

Memory references exhibit locality and are therefore not uniformly distributed across the sets of a cache. This skew reduces the effectiveness of a cache because it results in the caching of a considerable number of less-recently-used lines which are ...

article

Free

Accelerating multi-media processing by implementing memoing in multiplication and division units

Pages 252–261 https://doi.org/10.1145/384265.291056

This paper proposes a technique that enables performing multi-cycle (multiplication, division, square-root …) computations in a single cycle. The technique is based on the notion of memoing: saving the input and output of previous calculations ...

article

Free

Value speculation scheduling for high performance processors

Pages 262–271 https://doi.org/10.1145/384265.291058

Recent research in value prediction shows a surprising amount of predictability for the values produced by register-writing instructions. Several hardware based value predictor designs have been proposed to exploit this predictability by eliminating ...

article

Free

An empirical study of decentralized ILP execution models

Pages 272–281 https://doi.org/10.1145/384265.291061

Recent fascination for dynamic scheduling as a means for exploiting instruction-level parallelism has introduced significant interest in the scalability aspects of dynamic scheduling hardware. In order to overcome the scalability problems of centralized ...

article

Free

Fast out-of-order processor simulation using memoization

Pages 283–294 https://doi.org/10.1145/384265.291063

Our new out-of-order processor simulator, FastSim, uses two innovations to speed up simulation 8–15 times (vs. Wisconsin SimpleScalar) with no loss in simulation accuracy. First, FastSim uses speculative direct-execution to accelerate the functional ...

article

Free

A look at several memory management units, TLB-refill mechanisms, and page table organizations

Pages 295–306 https://doi.org/10.1145/384265.291065

Virtual memory is a staple in modern systems, though there is little agreement on how its functionality is to be implemented on either the hardware or software side of the interface. The myriad of design choices and incompatible hardware mechanisms ...

article

Free

Performance of database workloads on shared-memory systems with out-of-order processors

Pages 307–318 https://doi.org/10.1145/384265.291067

Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers. However, most current system designs have been optimized ...
