- research-article, June 2009
Simultaneous speculative threading: a novel pipeline architecture implemented in Sun's ROCK processor
- Shailender Chaudhry,
- Robert Cypher,
- Magnus Ekman,
- Martin Karlsson,
- Anders Landin,
- Sherman Yip,
- Håkan Zeffer,
- Marc Tremblay
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 484–495, https://doi.org/10.1145/1555754.1555814
This paper presents Simultaneous Speculative Threading (SST), which is a technique for creating high-performance area- and power-efficient cores for chip multiprocessors. SST hardware dynamically extracts two threads of execution from a single ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
- Carlos Madriles,
- Pedro López,
- Josep M. Codina,
- Enric Gibert,
- Fernando Latorre,
- Alejandro Martinez,
- Raúl Martinez,
- Antonio Gonzalez
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 474–483, https://doi.org/10.1145/1555754.1555813
Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single thread performance remains of paramount importance since some applications have limited thread-level parallelism (TLP), and even a small part with ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Dynamic performance tuning for speculative threads
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 462–473, https://doi.org/10.1145/1555754.1555812
In response to the emergence of multicore processors, various novel and sophisticated execution models have been introduced to fully utilize these processors. One such execution model is Thread-Level Speculation (TLS), which allows potentially dependent ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Achieving predictable performance through better memory controller placement in many-core CMPs
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 451–461, https://doi.org/10.1145/1555754.1555810
In the near term, Moore's law will continue to provide an increasing number of transistors and therefore an increasing number of on-chip cores. Limited pin bandwidth prevents the integration of a large number of memory controllers on-chip. With many ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Phastlane: a rapid transit optical routing network
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 441–450, https://doi.org/10.1145/1555754.1555809
Tens and eventually hundreds of processing cores are projected to be integrated onto future microprocessors, making the global interconnect a key component to achieving scalable chip performance within a given power envelope. While CMOS-compatible ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Firefly: illuminating future network-on-chip with nanophotonics
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 429–440, https://doi.org/10.1145/1555754.1555808
Future many-core processors will require high-performance yet energy-efficient on-chip networks to provide a communication substrate for the increasing number of cores. Recent advances in silicon nanophotonics create new opportunities for on-chip ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Flexible reference-counting-based hardware acceleration for garbage collection
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 418–428, https://doi.org/10.1145/1555754.1555806
Languages featuring automatic memory management (garbage collection) are increasingly used to write all kinds of applications because they provide clear software engineering and security advantages. Unfortunately, garbage collection imposes a toll on ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Performance and power of cache-based reconfigurable computing
- Andrew Putnam,
- Susan Eggers,
- Dave Bennett,
- Eric Dellinger,
- Jeff Mason,
- Henry Styles,
- Prasanna Sundararajan,
- Ralph Wittig
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 395–405, https://doi.org/10.1145/1555754.1555804
Many-cache is a memory architecture that efficiently supports caching in commercially available FPGAs. It facilitates FPGA programming for high-performance computing (HPC) developers by providing them with memory performance that is greater and power ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
A fault tolerant, area efficient architecture for Shor's factoring algorithm
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 383–394, https://doi.org/10.1145/1555754.1555802
We optimize the area and latency of Shor's factoring while simultaneously improving fault tolerance through: (1) balancing the use of ancilla generators, (2) aggressive optimization of error correction, and (3) tuning the core adder circuits. Our custom ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Thread motion: fine-grained power management for multi-core systems
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 302–313, https://doi.org/10.1145/1555754.1555793
Dynamic voltage and frequency scaling (DVFS) is a commonly-used power-management scheme that dynamically adjusts power and performance to the time-varying needs of running programs. Unfortunately, conventional DVFS, relying on off-chip regulators, faces ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 290–301, https://doi.org/10.1145/1555754.1555792
With the shift towards chip multiprocessors (CMPs), exploiting and managing parallelism has become a central problem in computing systems. Many issues of parallelism management boil down to discerning which running threads or processes are critical, or ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 279–289, https://doi.org/10.1145/1555754.1555790
As their prices decline, their storage capacities increase, and their endurance improves, NAND Flash Solid State Disks (SSD) provide an increasingly attractive alternative to Hard Disk Drives (HDD) for portable computing systems and PCs. This paper ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Disaggregated memory for expansion and sharing in blade servers
- Kevin Lim,
- Jichuan Chang,
- Trevor Mudge,
- Parthasarathy Ranganathan,
- Steven K. Reinhardt,
- Thomas F. Wenisch
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 267–278, https://doi.org/10.1145/1555754.1555789
Analysis of technology and application trends reveals a growing imbalance in the peak compute-to-memory-capacity ratio for future servers. At the same time, the fraction contributed by memory systems to total datacenter costs and power consumption ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Decoupled store completion/silent deterministic replay: enabling scalable data memory for CPR/CFP processors
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 245–254, https://doi.org/10.1145/1555754.1555786
CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction window that scales to tolerate last-level cache misses. CPR/CFP scale the register file by aggressively reclaiming the destination registers of many in-...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 233–244, https://doi.org/10.1145/1555754.1555785
A multiprocessor's memory consistency model imposes ordering constraints among loads, stores, atomic operations, and memory fences. Even for consistency models that relax ordering among loads and stores, ordering constraints still induce significant ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
A case for bufferless routing in on-chip networks
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 196–207, https://doi.org/10.1145/1555754.1555781
Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip interconnection networks that eliminates the need for buffers for routing or ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 174–183, https://doi.org/10.1145/1555754.1555778
Many multi-core processors employ a large last-level cache (LLC) shared among the multiple cores. Past research has demonstrated that sharing-oblivious cache management policies (e.g., LRU) can lead to poor performance and fairness when the multiple ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 152–163, https://doi.org/10.1145/1555754.1555775
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
Rigel: an architecture and scalable programming interface for a 1000-core accelerator
- John H. Kelm,
- Daniel R. Johnson,
- Matthew R. Johnson,
- Neal C. Crago,
- William Tuohy,
- Aqeel Mahesri,
- Steven S. Lumetta,
- Matthew I. Frank,
- Sanjay J. Patel
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 140–151, https://doi.org/10.1145/1555754.1555774
This paper considers Rigel, a programmable accelerator architecture for a broad class of data- and task-parallel computation. Rigel comprises 1000+ hierarchically-organized cores that use a fine-grained, dynamically scheduled single-program, multiple-...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3
- research-article, June 2009
AnySP: anytime anywhere anyway signal processing
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, Pages 128–139, https://doi.org/10.1145/1555754.1555773
In the past decade, the proliferation of mobile devices has increased at a spectacular rate. There are now more than 3.3 billion active cell phones in the world, a device that we now all depend on in our daily lives. The current generation of devices ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 37 Issue 3