Article


Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

Author:

Babak Falsafi

SPAA '00: Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures

Pages 79 - 88

https://doi.org/10.1145/341800.341811

Published: 09 July 2000

Abstract

In this paper, we compare and contrast two techniques for reducing capacity/conflict miss traffic in CC-NUMA (cache-coherent non-uniform memory access) DSM (distributed shared memory) clusters. Page migration/replication optimizes read-write accesses to a page used by a single processor by migrating the page to that processor, and it replicates all read-shared pages in the sharers' local memories. R-NUMA optimizes read-write accesses to any page by allowing a processor to cache that page in its main memory. Page migration/replication requires less hardware complexity than R-NUMA, but it has limited applicability and incurs much higher overheads, even with tuned hardware/software support.
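The page-level policy in the preceding paragraph can be sketched as a classifier over per-page access statistics. This is a minimal, hypothetical illustration, not the mechanism evaluated in the paper: the `PageStats` record and its fields, the `choose_action` function, and the `min_remote_misses` threshold are all assumptions introduced here, and usage is tracked per node rather than per processor. It captures only the decision rule (migrate a page used by a single remote node, replicate a page that is only read, leave read-write-shared pages alone) and ignores the cost of copying pages and updating mappings.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    LEAVE = "leave"          # keep the page where it is
    MIGRATE = "migrate"      # move the page to its single user's node
    REPLICATE = "replicate"  # copy the page into each sharer's local memory


@dataclass
class PageStats:
    """Hypothetical per-page statistics gathered over one sampling interval."""
    home: int                                             # node currently holding the page
    readers: set[int] = field(default_factory=set)        # nodes that read the page
    writers: set[int] = field(default_factory=set)        # nodes that wrote the page
    misses: dict[int, int] = field(default_factory=dict)  # node -> cache misses to the page


def choose_action(page: PageStats, min_remote_misses: int = 8) -> Action:
    """Illustrative migration/replication decision for a single page."""
    users = page.readers | page.writers
    if not users:
        return Action.LEAVE  # no recorded users: nothing to optimize
    remote = sum(n for node, n in page.misses.items() if node != page.home)
    if remote < min_remote_misses:
        return Action.LEAVE  # too little remote traffic to amortize a page copy
    if len(users) == 1:
        (only_user,) = users
        # A page used (read and/or written) by one remote node moves to that node.
        return Action.LEAVE if only_user == page.home else Action.MIGRATE
    if not page.writers:
        # Read-only sharing: every sharer gets a local replica.
        return Action.REPLICATE
    # Read-write sharing among several nodes: neither technique applies.
    return Action.LEAVE


if __name__ == "__main__":
    private = PageStats(home=0, readers={2}, writers={2}, misses={2: 40})
    read_shared = PageStats(home=0, readers={1, 2, 3}, misses={1: 10, 2: 10})
    write_shared = PageStats(home=0, readers={1, 2}, writers={1}, misses={1: 50, 2: 50})
    for name, page in [("private", private), ("read_shared", read_shared),
                       ("write_shared", write_shared)]:
        print(name, choose_action(page).value)
    # prints: private migrate / read_shared replicate / write_shared leave
```

The last case is the gap the abstract points to: a page that several nodes both read and write gets no help from migration or replication, whereas caching it in each node's main memory, as R-NUMA does, still applies.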

We evaluate page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors (SMPs) executing shared-memory applications. Our results show that: (1) both page migration/replication and R-NUMA significantly improve system performance over “first-touch” migration in many applications; (2) page migration/replication has limited opportunity and cannot eliminate all capacity/conflict misses, even with fast hardware support and an unlimited amount of memory; (3) given a page cache large enough to fit an application's primary working set, R-NUMA always performs best and subsumes page migration/replication; (4) R-NUMA benefits more than page migration/replication from hardware support that accelerates page operations; and (5) integrating page migration/replication into R-NUMA to reduce hardware cost requires sophisticated mechanisms and policies for selecting migration/replication candidates.
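Result (3) turns on how much of an application's working set fits in the page cache. The toy model below is a hypothetical sketch, loosely based on the reactive idea behind R-NUMA: a remote page is relocated into a local main-memory page cache once it shows repeated capacity/conflict refetches. It is not the protocol or simulator used in the paper; the function `remote_fetches`, the LRU block cache standing in for the node's processor caches, the LRU page cache, the `refetch_threshold` value of 4, and the single-node, read-only trace are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def remote_fetches(trace, block_cache_blocks, page_cache_pages,
                   blocks_per_page=64, refetch_threshold=4):
    """Count remote block fetches for one node in a toy R-NUMA-style model.

    trace: global block numbers read by this node, all homed on other nodes.
    Writes, coherence invalidations and page-relocation costs are ignored.
    """
    block_cache = OrderedDict()  # remote blocks cached on the node, LRU order
    page_cache = OrderedDict()   # page -> blocks valid in its local frame, LRU order
    fetched_before = set()       # blocks this node has already fetched once
    refetches = {}               # page -> capacity/conflict refetch count
    remote = 0

    for block in trace:
        page = block // blocks_per_page
        if page in page_cache:
            page_cache.move_to_end(page)
            frame = page_cache[page]
            if block not in frame:
                remote += 1      # fill this block of the local page frame once
                frame.add(block)
            continue
        if block in block_cache:
            block_cache.move_to_end(block)
            continue
        remote += 1              # CC-NUMA-style remote fetch of one block
        if block in fetched_before:
            refetches[page] = refetches.get(page, 0) + 1  # evidence of reuse
        fetched_before.add(block)
        block_cache[block] = None
        if len(block_cache) > block_cache_blocks:
            block_cache.popitem(last=False)  # evict the LRU block
        if page_cache_pages > 0 and refetches.get(page, 0) >= refetch_threshold:
            # Relocate the page into the main-memory page cache.
            if len(page_cache) >= page_cache_pages:
                victim, _ = page_cache.popitem(last=False)  # evict the LRU page
                refetches[victim] = 0
            page_cache[page] = {block}  # the block just fetched is already valid
    return remote


if __name__ == "__main__":
    trace = [b for _ in range(10) for b in range(4 * 64)]  # 4 pages, 10 sweeps
    print(remote_fetches(trace, block_cache_blocks=64, page_cache_pages=0))  # 2560
    print(remote_fetches(trace, block_cache_blocks=64, page_cache_pages=4))  # 524
```

With four pages swept ten times and room for only 64 blocks, every access misses when there is no page cache. A four-page page cache holds the whole working set, so once the pages are relocated the only remaining remote fetches are one-time fills of the local frames. For page caches smaller than the working set, the outcome depends on the replacement and threshold choices, which is why result (3) is stated for a page cache that fits the primary working set.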

Published In

SPAA '00: Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures

July 2000

224 pages

ISBN: 1581131852

DOI: 10.1145/341800

Chairmen:
Gary Miller, Carnegie Mellon Univ., Pittsburgh, PA; and Akamai Technologies, Inc.
Shang-Hua Teng, Univ. of Illinois at Urbana-Champaign, Urbana

Copyright © 2000 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SPAA00: ACM Symposium on Parallel Algorithms and Architectures

July 9-13, 2000

Bar Harbor, Maine, USA

Acceptance Rates

SPAA '00 paper acceptance rate: 24 of 45 submissions (53%)

Overall acceptance rate: 447 of 1,461 submissions (31%)

Bibliometrics

Total citations: 0
Total downloads: 375
Downloads (last 12 months): 53
Downloads (last 6 weeks): 14

Reflects downloads up to 20 Jan 2025.
