DOI:10.1145/341800.341811

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

Published: 09 July 2000

Abstract

In this paper, we compare and contrast two techniques for reducing capacity/conflict miss traffic in CC-NUMA DSM clusters. Page migration/replication optimizes read-write accesses to a page used by a single processor by migrating the page to that processor, and it replicates all read-shared pages in the sharers' local memories. R-NUMA optimizes read-write accesses to any page by allowing a processor to cache that page in its main memory. Page migration/replication requires less hardware complexity than R-NUMA, but it has limited applicability and incurs much higher overheads even with tuned hardware/software support.
We evaluate page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors executing shared-memory applications. Our results show that: (1) both page migration/replication and R-NUMA significantly improve system performance over “first-touch” migration in many applications, (2) page migration/replication has limited opportunity and cannot eliminate all capacity/conflict misses even with fast hardware support and an unlimited amount of memory, (3) R-NUMA always performs best given a page cache large enough to fit an application's primary working set, and it subsumes page migration/replication, (4) R-NUMA benefits more than page migration/replication from hardware support that accelerates page operations, and (5) integrating page migration/replication into R-NUMA to help reduce hardware cost requires sophisticated mechanisms and policies to select candidates for page migration/replication.
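The placement rule that the abstract attributes to page migration/replication (migrate a page used by a single processor, replicate a read-shared page to its sharers) can be illustrated with a minimal Python sketch. The function name, the per-node access counters, and the "keep" fallback are assumptions made for illustration; the abstract does not describe the paper's actual counters, thresholds, or policies, and a real system would likely act only after some miss threshold is crossed.

```python
def classify_page(reads, writes, home):
    """Pick a placement action for one page (illustrative sketch only).

    reads, writes: hypothetical dicts mapping node id -> access count.
    home: node id whose memory currently holds the page.
    Returns (action, target_nodes).
    """
    readers = {node for node, count in reads.items() if count > 0}
    writers = {node for node, count in writes.items() if count > 0}
    users = readers | writers

    if not users:
        return ("keep", [])

    # Page used by a single node: move it there unless it is already local.
    if len(users) == 1:
        (node,) = users
        return ("keep", []) if node == home else ("migrate", [node])

    # Page read by several nodes and written by none: copy it to each
    # remote sharer's local memory.
    if not writers:
        return ("replicate", sorted(readers - {home}))

    # Page written while shared by several nodes: neither action applies.
    return ("keep", [])


if __name__ == "__main__":
    print(classify_page({1: 5}, {1: 2}, home=0))
    print(classify_page({0: 3, 1: 5, 2: 1}, {}, home=0))
    print(classify_page({1: 5}, {2: 1}, home=0))
```

The last case, a page that is written while several nodes share it, is where this rule has nothing to offer, which is consistent with the abstract's remark that page migration/replication has limited applicability.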

Published In

SPAA '00: Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
July 2000
224 pages
ISBN:1581131852
DOI:10.1145/341800
Chairmen: Gary Miller, Shang-Hua Teng
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SPAA00

Acceptance Rates

SPAA '00 Paper Acceptance Rate 24 of 45 submissions, 53%;
Overall Acceptance Rate 447 of 1,461 submissions, 31%
