DOI: 10.1145/139669.139705

Comparative performance evaluation of cache-coherent NUMA and COMA architectures

Published: 01 April 1992

Abstract

Two interesting variations of large-scale shared-memory machines that have recently emerged are cache-coherent non-uniform-memory-access machines (CC-NUMA) and cache-only memory architectures (COMA). They both have distributed main memory and use directory-based cache coherence. Unlike CC-NUMA, however, COMA machines automatically migrate and replicate data at the main-memory level in cache-line sized chunks. This paper compares the performance of these two classes of machines. We first present a qualitative model that shows that the relative performance is primarily determined by two factors: the relative magnitude of capacity misses versus coherence misses, and the granularity of data partitions in the application. We then present quantitative results using simulation studies for eight parallel applications (including all six applications from the SPLASH benchmark suite). We show that COMA's potential for performance improvement is limited to applications where data accesses by different processors are finely interleaved in memory space and, in addition, where capacity misses dominate over coherence misses. In other situations, for example where coherence misses dominate, COMA can actually perform worse than CC-NUMA due to increased miss latencies caused by its hierarchical directories. Finally, we propose a new architectural alternative, called COMA-F, that combines the advantages of both CC-NUMA and COMA.
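The trade-off described in the abstract can be illustrated with a toy cost model. This is only a sketch and not the paper's qualitative model: the latency values, the function names, and the `locally_placed` parameter are hypothetical, chosen to show the direction of the effect. It assumes that COMA serves capacity misses from local memory (because data has migrated or been replicated there), that COMA pays extra on coherence misses for its hierarchical directories, and that CC-NUMA serves a capacity miss locally only when the data was placed on the requesting node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latencies:
    """Hypothetical miss latencies in cycles; none of these come from the paper."""
    local: float = 30.0          # miss served from the node's own memory
    numa_remote: float = 100.0   # CC-NUMA miss served by a remote node
    coma_remote: float = 150.0   # COMA remote miss, including hierarchical directory hops


def numa_miss_cost(capacity_misses, coherence_misses, locally_placed=0.0, lat=Latencies()):
    """Total miss cycles for CC-NUMA under the toy assumptions.

    locally_placed is the assumed fraction of capacity misses that hit data
    already placed on the requesting node (plausible for coarse data partitions).
    """
    per_capacity = locally_placed * lat.local + (1.0 - locally_placed) * lat.numa_remote
    return capacity_misses * per_capacity + coherence_misses * lat.numa_remote


def coma_miss_cost(capacity_misses, coherence_misses, lat=Latencies()):
    """Total miss cycles for COMA under the toy assumptions.

    Capacity misses are assumed to be served locally after migration or
    replication; coherence misses always go remote through the hierarchy.
    """
    return capacity_misses * lat.local + coherence_misses * lat.coma_remote


if __name__ == "__main__":
    for label, cap, coh in [("capacity-dominated", 900, 100),
                            ("coherence-dominated", 100, 900)]:
        print(label, "CC-NUMA:", numa_miss_cost(cap, coh), "COMA:", coma_miss_cost(cap, coh))
```

With these made-up numbers, COMA comes out ahead only in the capacity-dominated case, and raising `locally_placed` toward 1.0 removes that advantage, which mirrors the two factors the abstract names.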


Published In

ISCA '92: Proceedings of the 19th annual international symposium on Computer architecture
May 1992
439 pages
ISBN:0897915097
DOI:10.1145/139669

ACM SIGARCH Computer Architecture News, Volume 20, Issue 2
Special Issue: Proceedings of the 19th annual international symposium on Computer architecture (ISCA '92)
May 1992
429 pages
ISSN: 0163-5964
DOI: 10.1145/146628

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ISCA92: International Conference on Computer Architecture
May 19 - 21, 1992
Queensland, Australia

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
