DOI: 10.5555/645610.662032

A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors

Published: 15 April 2002

Abstract

Recent technology improvements allow multiprocessor designers to integrate key components, such as the memory controller, the coherence hardware and the network interface/router, into the processor chip. In this work we exploit this level of integration and present a novel node architecture aimed at reducing two factors that characterize cc-NUMA machines and limit their scalability: long L2 miss latencies and the memory overhead of directories. Our proposal replaces the traditional directory with a novel three-level directory architecture and adds a small shared data cache to each node of the multiprocessor system. Because of their small size, the first-level directory and the shared data cache are integrated into the processor chip of every node. We also present a taxonomy of L2 misses according to the actions the directory performs to satisfy them. Using execution-driven simulations, we show significant L2 miss latency reductions (more than 60% in some cases). These improvements translate into reductions in application execution time of more than 30% in some cases.


Published In

IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium
April 2002
ISBN: 0769515738

Publisher

IEEE Computer Society

United States
