DOI: 10.1145/2989081.2989089

Analytical Study on Bandwidth Efficiency of Heterogeneous Memory Systems

Published: 03 October 2016

Abstract

Heterogeneous memory systems integrate different memory technologies to balance design requirements such as bandwidth, capacity, and cost. Performance of these systems depends heavily on memory hierarchy organization, memory attributes, and application characteristics. In this paper, we present analytical bandwidth models for a range of heterogeneous memory systems composed of DRAM and non-volatile memory (NVM). Our models enable exploring heterogeneous memory systems with different organizations and attributes. Using the models, we study the bandwidth efficiency of heterogeneous memory systems to provide insights into the bandwidth bottlenecks of these systems under different application characteristics. Our analytical results highlight the importance of NVM read-write bandwidth asymmetry and DRAM-NVM bandwidth asymmetry in bandwidth efficiency. Specifically, in flat non-uniform memory access (NUMA) systems, the read bandwidth is maximized when a certain portion of bandwidth is delivered by DRAM and that portion depends on multiple factors including DRAM and NVM bandwidth attributes and application bandwidth characteristics. In DRAM-cache-based systems, when the hit rate is low, the impact of the DRAM cache organization on the read bandwidth is minimal. However, at higher hit rates and NVM bandwidths, the impact of the cache organization on sustained read bandwidth becomes pronounced.
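The two qualitative findings in the abstract can be illustrated with a toy bottleneck model. The sketch below is not the paper's model: it assumes DRAM and NVM transfer data in parallel, so sustained read bandwidth is set by whichever memory takes longer to move its share of traffic. All function and parameter names are hypothetical, and `dram_traffic_per_miss` is an assumed stand-in for cache-organization overhead (for example, fill traffic into the DRAM cache on a miss).

```python
def flat_numa_read_bandwidth(dram_fraction, dram_bw, nvm_bw):
    """Toy model: sustained read bandwidth of a flat DRAM+NVM system.

    dram_fraction is the share of delivered bytes read from DRAM; the rest
    comes from NVM. Both memories are assumed to transfer in parallel, so
    the time per delivered byte is set by the slower side.
    """
    if not 0.0 <= dram_fraction <= 1.0:
        raise ValueError("dram_fraction must be in [0, 1]")
    dram_time = dram_fraction / dram_bw
    nvm_time = (1.0 - dram_fraction) / nvm_bw
    return 1.0 / max(dram_time, nvm_time)


def best_dram_fraction(dram_bw, nvm_bw):
    """DRAM share that balances both memories in the toy model above."""
    return dram_bw / (dram_bw + nvm_bw)


def dram_cache_read_bandwidth(hit_rate, dram_bw, nvm_bw, dram_traffic_per_miss):
    """Toy model: sustained read bandwidth with DRAM used as a cache for NVM.

    Hits are read from DRAM. Misses are read from NVM and additionally cost
    dram_traffic_per_miss bytes of DRAM traffic per delivered byte
    (an assumed proxy for the cache organization's overhead).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    miss_rate = 1.0 - hit_rate
    dram_time = (hit_rate + miss_rate * dram_traffic_per_miss) / dram_bw
    nvm_time = miss_rate / nvm_bw
    return 1.0 / max(dram_time, nvm_time)


if __name__ == "__main__":
    # Illustrative bandwidths in GB/s; these numbers are assumptions.
    dram_bw, nvm_bw = 120.0, 40.0
    f_star = best_dram_fraction(dram_bw, nvm_bw)
    print("balanced DRAM fraction:", f_star)
    for f in (0.0, 0.5, f_star, 1.0):
        print(f, flat_numa_read_bandwidth(f, dram_bw, nvm_bw))
    for h in (0.2, 0.9):
        for overhead in (1.0, 2.0):
            print(h, overhead,
                  dram_cache_read_bandwidth(h, dram_bw, nvm_bw, overhead))
```

In this simplified model, flat-system bandwidth peaks when DRAM serves the fraction `dram_bw / (dram_bw + nvm_bw)` of the traffic. In the cache case, a low hit rate leaves NVM as the bottleneck, so changing `dram_traffic_per_miss` has no effect; at a high hit rate DRAM becomes the bottleneck and the overhead matters. The paper's models additionally account for NVM read-write asymmetry and application characteristics, which this sketch omits.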

Published In

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
October 2016
463 pages
ISBN: 9781450343053
DOI: 10.1145/2989081

Publisher

Association for Computing Machinery, New York, NY, United States
