Analytical Study on Bandwidth Efficiency of Heterogeneous Memory Systems

Authors:

Amin Farmahini-Farahani,

Nuwan Jayasena

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems

Pages 104 - 118

https://doi.org/10.1145/2989081.2989089

Published: 03 October 2016

Abstract

Heterogeneous memory systems integrate different memory technologies to balance design requirements such as bandwidth, capacity, and cost. The performance of these systems depends heavily on memory hierarchy organization, memory attributes, and application characteristics. In this paper, we present analytical bandwidth models for a range of heterogeneous memory systems composed of DRAM and non-volatile memory (NVM). Our models enable exploration of heterogeneous memory systems with different organizations and attributes. Using the models, we study the bandwidth efficiency of heterogeneous memory systems to provide insights into their bandwidth bottlenecks under different application characteristics. Our analytical results highlight the importance of NVM read-write bandwidth asymmetry and DRAM-NVM bandwidth asymmetry for bandwidth efficiency. Specifically, in flat non-uniform memory access (NUMA) systems, the read bandwidth is maximized when a certain portion of the bandwidth is delivered by DRAM; that portion depends on multiple factors, including DRAM and NVM bandwidth attributes and application bandwidth characteristics. In DRAM-cache-based systems, when the hit rate is low, the impact of the DRAM cache organization on the read bandwidth is minimal. However, at higher hit rates and NVM bandwidths, the impact of the cache organization on sustained read bandwidth becomes pronounced.
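
The flat-NUMA observation can be illustrated with a first-order sketch. This is not the model developed in the paper. It assumes that DRAM and NVM serve their shares of the traffic in parallel, that NVM reads and writes are serialized on a single device, and that latency and queuing effects are ignored. All function names and numeric values are hypothetical.

```python
def nvm_effective_bandwidth(read_bw, write_bw, read_fraction):
    """Effective NVM bandwidth for a given read/write mix.

    Assumption: reads and writes take turns on the device, so the time
    per byte is the weighted sum of the read and write times per byte.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be in [0, 1]")
    time_per_byte = read_fraction / read_bw + (1.0 - read_fraction) / write_bw
    return 1.0 / time_per_byte


def flat_numa_bandwidth(dram_bw, nvm_bw, dram_fraction):
    """Sustained bandwidth when dram_fraction of the traffic goes to DRAM.

    Assumption: both memories work in parallel, so the slower side
    (relative to its share of the traffic) limits the total.
    """
    if not 0.0 <= dram_fraction <= 1.0:
        raise ValueError("dram_fraction must be in [0, 1]")
    limits = []
    if dram_fraction > 0.0:
        limits.append(dram_bw / dram_fraction)
    if dram_fraction < 1.0:
        limits.append(nvm_bw / (1.0 - dram_fraction))
    return min(limits)


def best_dram_fraction(dram_bw, nvm_bw):
    """DRAM share that balances both memories under this simple model."""
    return dram_bw / (dram_bw + nvm_bw)


if __name__ == "__main__":
    # Hypothetical numbers in GB/s, chosen only for illustration.
    nvm_bw = nvm_effective_bandwidth(read_bw=20.0, write_bw=5.0, read_fraction=0.5)
    share = best_dram_fraction(dram_bw=100.0, nvm_bw=nvm_bw)
    for f in (0.5, share, 1.0):
        total = flat_numa_bandwidth(100.0, nvm_bw, f)
        print(f"DRAM share {f:.3f} -> {total:.1f} GB/s")
```

Under these assumptions the total peaks when each memory receives traffic in proportion to its bandwidth, so sending everything to DRAM is not optimal, and a slower NVM write path shifts the best DRAM share upward.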


Published In

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems

October 2016

463 pages

ISBN: 9781450343053

DOI: 10.1145/2989081

General Chair:
Bruce Jacob
University of Maryland

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

MEMSYS '16

MEMSYS '16: The Second International Symposium on Memory Systems

October 3 - 6, 2016

Alexandria, VA, USA
