DOI:10.1007/978-3-030-38991-8_9
Article

A Solution for High Availability Memory Access

Published: 09 December 2019

Abstract

In-memory computing now has many applications, including artificial intelligence, databases, and machine learning. These applications usually access memory frequently. At the same time, memory components typically become error-prone over time as density and capacity increase, so solutions for high-availability memory access are urgently needed. Existing solutions, however, either lack flexibility or are consistently more expensive than native memory. To this end, this paper presents SC2M, a software-controlled, high-availability memory mirroring solution. SC2M can flexibly set the granularity of the memory areas at various levels. It can also duplicate user-defined data structures in a high-availability version. Performing memory duplication systematically at instruction-level granularity reduces backup overhead and lowers the probability of data loss. Experimental results demonstrate the feasibility and advantages of the solution.
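To make the idea of software-controlled mirroring at a configurable granularity concrete, the following is a minimal sketch and not the SC2M implementation, whose details the abstract does not give. The class name `MirroredRegion`, the `block_size` parameter, and the per-block CRC32 used to decide which copy is intact are all assumptions introduced for illustration.

```python
import zlib


class MirroredRegion:
    """Toy software-mirrored memory region (illustrative only, not SC2M)."""

    def __init__(self, size, block_size=64):
        if size <= 0 or block_size <= 0:
            raise ValueError("size and block_size must be positive")
        self.size = size
        self.block_size = block_size      # hypothetical mirroring granularity
        self.primary = bytearray(size)    # copy that reads are served from
        self.mirror = bytearray(size)     # redundant copy updated on every write
        n_blocks = (size + block_size - 1) // block_size
        # Assumption of this sketch: a CRC32 per block identifies the good copy.
        self.crc = [self._crc_of(self.primary, b) for b in range(n_blocks)]

    def _bounds(self, block):
        start = block * self.block_size
        return start, min(start + self.block_size, self.size)

    def _crc_of(self, buf, block):
        start, end = self._bounds(block)
        return zlib.crc32(bytes(buf[start:end]))

    def _blocks(self, offset, length):
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        return range(first, last + 1)

    def _check_range(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self.size:
            raise IndexError("access outside region")

    def _verify(self, blocks):
        # Restore any block whose primary copy no longer matches its checksum.
        for b in blocks:
            if self._crc_of(self.primary, b) != self.crc[b]:
                if self._crc_of(self.mirror, b) != self.crc[b]:
                    raise RuntimeError(f"block {b}: both copies corrupted")
                start, end = self._bounds(b)
                self.primary[start:end] = self.mirror[start:end]

    def write(self, offset, data):
        self._check_range(offset, len(data))
        if not data:
            return
        blocks = self._blocks(offset, len(data))
        self._verify(blocks)              # avoid checksumming stale corruption
        end = offset + len(data)
        self.primary[offset:end] = data   # duplicate the store into both copies
        self.mirror[offset:end] = data
        for b in blocks:
            self.crc[b] = self._crc_of(self.primary, b)

    def read(self, offset, length):
        self._check_range(offset, length)
        if length == 0:
            return b""
        self._verify(self._blocks(offset, length))
        return bytes(self.primary[offset:offset + length])


if __name__ == "__main__":
    region = MirroredRegion(256, block_size=32)
    region.write(10, b"hello world")
    region.primary[12] ^= 0xFF            # simulate a bit error in the primary copy
    print(region.read(10, 11))            # the read repairs the block from the mirror
```

In this sketch a smaller `block_size` narrows the span that has to be checked and restored after a fault, at the cost of more checksums to maintain; that trade-off is a property of the toy design above, not a result reported by the paper.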

Published In

Algorithms and Architectures for Parallel Processing: 19th International Conference, ICA3PP 2019, Melbourne, VIC, Australia, December 9–11, 2019, Proceedings, Part I
Dec 2019
724 pages
ISBN:978-3-030-38990-1
DOI:10.1007/978-3-030-38991-8
  • Editors:
  • Sheng Wen,
  • Albert Zomaya,
  • Laurence T. Yang

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. High availability
  2. Hardware virtualization
  3. System architecture

Qualifiers

  • Article
