
Sleepy-LRU: extending the lifetime of non-volatile caches by reducing activity of age bits

Published: 01 July 2019

Abstract

Emerging non-volatile memories (NVMs) are promising alternatives to SRAM in on-chip caches. However, their limited write endurance is a major challenge when they are used in such frequently written structures: early wear-out of NVM cells leaves the cache lifetime far too short for today's computing systems. Previous studies addressed only the lifetime of the data part of the cache. This paper first demonstrates that the age bits field used by the cache replacement algorithm is the most frequently written part of a cache block, and that its lifetime is more than 27× shorter than that of the data part. Second, it investigates the effect of age-bit wear-out on cache operation and shows that performance degrades severely once even a small portion of the age bits become non-operational. Third, it proposes a novel cache replacement algorithm, Sleepy-LRU, that reduces the write activity of the age bits with negligible overhead. The evaluations show that Sleepy-LRU extends the lifetime of instruction and data caches by factors of 3.63× and 3.00×, respectively, with an average performance overhead of 0.06%. In addition, Sleepy-LRU imposes no area or power consumption overhead.
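To illustrate why the age bits can be written far more often than the data, the following minimal sketch models one set of a cache with conventional counter-based LRU and counts writes to the age fields and to the data fields separately. It is only an illustration of baseline LRU under stated assumptions, not the Sleepy-LRU algorithm, whose internals the abstract does not describe. The names `LRUSet` and `simulate` are hypothetical, and the sketch counts one write per age field whose value changes.

```python
class LRUSet:
    """One set of a set-associative cache with counter-based LRU (illustrative)."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = [None] * ways
        self.ages = list(range(ways))   # 0 = most recently used, ways-1 = oldest
        self.age_writes = [0] * ways    # writes to each way's age field
        self.data_writes = [0] * ways   # writes to each way's data field

    def _touch(self, way):
        """Make `way` the most recently used and age every younger block by one."""
        old = self.ages[way]
        for w in range(self.ways):
            if w == way:
                new = 0
            elif self.ages[w] < old:
                new = self.ages[w] + 1
            else:
                new = self.ages[w]
            if new != self.ages[w]:     # assumption: only changed fields are rewritten
                self.ages[w] = new
                self.age_writes[w] += 1

    def access(self, tag, is_write):
        if tag in self.tags:            # hit: data is written only on a store
            way = self.tags.index(tag)
            if is_write:
                self.data_writes[way] += 1
        else:                           # miss: evict the oldest block and fill it
            way = self.ages.index(self.ways - 1)
            self.tags[way] = tag
            self.data_writes[way] += 1
        self._touch(way)                # age fields are updated on every access
        return way


def simulate(trace, ways=4):
    """Return (total age-field writes, total data-field writes) for a trace."""
    cache_set = LRUSet(ways)
    for tag, is_write in trace:
        cache_set.access(tag, is_write)
    return sum(cache_set.age_writes), sum(cache_set.data_writes)


# Read-only round-robin over four blocks in a 4-way set.
trace = [(t, False) for t in "ABCD"] * 3
age_total, data_total = simulate(trace)
print(age_total, data_total)
```

For this read-only trace the sketch records 48 age-field writes against 4 data-field writes (the initial fills), because every access to the oldest block rewrites the age field of every way in the set, while read hits never touch the data.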



Published In

The Journal of Supercomputing, Volume 75, Issue 7
July 2019
628 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Age bits
  2. Lifetime
  3. Non-volatile caches
  4. Replacement algorithm
  5. Write endurance
