Low-power, low-storage-overhead chipkill correct via multi-line error correction

X Jian, H Duwe, J Sartori, V Sridharan… - Proceedings of the …, 2013 - dl.acm.org
Proceedings of the International Conference on High Performance Computing …, 2013
Due to their large memory capacities, many modern servers require chipkill correct, an advanced type of memory error detection and correction, to meet their reliability requirements. However, existing chipkill-correct solutions incur high power overheads, high storage overheads, or both, because they dedicate error-correction resources to each codeword. This requires high overhead for correction and results in high overhead for error detection. We propose a novel chipkill-correct solution, multi-line error correction, that uses resources shared across multiple lines in memory for error correction to reduce the overhead of both error detection and correction. Our evaluations show that the proposed solution reduces memory power by a mean of 27%, and by up to 38%, with respect to commercial solutions, at the cost of a 0.4% increase in storage overhead and minimal impact on reliability.
ACM Digital Library