ACM Home Page
Please provide us with feedback. Feedback
Memory mapped ECC: low-cost error protection for last level caches
Full text PdfPdf (1.46 MB)
Source
International Symposium on Computer Architecture archive
Proceedings of the 36th annual international symposium on Computer architecture table of contents
Austin, TX, USA
SESSION: Reliability and fault tolerance table of contents
Pages 116-127  
Year of Publication: 2009
ISBN:978-1-60558-526-0
Also published in ...
Authors
Doe Hyun Yoon  University of Texas at Austin, Austin, TX, USA
Mattan Erez  University of Texas at Austin, Austin, TX, USA
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 58,   Downloads (12 Months): 183,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1555754.1555771
What is a DOI?

ABSTRACT

This paper presents a novel technique, Memory Mapped ECC, which reduces the cost of providing error correction for SRAM caches. It is important to limit such overheads as processor resources become constrained and error propensity increases. The continuing decrease in SRAM cell size and the growing capacity of caches increases the likelihood of errors in SRAM arrays. To address this, redundant information can be used to correct a value after an error occurs. Information redundancy is typically provided through error-correcting codes (ECC), which append bits to every SRAM row and increase the array's area and energy consumption. We make three observations regarding error protection and utilize them in our architecture: (1) much of the data in a cache is replicated throughout the hierarchy and is inherently redundant; (2) error-detection is necessary for every cache access and is cheaper than error correction, which is very infrequent; (3) redundant information for correction need not be stored in high-cost SRAM. Our unique architecture only dedicates SRAM for error detection while the ECC bits are stored within the memory hierarchy as data. We associate a physical memory address with each cache line for ECC storage and rely on locality to minimize the impact. The cache is dynamically and transparently partitioned between data and ECC with the fraction of ECC growing with the number of dirty cache lines. We show that this has little impact on both performance (1.3% average and < 4%) and memory traffic (3%) across a range of memory-intensive applications.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
H. Ando, K. Seki, S. Sakashita, M. Aihara, R. Kan, K. Imada, M. Itoh, M. Nagai, Y. Tosaka, K. Takahisa, and K. Hatanaka. Accelerated Testing of a 90nm SPARC64 V Microprocessor for Neutron SER. In Proceedings of IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE), April 2007.
 
2
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. Technical Report TR-811-08, Princeton University, January 2008.
 
3
B. H. Calhoun and A. P. Chandrakasan. A 256kb Sub-threshold SRAM in 65nm CMOS. In Proceedings of the International Solid-State Circuits Conference (ISSCC), February 2006.
 
4
L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini, and W. Haensch. Stable SRAM Cell Design for the 32nm Node and Beyond. In Digest of Technical Papers of Symposium on VLSI Technology, June 2005.
 
5
C. L. Chen and M. Y. Hsiao. Error-correcting Ccodes for Semiconductor Memory Applications: A State-of-the-art Review. IBM Journal of Research and Development, 28(2):124--134, March 1984.
 
6
 
7
Digital Equipment Corp. Alpha 21264 Microprocessor Hardware Reference Manual, July 1999.
 
8
G. Hamerly, E. Perelman, J. Lau, and B. Calder. SimPoint 3.0: Faster and More Flexible Program Analysis. In Proceedings of Workshop on Modeling, Benchmarking and Simulation, June 2005.
 
9
R. W. Hamming. Error Correcting and Error Detecting Codes. Bell System Technical Journal, 29:147--160, April 1950.
 
10
M. Y. Hsiao. A Class of Optimal Minimum Odd-weight-column SEC-DED codes. IBM Journal of Reserach and Development, 14:395--301, 1970.
 
11
J. Huynh. White Paper: The AMD Athlon MP Processor with 512KB L2 Cache, May 2003.
 
12
 
13
 
14
15
 
16
J. P. Kulkarni, K. Kim, and K. roy. A 160mV Robust Schmitt Trigger Based Subthreshold SRAM. IEEE Journal of Solid-State Circuits, 42(10):2303--2313, October 2007.
17
18
 
19
S. Lin and D. J. C. Jr. Error Control Coding: Fundamentals and Applications. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983.
20
 
21
 
22
J. Maiz, S. Hareland, K. Zhang, and P. Armstrong. Characterization of Multi-Bit Soft Error Events in Advanced SRAMs. In Technical Digest of IEEE International Electron Devices Meeting (IEDM), December 2003.
23
 
24
K. Osada, K. Yamaguchi, and Y. Saitoh. SRAM Immunity to Cosmic-Ray-Induced Multierrors based on Analysis of an Induced Parasitic Bipolar Effect. IEEE Journal of Solid-State Circuits, 39:827--833, May 2004.
25
 
26
I. S. Reed and G. Solomon. Polynomial Codes Over Certain Finite Fields. Journal of Society for Industrial and Applied Mathematics, 8:300--304, June 1960.
 
27
N. N. Sadler and D. J. Sorin. Choosing an Error Protection Scheme for a Microprocessor's L1 Data Cache. In Proceedings of International Conference on Computer Design (ICCD), October 2006.
 
28
N. Seifert, V. Zia, and B. Gill. Assessing the Impact of Scaling on the Efficacy of Spatial Redundancy based Mitigation Schemes for Terrestrial Applications. In Proceedings of IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE), April 2007.
 
29
C. Slayman. Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations. IEEE Transactions on Device and Materials Reliability, 5:397--404, September 2005.
 
30
Standard Performance Evaluation Corporation. SPEC CPU 2006. http://www.spec.org/cpu2006/, 2006.
 
31
J. Standards. JESD89 Measurement and Reporting of Alpha Particles and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices, JESD89-1 System Soft Error Rate (SSER) Method and JESD89-2 Test Method for Alpha Source Accelerated Soft Error Rate, 2001.
 
32
Sun Microsystems Inc. OpenSPARC T2 System-On-Chip (SOC) Microarchitecture Specification, May 2008.
 
33
J. M. Tendler, J. S. Dodson, J. S. F. Jr., H. Le, and B. Sinharoy. POWER4 System Microarchitecture. IBM Journal of Research and Development, 46(1):5--25, January 2002.
 
34
S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. CACTI 5.1. Technical report, HP Laboratories, April 2008.
35
36
37
 
38
J. Wuu, D. Weiss, C. Morganti, and M. Dreesen. The asynchronous 24MB on-chip level-3 cache for a dual-core Itanium-family processor. In Proceedings of the International Solid-State Circuits Conference (ISSCC), February 2005.
 
39
 
40
W. Zhang, S. Gurumurthi, M. Kandemir, and A. Sivasubramaniam. ICR: In-Cache Replication for Enhancing Data Cache Reliability. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), June 2003.

Collaborative Colleagues:
Doe Hyun Yoon: colleagues
Mattan Erez: colleagues