An Integrated ECC and Redundancy Repair Scheme for Memory Reliability Enhancement. Chin-Lung Su, Yi-Ting Yeh, and Cheng-Wen Wu, National Tsing Hua University, Hsinchu, Taiwan
2 IC-DFN/08-06/cww Introduction: Memory cores are widely used in SOC designs. They have higher density and occupy a larger area, so they dominate the chip yield. According to the ITRS, their use is increasing in nanometer technologies. Reliability is also an important issue for memory. ECC and redundancy repair are both widely used fault-tolerance techniques. After production test, some redundancy may remain unused. Combining ECC with the unused redundancy gives higher yield and a greater degree of fault tolerance.
Chip Area Breakdown (figure). Source: International Technology Roadmap for Semiconductors (ITRS)
Typical RAM BIST Architecture (figure). Block labels: RAM; Test Collar (MUX); BIST Module; Controller; Comparator; Pattern Generator; Go/No-Go; RAM Controller; Counter; LUT; LFSR; Microprogram; Hardwired; CPU core; IEEE
Sharing Controller & Sequencer
Typical RAM ECC Architecture (figure). Block labels: RAM; Cb Gen; Decoder; Syndrome Gen; Corrector; Single; Double; Syndrome; Data Bus. Mainly for improving reliability.
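The blocks named on this slide (check-bit generation, syndrome generation, correction, and the single/double error flags) can be illustrated with a very small SEC/DED code. The sketch below uses an extended Hamming (8,4) code purely as an assumption for illustration; the slides do not state which code or word width the authors' ECC uses, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Check-bit generation: 4 data bits -> 8-bit SEC/DED codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 are Hamming positions,
    with check bits at positions 1, 2, 4 and data bits at 3, 5, 6, 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the whole word even parity
    return code


def decode(codeword):
    """Syndrome generation and correction.

    Returns (data, status) where status is "ok", "single" (corrected)
    or "double" (detected but not correctable, data is None).
    """
    code = list(codeword)                   # do not modify the caller's word
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of all 1 bits
    overall = sum(code) % 2                 # 1 means an odd number of bit flips

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1                 # syndrome 0 means the parity bit itself
        status = "single"
    else:
        return None, "double"

    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status
```

For example, flipping any one bit of `encode(0b1011)` and passing the result to `decode` returns `(0b1011, "single")`, while flipping two bits returns `(None, "double")`.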
RAM Built-In Self-Repair (BISR) (figure). Block labels: RAM; MUX; BIST; Redundancy Analyzer; Reconfiguration Mechanism; Spare Elements; I/O. Mainly for improving yield.
Power-On BISR Scheme (figure). Block labels: Main Memory; Spare Memory; BIRA; BIST; Wrapper; Q; D; A; A; MAO; POR. MAO: mask address output; POR: power-on reset. Source: ITC’03
Proposed Scheme: Integrated ECC and redundancy repair scheme. Hard errors are repaired in the field by physical redundancy, so the soft-error correction ability is not harmed by hard errors. This enhances reliability. Assumptions: during the Error Identification phase, no other faults occur; the error rate is much lower than the system clock rate.
Phases of Proposed Scheme
Error Identification Phase. Write-back process: write the corrected data back to memory, then read the data from the same address. A soft error may be eliminated by this process, whereas an error that persists after the write-back indicates a hard fault (assuming that no other errors occur in the meantime). After error identification, the scheme enters the “Hard repair phase” for a hard error/fault, or the “Fault-free phase” for a soft error.
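The write-back check described on this slide can be sketched in a few lines. The `Memory` class below is a hypothetical toy model (a word array plus optional stuck-at bits), not the authors' hardware; the comparison of the re-read word against the written word stands in for the ECC syndrome check, which is equivalent under the slide's assumption that no other error occurs during this phase.

```python
class Memory:
    """Toy word-addressable memory with optional stuck-at bits (hypothetical model)."""

    def __init__(self, size):
        self.words = [0] * size
        self.stuck = {}          # addr -> (mask, value): masked bits are forced to value

    def write(self, addr, word):
        self.words[addr] = word

    def read(self, addr):
        word = self.words[addr]
        if addr in self.stuck:
            mask, value = self.stuck[addr]
            word = (word & ~mask) | (value & mask)   # stuck bits override stored bits
        return word


def identify_error(mem, addr, corrected):
    """Write the ECC-corrected word back, re-read it, and classify the error.

    Returns "soft" if the write-back cleared the error, "hard" if it persists.
    """
    mem.write(addr, corrected)
    return "soft" if mem.read(addr) == corrected else "hard"


def next_phase(kind):
    """Map the classification to the phase named on the slide."""
    return "hard repair" if kind == "hard" else "fault-free"
```

A transient upset can be modeled by flipping a bit in `mem.words[addr]` directly, which the write-back then overwrites; a stuck-at cell set through `mem.stuck` survives the write-back and is classified as hard.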
Hard Repair Phase. The hard fault is repaired with a spare: map the faulty word to a redundant word, then write the corrected data into the redundant word. Hard fault location: in main memory, follow the above procedure; in redundant memory, mark the faulty redundant element. During this phase the memory cannot be accessed (idle mode). The hard fault is removed after this phase, so reliability and MTTF are increased.
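The remapping step can be sketched as a small address-translation table. This is a minimal illustration under stated assumptions: spares are allocated at word granularity, and a fault found in an already-used spare causes that spare to be marked faulty and a fresh spare to be allocated. The slide only says the faulty redundant element is marked, so the re-allocation, the class name `RepairableMemory`, and its methods are all hypothetical.

```python
class RepairableMemory:
    """Toy memory with word-level spares and a remap table (hypothetical model)."""

    def __init__(self, size, num_spares):
        self.main = [0] * size
        self.spare = [0] * num_spares
        self.spare_state = ["free"] * num_spares   # "free", "used" or "faulty"
        self.remap = {}                            # main address -> spare index

    def _allocate(self):
        """Return the index of the first free spare, or None if none is left."""
        for i, state in enumerate(self.spare_state):
            if state == "free":
                self.spare_state[i] = "used"
                return i
        return None

    def hard_repair(self, addr, corrected):
        """Repair a hard fault at addr; returns False when spares are exhausted."""
        if addr in self.remap:
            # The fault is in redundant memory: mark that spare as faulty.
            self.spare_state[self.remap.pop(addr)] = "faulty"
        idx = self._allocate()
        if idx is None:
            return False
        self.remap[addr] = idx          # map the faulty word to the redundant word
        self.spare[idx] = corrected     # write the corrected data into it
        return True

    def write(self, addr, word):
        if addr in self.remap:
            self.spare[self.remap[addr]] = word
        else:
            self.main[addr] = word

    def read(self, addr):
        if addr in self.remap:
            return self.spare[self.remap[addr]]
        return self.main[addr]
```

After a successful `hard_repair`, reads and writes to the repaired address are served by the spare word, so the ECC no longer has to spend its correction capability on the hard fault.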
Experimental Results. Technology: TSMC 0.25 µm CMOS process. The redundant memory consists of eight spare rows and four spare columns. Table columns: Memory Size | ECC | BIST | RC | Total (%). Table rows: 32K x 321, K x 641, K x 1282, K x 641, K x 1281,
Experimental Results (cont.). Table columns: Memory Size | Area (gates) | MTTF (hours) | Cost. Table rows: 8k X 32 (!ECC, SEC, SECP); k X 64 (!ECC, SEC, SECP); k X 128 (!ECC, SEC, SECP). Cost = Area / MTTF, where Area = Memory + ECC + BIST. !ECC: without ECC; SEC: with SEC/DED ECC; SECP: proposed scheme.
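The cost metric defined on this slide is simple arithmetic and can be written out directly. The function names are hypothetical, and the numbers in the usage note are placeholders chosen only to show the calculation; they are not values from the authors' table.

```python
def total_area(memory_gates, ecc_gates, bist_gates):
    """Area = Memory + ECC + BIST, all in gate counts."""
    return memory_gates + ecc_gates + bist_gates


def cost(area_gates, mttf_hours):
    """Cost = Area / MTTF; a lower value means less area per hour of expected life."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return area_gates / mttf_hours
```

With placeholder inputs, `cost(total_area(1000, 100, 50), 2300)` evaluates to `0.5`. A scheme that adds area is only worthwhile under this metric if it raises MTTF by a larger factor than it raises area.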
Reliability Improvement (figure). 8K x 64 memory, r + c = 12
Conclusions. An integrated ECC and redundancy repair scheme is proposed. It enhances memory reliability and MTTF. Low area overhead: the ECC controller is integrated with the BIST. No timing penalty in normal operation. A cost-effective way of reducing the effect of parametric defects.