Pay-As-You-Go: Storage-Efficient Hard Error Correction
Moinuddin K. Qureshi, ECE, Georgia Tech
Research done while at: IBM T. J. Watson Research Center, New York
MICRO 2011, Dec 6, 2011
Introduction
PAY-AS-YOU-GO, MICRO-2011
- PCM is a scalable technology. Device state is changed by heating.
- Over time, write operations break the heater and the cell gets stuck.
- Reported write endurance: … million writes/cell
- With good wear leveling, it is still possible to have 8+ years of lifetime.
Not All Cells Are Created Equal
- Variability in lifetime due to process variation: weak vs. strong cells
- Weak cells fail much earlier and greatly reduce system lifetime.
- Lifetime is usually modeled as Gaussian with SDEV of 10-30% of the mean. We use SDEV = 20% of the mean.
- P(5 SDEV from mean) ≈ …
- For a 1GB memory bank, 8K bits fail at time 0, and more fail as we write!
- PCM needs a significant amount of error correction to handle variability.
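A rough check of the arithmetic on this slide, as a minimal Python sketch. It assumes a one-sided Gaussian tail (with SDEV = 20% of the mean, a cell 5 SDEV below the mean has zero write endurance) and a bank of 8 × 2^30 bits; the function and constant names are illustrative and not from the talk. The exact tail gives a few thousand dead bits, the same order of magnitude as the 8K quoted above.

```python
import math

def lower_tail_probability(num_sdev: float) -> float:
    """P(X < mean - num_sdev * sdev) for a Gaussian random variable X."""
    return 0.5 * math.erfc(num_sdev / math.sqrt(2.0))

SDEV_FRACTION = 0.20            # SDEV as a fraction of the mean (from the slide)
BITS_PER_BANK = 8 * 2**30       # assumption: 1GB bank = 2^30 bytes

# A cell is unusable at time 0 if its endurance is <= 0 writes,
# i.e. it lies (1 / 0.20) = 5 SDEV below the mean.
sdevs_to_zero = 1.0 / SDEV_FRACTION
p_dead = lower_tail_probability(sdevs_to_zero)
expected_dead_bits = p_dead * BITS_PER_BANK

print(f"P(dead at time 0) = {p_dead:.3e}")
print(f"expected dead bits per bank = {expected_dead_bits:.0f}")
```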
Write Efficient Code
- Traditional ECC codes are write intensive, which means more wear.
- Endurance-related (hard) faults are identified with a checker read.
- Write-efficient code: Error Correcting Pointers (ECP) [ISCA'10]
- [Figure: a 512-bit cache line (bit positions 0 … 511) with a faulty bit X; an ECP entry holds a 9-bit pointer to the faulty bit and a 1-bit replacement value D.]
- ECP needs 10 bits per entry. It handles multiple faults (needs 1 Full bit).
- For correcting N errors, ECP needs (10N+1) bits.
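The pointer-plus-replacement-bit idea can be sketched as follows. This is a simplified illustration, not the ECP implementation from the ISCA'10 paper: `EcpEntry`, `ecp_correct`, and `ecp_storage_bits` are hypothetical names, and the sketch ignores faults in the ECP entries themselves.

```python
from dataclasses import dataclass
from typing import List

LINE_BITS = 512      # one 64B line
POINTER_BITS = 9     # 2**9 == 512, enough to address any bit in the line

@dataclass
class EcpEntry:
    pointer: int       # position of the stuck cell within the line
    replacement: int   # the bit value that cell was supposed to hold

def ecp_correct(raw_bits: List[int], entries: List[EcpEntry]) -> List[int]:
    """Patch the bits read from the array using the line's ECP entries."""
    corrected = list(raw_bits)
    for entry in entries:
        corrected[entry.pointer] = entry.replacement
    return corrected

def ecp_storage_bits(num_errors: int) -> int:
    """(9-bit pointer + 1 replacement bit) per entry, plus 1 Full bit."""
    return (POINTER_BITS + 1) * num_errors + 1

# Example: the line should be all ones, but cell 37 is stuck at 0.
intended = [1] * LINE_BITS
raw = list(intended)
raw[37] = 0
fixed = ecp_correct(raw, [EcpEntry(pointer=37, replacement=1)])
print(fixed == intended, ecp_storage_bits(6))
```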
Expensive to Correct Many Errors
- To get 6+ years of lifetime, we need to correct six errors per line.
- Storage: 61 bits/line (about 12%, 1GB for 8GB), which is expensive.
- Unlike ECC in current DRAM chips, this overhead is not optional.
- [Figure: baseline system lifetime (years) for NoECP and ECP-1 through ECP-6.]
- Goal: Reduce storage significantly (3X-6X) while retaining lifetime.
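The overhead figures on this slide follow from the (10N+1) formula and a 512-bit line. A small sketch of that arithmetic, with illustrative names:

```python
LINE_BITS = 512  # one 64B line

def ecp_bits(num_errors: int) -> int:
    """Storage for ECP-N: 10 bits per entry plus 1 Full bit."""
    return 10 * num_errors + 1

def overhead_fraction(num_errors: int) -> float:
    """ECP storage relative to the data bits it protects."""
    return ecp_bits(num_errors) / LINE_BITS

for n in range(1, 7):
    print(f"ECP-{n}: {ecp_bits(n)} bits/line, {overhead_fraction(n):.1%} overhead")

ecp6_fraction = overhead_fraction(6)      # 61 / 512
extra_gb_for_8gb = 8 * ecp6_fraction      # correction storage for 8GB of data
print(f"ECP-6 for 8GB of data needs about {extra_gb_for_8gb:.2f} GB")
```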
Motivation
- Uniformly allocating error correction entries is inefficient (by ~20X).
- We do not need to pay for error correction of each line upfront.
- Pay-As-You-Go: Give error correction entries in proportion to errors.

Utilization of error correction entries per line:
Num Writes (Normalized) | No ECP used | Only ECP-1 used | ECP-2 to ECP-6 used | Average ECP Used
50%                     | 99.02%      | 0.97%           | 0.01%               | …
…%                      | 79.63%      | 18.14%          | 2.23%               | …
…%                      | 73.24%      | 22.82%          | 3.95%               | 0.31

- Key insight: Very few lines have a large number of errors.
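The ~20X figure can be sanity-checked from the last row of the utilization table. The sketch below uses only numbers from that row; treating every line in the "ECP-2 to ECP-6" column as using exactly two entries is an assumption that gives a lower bound on the average.

```python
ENTRIES_PER_LINE = 6          # uniform ECP-6

# Last row of the utilization table.
frac_only_ecp1 = 0.2282
frac_ecp2_to_ecp6 = 0.0395
reported_average = 0.31

# Lower bound: lines in the ECP-2..ECP-6 column use at least 2 entries.
average_lower_bound = 1 * frac_only_ecp1 + 2 * frac_ecp2_to_ecp6

# Entries provisioned per entry actually used.
inefficiency = ENTRIES_PER_LINE / reported_average

print(f"average entries used >= {average_lower_bound:.3f}")
print(f"uniform ECP-6 provisions about {inefficiency:.1f}X what is used")
```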
Outline
- Introduction & Motivation
- PAYG Design
- Results
- Even More Storage Efficiency
- Related Work
- Summary
Naïve Design for PAYG
- Given that 73% of lines have no error, why not give ECP-6 only on error?
- [Figure: each memory line (64B) carries an OFB; the Global Error Correction (GEC) Pool is organized as sets × ways (number of GEC entries per set), and each GEC entry holds V, TAG, and ECP-N.]
- GEC Pool structure: set associative vs. fully associative (impractical)
Three Key Problems
1. A set-associative structure is inefficient (by ~8X for 8-way).
2. If we allocate six ECP entries for each GEC entry, most error correction entries still remain unused.
3. Given that >25% of lines are likely to have at least one error, the latency impact of GEC is significant.
Inefficiency of Set Associative GEC
- There are tens to hundreds of thousands of sets, and any set could overflow.
- How many entries are used before one set overflows? (Buckets-and-Balls)
- An 8-way GEC is only 12% full when one set overflows, so it needs 8X the entries.
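The buckets-and-balls effect is easy to reproduce with a small Monte Carlo experiment. The sketch below assumes faulty lines map to sets uniformly at random and uses 128K sets as an example size; `fill_at_first_overflow` is an illustrative name. It is a single random trial, so the result varies with the seed, but it lands in the low-teens percent range, consistent with the 12% quoted on the slide.

```python
import random

def fill_at_first_overflow(num_sets: int, ways: int, seed: int = 0) -> float:
    """Throw entries into random sets until one set needs more than `ways`
    entries; return the fraction of the table that was filled by then."""
    rng = random.Random(seed)
    occupancy = [0] * num_sets
    placed = 0
    while True:
        target = rng.randrange(num_sets)
        if occupancy[target] == ways:
            # This set is already full: first overflow.
            return placed / (num_sets * ways)
        occupancy[target] += 1
        placed += 1

fraction = fill_at_first_overflow(num_sets=128 * 1024, ways=8)
print(f"table is {fraction:.1%} full at the first overflow")
```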
Scalable Structure for GEC Pool
- "Hash-Table With Chaining" structure for flexibility & low latency
- [Figure: a line with its OFB, the Set Associative Table (SAT), and the Global Collision Table (GCT); a SAT set holds GEC entries and a PTR into the GCT; GCT-HEAD is marked on the GCT; some GCT sets are labeled "taken by some other set".]
- *PTR is two-way replicated.
Scalable Structure for GEC Pool
Let's say we want to store N entries:

Structure               | Total Entries | Latency
Fully Associative       | N             | Very High
8-way Set Associative   | 8*N           | 1
8-way (SAT+GCT)         | 1.5*N         | 1+epsilon

- The proposed GEC structure has latency similar to a Set Associative Table while needing 5X fewer entries.
- A Global Collision Table (GCT) with half as many sets as the SAT is sufficient.
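A minimal sketch of a hash table with chaining in the spirit of SAT + GCT. Several details are assumptions made for illustration and are not taken from the talk: the class name `GecPool`, indexing the SAT with `line % num_sat`, treating GCT-HEAD as the index of the next unclaimed GCT set, and storing opaque payloads instead of real ECP entries. PTR replication and the same-set guarantee for fine-grained entries are not modeled.

```python
from typing import Iterator, List, Optional, Tuple

Entry = Tuple[int, object]   # (line address used as tag, payload)

class GecPool:
    def __init__(self, sat_sets: int, gct_sets: int, ways: int = 8) -> None:
        self.ways = ways
        self.num_sat = sat_sets
        # Indices [0, sat_sets) are SAT sets; the rest are GCT sets.
        self.sets: List[List[Entry]] = [[] for _ in range(sat_sets + gct_sets)]
        self.ptr: List[Optional[int]] = [None] * (sat_sets + gct_sets)
        self.gct_head = sat_sets          # next unclaimed GCT set (assumption)

    def _chain(self, line: int) -> Iterator[int]:
        """Yield the SAT set for `line`, then any GCT sets chained to it."""
        idx: Optional[int] = line % self.num_sat
        while idx is not None:
            yield idx
            idx = self.ptr[idx]

    def insert(self, line: int, payload: object) -> bool:
        last = None
        for idx in self._chain(line):
            if len(self.sets[idx]) < self.ways:
                self.sets[idx].append((line, payload))
                return True
            last = idx
        if self.gct_head == len(self.sets):
            return False                  # no GCT set left to claim
        new_set = self.gct_head           # claim a fresh GCT set and link it
        self.gct_head += 1
        self.ptr[last] = new_set
        self.sets[new_set].append((line, payload))
        return True

    def lookup(self, line: int) -> Tuple[List[object], int]:
        """Return (payloads for this line, number of sets accessed)."""
        found: List[object] = []
        accesses = 0
        for idx in self._chain(line):
            accesses += 1
            found.extend(p for (tag, p) in self.sets[idx] if tag == line)
        return found, accesses

# Lines 0, 4 and 8 all map to SAT set 0; the third spills into the GCT.
pool = GecPool(sat_sets=4, gct_sets=2, ways=2)
for line in (0, 4, 8):
    pool.insert(line, f"fix-{line}")
print(pool.lookup(8), pool.lookup(1))
```

Lines whose SAT set never overflows are found in one access; only lines in overflowed sets pay for following the chain, which is the "1+epsilon" latency in the table.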
Solving Other Two Problems
2. Fine-Grained Allocation for effectively utilizing ECP entries
   - Each GEC entry has only ECP-1. Each line can have multiple GEC entries.
   - We guarantee that all entries are in the same set of (SAT/GCT).
   - A faulty line can get more than ECP-6 as well.
3. Local Error Correction (LEC) for low latency in the common case
   - Each line has a dedicated ECP-1 (handles 95% of lines).
   - Ensures extra accesses (GEC) are needed for only a few lines.
PAYG: Tying it All Together
- PAYG performs on-demand allocation of error correction entries.
- PAYG has 3 levels. LEC is the first line of defense (lowers latency).
- SAT is second and GCT is third (flexible).
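The three-level, on-demand allocation can be sketched as a toy model. This is an illustration under simplifying assumptions, not the PAYG hardware: `PaygModel` and `LineState` are hypothetical names, SAT and GCT capacity are tracked as global counters rather than per-set, and only the level that serves each fault is recorded.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Fix = Tuple[int, int]   # (faulty bit position, replacement bit)

@dataclass
class LineState:
    lec: Optional[Fix] = None   # the line's dedicated ECP-1 entry (LEC)
    ofb: bool = False           # OFB flag: line also has entries in the GEC pool

class PaygModel:
    def __init__(self, sat_entries: int, gct_entries: int) -> None:
        self.lines: Dict[int, LineState] = {}
        self.sat_free = sat_entries   # simplification: one global counter
        self.gct_free = gct_entries   # simplification: one global counter

    def record_fault(self, line: int, fix: Fix) -> str:
        """Allocate a correction entry for a new fault; return the level used."""
        state = self.lines.setdefault(line, LineState())
        if state.lec is None:          # level 1: local entry, no extra access
            state.lec = fix
            return "LEC"
        if self.sat_free > 0:          # level 2: Set Associative Table
            self.sat_free -= 1
            state.ofb = True
            return "SAT"
        if self.gct_free > 0:          # level 3: Global Collision Table
            self.gct_free -= 1
            state.ofb = True
            return "GCT"
        return "UNCORRECTABLE"

model = PaygModel(sat_entries=1, gct_entries=1)
levels = [model.record_fault(7, (pos, 1)) for pos in (3, 90, 200, 411)]
print(levels)
```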
Evaluation Settings
Assumptions:
1. Mean writes 32 Million, SDEV = 20%, no correlation
2. Perfect wear leveling: all lines get the same number of writes
3. Each write is converted into a write followed by a read to detect faults
Configuration:
- PCM bank of 1GB with 64B lines, so 16 million lines per bank
- Write latency of 1 microsecond
- At 100% write traffic, lifetime is 18 years (if zero variance)
Figure of Merit:
- Uniform ECP-6 gets 35% of ideal lifetime, so 6.5 years
- We report lifetime with respect to Uniform ECP-6
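The 18-year ideal lifetime follows from the configuration above. The sketch below assumes "32 Million" and "16 million" mean 2^25 and 2^24, that the bank completes one line write per microsecond, and a 365.25-day year; constant names are illustrative. Under these assumptions the ideal lifetime comes out just under 18 years, and 35% of it is a little over 6 years, in the neighborhood of the rounded 6.5 years on the slide.

```python
LINES_PER_BANK = 2**30 // 64            # 1GB bank with 64B lines = 16M lines
MEAN_WRITES_PER_LINE = 32 * 2**20       # assumption: "32 Million" = 2^25
WRITE_LATENCY_S = 1e-6                  # 1 microsecond per write
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# With perfect wear leveling and zero variance, every line absorbs the mean
# number of writes before failing, and writes are issued back to back.
total_writes = LINES_PER_BANK * MEAN_WRITES_PER_LINE
ideal_lifetime_years = total_writes * WRITE_LATENCY_S / SECONDS_PER_YEAR

ecp6_lifetime_years = 0.35 * ideal_lifetime_years

print(f"ideal lifetime: {ideal_lifetime_years:.1f} years")
print(f"uniform ECP-6 at 35% of ideal: {ecp6_lifetime_years:.1f} years")
```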
Importance of Scalable GEC Pool
- [Figure: panels labeled NoFGA-NoGCT and NoFGA-wGCT; axes: Num SAT Sets and Num GCT Sets (SAT Sets = 128K). Total Sets: 128K + 64K = 192K.]
- The proposed structure reduces the storage overhead of GEC by more than 5X.
Importance of Fine-Grained Alloc.

Num ECP Entries in Each GEC Entry | 5 | 4 | 3 | 2 | 1
Num GEC Entry per Set (64B line)  | … | … | … | … | …
Total ECP Entries per Set         | … | … | … | … | …

- Fine-Grained Allocation improves the effectiveness of PAYG.
Importance of LEC
- We can get higher lifetime by increasing GEC size, but we still need LEC.
- For the first 5 years, PAYG incurs on average 1 extra access for < 0.4% of accesses.
- Without LEC, the latency impact is significant. With LEC, not so much.
Storage Overhead
- LEC storage: 13 bits/line (10-bit ECP + 1 valid + 2 OFB)
- GEC storage: 6.5 bits/line on average
- Total: 19.5 bits/line

Scheme                  | Storage Overhead (bits/line) | Lifetime
Uniform ECP-6           | 61                           | 1X
Uniform ECP-…           | …                            | …X
PAYG with ECP-1 in LEC  | …                            | …X

- PAYG provides lifetime similar to ECP-8 at 3.1X less storage than ECP-6.
- (Total storage overhead to protect 1GB reduces from 122MB to 39MB, down 83MB.)
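The per-line numbers translate into the quoted megabytes as follows. A short sketch of the arithmetic, assuming a 1GB bank of 2^24 lines and binary megabytes; the names are illustrative.

```python
LINES_PER_BANK = 2**30 // 64   # 1GB of 64B lines

def total_mb(bits_per_line: float) -> float:
    """Correction storage for the whole bank, in units of 2^20 bytes."""
    return LINES_PER_BANK * bits_per_line / 8 / 2**20

lec_bits = 10 + 1 + 2          # ECP entry + valid + OFB
gec_bits = 6.5                 # average GEC share per line
payg_bits = lec_bits + gec_bits
ecp6_bits = 10 * 6 + 1

ecp6_mb = total_mb(ecp6_bits)
payg_mb = total_mb(payg_bits)
reduction = ecp6_bits / payg_bits

print(f"ECP-6: {ecp6_mb:.0f}MB, PAYG: {payg_mb:.0f}MB, "
      f"saved: {ecp6_mb - payg_mb:.0f}MB, ratio: {reduction:.1f}X")
```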
Efficient Single Bit Correction
- LEC is responsible for most of the storage overhead (13 bits out of 19.5 bits).
- Need efficient schemes for single-bit hard faults.
- Alternate Data Retry (ADR): Mask a hard fault by storing data in either normal or inverted form.
- [Figure: two examples of a line with an SA-0 cell, one storing 0 and one storing 1, each with its INV bit.]
- ADR needs only 1 bit to mask a single stuck-at fault (caveat: double write).
- Reduce the storage overhead of PAYG by using ADR instead of ECP-1 in LEC.
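The normal-or-inverted idea behind ADR can be sketched as a write, a checker read, and an optional retry. This is a simplified model with hypothetical function names; it assumes exactly one stuck cell in the line and a fault-free INV bit.

```python
from typing import List, Tuple

def store(bits: List[int], stuck_pos: int, stuck_val: int) -> List[int]:
    """Model the array: the stuck cell ignores whatever is written to it."""
    stored = list(bits)
    stored[stuck_pos] = stuck_val
    return stored

def adr_write(data: List[int], stuck_pos: int,
              stuck_val: int) -> Tuple[List[int], int, int]:
    """Return (cell contents, INV bit, number of writes performed)."""
    stored = store(data, stuck_pos, stuck_val)
    if stored == data:                     # checker read matches: done
        return stored, 0, 1
    inverted = [b ^ 1 for b in data]       # retry with inverted data
    stored = store(inverted, stuck_pos, stuck_val)
    # With a single stuck cell, the inverted value agrees with the stuck value.
    return stored, 1, 2

def adr_read(cells: List[int], inv: int) -> List[int]:
    """Undo the inversion on read if the INV bit is set."""
    return [b ^ inv for b in cells]

data = [1, 0, 1, 1]
cells, inv, writes = adr_write(data, stuck_pos=0, stuck_val=0)
print(adr_read(cells, inv) == data, inv, writes)
```

The second write on a mismatch is the "double write" caveat noted on the slide.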
Comparisons

Scheme                  | Storage Overhead (bits/line) | Lifetime
Uniform ECP-6           | 61                           | 1X
Uniform ECP-…           | …                            | …X
PAYG with ECP-1 in LEC  | …                            | …X
PAYG with ADR in LEC    | …                            | …X

- PAYG with heterogeneous error correction reduces storage by 6X.
- It is hard to scale ADR to multiple faults. SAFER [MICRO'10] partitions a line with multiple faults into groups with single-bit faults. SAFER needs 55 bits/line and has lifetime ~ECP-6.
Non Uniform Error Correction
Variable Strength ECC (VS-ECC) by Alameldeen+ ISCA'11
- Proposed for cache reliability at low voltages
- ECC-4 is provided for one quarter of the ways, allocated based on testing
- Difference: In caches, line disabling works. VS-ECC uses only a set-associative structure.
Layered ECP by Schechter+ ISCA'10
- ECP-1 for each line, and some ECP entries for each page
- In essence, this is a set-associative GEC with ECP-1 in LEC
- Difference: Set-associative GEC requires 5X more entries (inefficient)
Line Sparing with FREE-p by Yoon+ HPCA'11
- A faulty line is remapped to a spare area using an embedded pointer
- Sparing needs 1 good line for 1 uncorrectable fault
- Difference: PAYG is much more storage efficient than sparing
FREE-p: Sparing vs. Correction
- For 1 extra error bit, PAYG needs a 20-bit GEC entry, while FREE-p needs 512 bits.
- PAYG is more effective than line sparing with FREE-p.
Summary
- PCM: limited endurance; variability across cells reduces lifetime.
- Need to correct many (six) errors per line.
- Uniform allocation is expensive and inefficient (only 0.3 out of 6 used).
- Pay-As-You-Go (PAYG): Allocate error correction entries on demand.
- PAYG has LEC + GEC Pool (Set Associative Table + Global Collision Table).
- Provides 1.13X lifetime compared to ECP-6 at 3.1X lower overhead.
- Heterogeneous scheme (ADR for LEC) reduces storage by 6X.
- PAYG is useful for efficient hard-error correction in other technologies too.