Virtualized and Flexible ECC for Main Memory

Presentation transcript:

Virtualized and Flexible ECC for Main Memory
Doe Hyun Yoon and Mattan Erez
Dept. of Electrical and Computer Engineering, The University of Texas at Austin
ASPLOS 2010

Memory Error Protection
Applying ECC uniformly – ECC DIMMs
- Simple and transparent to programmers
- Error protection level: fixed, design-time decision
Chipkill-correct, used in high-end servers, constrains the memory module design space
- Allows only x4 DRAMs, which have lower energy efficiency than x8 DRAMs
Virtualized ECC – objectives
- Provide flexible memory error protection
- Relax the design constraints of chipkill

Virtualized ECC
Two-tiered error protection
- Tier-1 Error Code (T1EC): simple error code for detection or lightweight correction
- Tier-2 Error Code (T2EC): strong error-correcting code
  - Stored within the memory namespace itself
  - OS manages T2EC
Flexible memory error protection
- Different T2EC for different data pages
- Stronger protection for more important data
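
As a concrete illustration of the per-page flexibility, here is a minimal C sketch of the kind of per-frame metadata an OS could keep; all names and fields are hypothetical, not taken from the paper.

```c
#include <stdint.h>

/* Hypothetical per-frame metadata for per-page protection levels.
 * The scheme only requires that the OS track, per page, which ECC
 * page holds its T2EC and how strong that T2EC is. */
enum t2ec_level {
    T2EC_NONE,            /* detection only, via the uniform T1EC */
    T2EC_CHIPKILL,        /* T2EC sized for chipkill-correct      */
    T2EC_DOUBLE_CHIPKILL  /* T2EC sized for double chipkill       */
};

struct frame_info {
    uint64_t ecc_page;     /* ECC page number backing this frame */
    enum t2ec_level level; /* protection chosen for this page    */
};
```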

Virtualized ECC – Example
[Figure] Virtual pages map to physical frames as usual; each physical frame additionally maps to an ECC page that holds its T2EC. Different pages get different protection levels: virtual page i is mapped with low protection (no T2EC), page j with a T2EC for chipkill, and page k with a T2EC for double chipkill (high).

Virtualized ECC

Observations on Memory Errors
- Per-system error rate is still low: most of the time we try to detect errors and find none
- Error detection is the common-case operation; it needs a low-latency, low-complexity mechanism → T1EC
- Error correction is the uncommon-case operation; it can be complex and slow, but the correction information still has to be managed somewhere → virtualized T2EC

Uniform ECC
[Figure] Conventional virtual memory: the VPN is translated to a PFN, and a single physical address reaches the page frame, which stores data and its ECC together.

Virtualized ECC
[Figure] VA-to-PA translation (VPN to PFN) works as usual and reaches data plus T1EC. In addition, the OS manages a PFN-to-ECC-page-number translation: the physical address is mapped to an ECC address (EA) inside an ECC page that stores the T2EC. The ECC page number and offset scale according to the T2EC size.

Don’t need T2EC in most cases Read: fetch data and T1EC Virtualized ECC operation ECC Address Translation Unit: fast PA to EA translation Write: update data, T1EC, and T2EC T2EC lines can be partially valid Update only valid T2EC to DRAM T2ECs of consecutive data lines map to a T2EC line PA: 0x0200 3 Wr: 0x0200 2 B0 Rd: 0x00c0 1 A ECC Address Translation Unit EA: 0x0540 4 0 LLC 1 2 3 0 Wr: 0x0540 5 DRAM Rank 0 Rank 1 0000 0040 0080 00c0 A 0100 0140 0180 01c0 0200 0240 B1 B2 B3 0280 02c0 0300 0340 0380 03c0 0400 0440 0480 04c0 T2EC for Rank 1 data T2EC for Rank 0 data 0500 0540 1 2 3 0580 05c0 Data T1EC Data T1EC

Penalty with V-ECC
- Increased data miss rate: T2EC lines in the LLC reduce the effective LLC size
- Increased traffic due to T2EC write-backs: one-way write-back traffic, not on the critical path

Chipkill-Correct

Chipkill-correct
- Single Device-error Correct, Double Device-error Detect: can tolerate one DRAM failure and detect a second
- Chipkill requires x4 DRAMs; x8 chipkill is impractical, yet x8 DRAM is more energy efficient

Baseline x4 Chipkill
- Two x4 ECC DIMMs: 144-bit wide data bus
- 128-bit data + 16-bit ECC (redundancy overhead: 12.5%)
- 4-check-symbol error code using 4-bit symbols
- Access granularity: 64B in DDR2 (min. burst 4 x 128 bit), 128B in DDR3 (min. burst 8 x 128 bit)

x8 Chipkill
- x8 chipkill with the same access granularity needs a 152-bit wide data bus: 128-bit data + 24-bit ECC (redundancy overhead: 18.75%)
- Requires a custom-designed DIMM, which increases system cost significantly

x8 Chipkill with Standard DIMMs
- Requires a 280-bit wide data bus, which increases access granularity: 128B in DDR2 (min. burst 4 x 256 bit), 256B in DDR3 (min. burst 8 x 256 bit)

V-ECC for Chipkill
Use a 3-check-symbol error code: Single Symbol-error Correct, Double Symbol-error Detect
- T1EC: 2 check symbols, detects up to 2 symbol errors
- T2EC: the 3rd check symbol
- Combined, T1EC and T2EC provide chipkill

V-ECC: ECC x4 configuration
- Uses an 8-bit-symbol error code: 2 bursts out of an x4 DRAM form an 8-bit symbol (modern DRAMs have a minimum burst of 4 or 8)
- 1 x4 ECC DIMM + 1 x4 non-ECC DIMM: 136-bit wide data bus
- Each DRAM access in DDR2 (burst 4): 64B data + 4B T1EC
- 2B T2EC is virtualized within the memory namespace: 32 T2ECs per 64B cache line

V-ECC: ECC x8 configuration
- Uses an 8-bit-symbol error code; 2 x8 ECC DIMMs: 144-bit wide data bus
- Each DRAM access in DDR2 (burst 4): 64B data + 8B T1EC
- 4B T2EC is virtualized within the memory namespace: 16 T2ECs per 64B cache line

Flexible Error Protection
- A single hardware design with V-ECC can provide chipkill-detect, chipkill-correct, and double chipkill-correct by using a different T2EC for different pages
- Reliability/performance tradeoff: maximize performance/power efficiency with chipkill-detect; stronger protection comes at the cost of additional T2EC accesses

T2EC size per 64B cache line:
           Chipkill-Detect  Chipkill-Correct  Double Chipkill-Correct
  ECC x4   0B               2B                4B
  ECC x8   0B               4B                8B
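
The tradeoff table maps naturally onto a small lookup, sketched below in C; the configuration and level names are illustrative, and the x8 detect/correct entries follow from the ECC x8 configuration slide.

```c
/* T2EC bytes per 64B cache line, indexed by configuration and
 * protection level (values from the table above). */
enum cfg   { ECC_X4, ECC_X8 };
enum level { CK_DETECT, CK_CORRECT, DOUBLE_CK };

static const unsigned t2ec_bytes[2][3] = {
    /*            detect  correct  double */
    [ECC_X4] = {  0,      2,       4 },
    [ECC_X8] = {  0,      4,       8 },
};
```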

Evaluation

Simulator/Workload
- GEMS + DRAMsim: an out-of-order SPARC V9 core, exclusive two-level cache hierarchy, DDR2 800MHz – 12.8GB/s (128-bit wide data path), 1 channel, 4 ranks
- Power model: WATTCH for processor power (scaled to 45nm), CACTI for cache power (45nm), Micron model for DRAM power (commodity DRAMs)
- Workloads: 12 data-intensive applications from SPEC CPU 2006 and PARSEC, plus the STREAM and GUPS microbenchmarks; chosen because they are memory intensive and show the worst-case behavior with V-ECC

Normalized Execution Time
- Less than 1% penalty on average
- The performance penalty depends on spatial locality and write-back traffic:
  - Low spatial locality + high write-back traffic: omnetpp, canneal, GUPS
  - Low spatial locality, but low write-back traffic: mcf
  - High write-back traffic, but high spatial locality: lbm

System Energy Efficiency
- Energy-Delay Product (EDP) gain: 1.1% on average for ECC x4, 12.0% on average for ECC x8
- Note: this comes with the same or stronger error protection level

Flexible Error Protection
[Figure] A single hardware design can provide chipkill-detect, chipkill-correct, and double chipkill-correct, dynamically.

Conclusion
Virtualized ECC
- Two-tiered error protection with a virtualized T2EC
- Improved system energy efficiency with chipkill: reduces DRAM power consumption by 27% and improves system EDP by 12%, with a performance penalty of about 1% on average
- Error protection even for non-ECC DIMMs; can also be used for GPU memory error protection
- Flexibility in error protection: the protection level adapts to user/system demand, and the cost of protection is proportional to the protection level
More details in the paper: error protection for non-ECC DIMMs, PA-to-EA translation, the ECC address translation unit, T2EC management, ...

Virtualized and Flexible ECC for Main Memory
Doe Hyun Yoon and Mattan Erez
Dept. of Electrical and Computer Engineering, The University of Texas at Austin

Backup

Virtualized ECC Operations
- DRAM read: fetch data and T1EC and detect errors; no T2EC needed in most cases
- DRAM write-back: update data, T1EC, and T2EC
  - Cache T2EC to exploit locality in T2EC accesses
  - Needs PA-to-EA translation: an on-chip ECC address translation unit, a TLB-like structure for fast PA-to-EA translation
- Error correction: read the T2EC, which may be in the LLC or in DRAM

ECC Address Translation Unit
[Figure] The same write-path walkthrough as before: a write to PA 0x0200 is translated by the ECC Address Translation Unit to EA 0x0540, and the corresponding T2EC line is updated in DRAM; T2EC for Rank 0 data is stored in Rank 1 and vice versa.

RECAP: V-ECC
- Two-tiered error protection: uniform T1EC, virtualized T2EC
- V-ECC for chipkill: the ECC x4 configuration saves 8 data pins; the ECC x8 configuration is more energy efficient
- Flexible error protection: different T2EC for different pages, with stronger protection for important data and no protection for unimportant data

Power Consumption
[Figure] DRAM power saving and total power saving; with ECC x4, the total power saving is 4.2%.

Caching T2EC
- T2EC occupancy in the LLC: less than 10% on average; over 10% only in omnetpp, milc, canneal, and fluidanimate
- MPKI overhead: very small; the higher the spatial locality, the smaller the impact on caching behavior
- x8 affects caching more: only 16 T2ECs fit per cache line in x8, versus 32 in x4

Traffic
- Traffic increase: less than 10% on average, from increased demand misses (due to T2EC occupancy) plus T2EC traffic
- Spatial locality matters, and so does the amount of write-back traffic; mcf has little spatial locality, but also little write-back traffic

Virtualized ECC
- Uniform T1EC: low-cost error detection or lightweight correction
- Virtualized T2EC: corrects errors that T1EC detects but cannot correct; cacheable and memory mapped
- A read accesses data and T1EC only; no T2EC is needed in most cases, simplifying the common-case read
- A write updates data, T1EC, and T2EC

Flexible Error Protection – ECC x8 DRAM configuration
- Stronger error protection at the cost of more T2EC accesses
- The additional cost of double chipkill (relative to chipkill) is quite small
- Adaptation is at per-page granularity

What if BW is limited?
- Halving the DRAM bandwidth to 6.4GB/s emulates a CMP where bandwidth is scarcer

Virtualized ECC for Non-ECC DIMMs

ECC for non-ECC DIMMs
- Virtualize all ECC in the memory namespace: this is no longer two-tiered protection, since there is no uniform ECC storage (no T1EC); we still call the ECC 'T2EC' to keep notation consistent
- The virtualized T2EC both detects and corrects errors
- Now a DRAM read also triggers a T2EC access: increased T2EC traffic, increased T2EC occupancy, and more penalty
- But we gain error detection and correction on non-ECC DIMMs

ECC Address Translation Unit
[Figure] Read-path walkthrough with non-ECC DIMMs: reads now also fetch T2EC. Reads to PAs 0x0180 and 0x00c0 are translated by the ECC Address Translation Unit to EAs 0x0510 and 0x0550, and the corresponding T2EC lines are fetched from DRAM alongside the data; T2EC for Rank 0 data lives in Rank 1 and vice versa.

DIMM configurations
- Use a 2-check-symbol error code: can detect and correct up to 1 symbol error, but cannot detect 2 symbol errors
- Weaker protection than chipkill, but better than nothing
- Can even use x16 DRAMs (far more energy efficient than x4 DRAMs)

  Configuration  DRAM type  # data DRAMs per rank  T2EC per 64B cache line
  Non-ECC x4     x4         32                     4B
  Non-ECC x8     x8         16                     8B
  Non-ECC x16    x16        8                      16B

Performance and Energy Efficiency
- More performance degradation than with ECC DIMMs: every read accesses T2EC, so there is more T2EC traffic and more T2EC occupancy in the LLC
- Performance degradation is low if spatial locality is good
- Energy efficiency is sometimes better: x16 DRAMs save a lot of DRAM power

Flexible error protection
- A page can have different T2EC sizes; the error protection level of a page can be: no protection, 1 chipkill detect, 1 chipkill correct (but cannot detect a second failed chip), or 2 chipkill correct
- The penalty is proportional to the protection level

T2EC size per 64B cache line:
               No protection  1 Chipkill Detect  1 Chipkill Correct*  2 Chipkill Correct
  Non-ECC x4   0B             2B                 4B                   8B
  Non-ECC x8   0B             4B                 8B                   16B
  Non-ECC x16  0B             8B                 16B                  32B
  * cannot detect a second failed chip

[Figure] Results for the Non-ECC x8 and Non-ECC x16 configurations.

Managing T2EC

OS manages T2EC
- The PA-to-EA translation structure
- T2EC storage: with ECC DIMMs, only dirty pages require T2EC, so copy-on-write can be used
- T2EC allocation: in the non-ECC implementation every data page needs T2EC; the T2EC is freed when its data page is freed or evicted
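
A sketch of that allocation policy in C; the allocator interface and structure names are hypothetical stand-ins for whatever manages the reserved T2EC region of physical memory.

```c
#include <stdbool.h>
#include <stdint.h>

struct page_meta {
    uint64_t ecc_page;  /* 0 means no T2EC allocated yet */
    bool     dirty;
};

/* Hypothetical allocator over the T2EC region of physical memory. */
extern uint64_t alloc_ecc_space(unsigned t2ec_bytes_per_line);
extern void     free_ecc_space(uint64_t ecc_page);

/* With ECC DIMMs, clean pages still have a valid copy on disk, so
 * only dirty pages need correction info: allocate T2EC lazily on
 * the first write, copy-on-write style. (With non-ECC DIMMs, every
 * mapped data page would get T2EC at map time instead.) */
void note_first_write(struct page_meta *p, unsigned t2ec_bytes_per_line)
{
    if (p->ecc_page == 0)
        p->ecc_page = alloc_ecc_space(t2ec_bytes_per_line);
    p->dirty = true;
}

/* Free the T2EC when the data page is freed or evicted. */
void note_page_free(struct page_meta *p)
{
    if (p->ecc_page != 0) {
        free_ecc_space(p->ecc_page);
        p->ecc_page = 0;
    }
}
```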

PA to EA Translation
- Every write-back (with ECC DIMMs) or every read/write (with non-ECC DIMMs) needs to access T2EC
- The translation is similar to VA-to-PA translation; the OS manages a single translation structure

Example Translation
[Figure] A hierarchical walk, analogous to a page table walk: the physical address (PA) is split into Level 1 / Level 2 / Level 3 indices and a page offset. A base register points to the ECC page table, and each level's ECC table entry points to the next table. The final entry yields the ECC page number; the page offset, shifted right by log2 of the T2EC scaling factor, yields the ECC page offset. Together they form the ECC address (EA).
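
A C sketch of the walk in the figure; the index widths, entry format, and helper function are assumptions, since the slides fix only the overall shape (a radix walk on the PA followed by an offset scaled by the T2EC size).

```c
#include <stdint.h>

extern uint64_t ecc_table_base;  /* the figure's base register */

/* Hypothetical helper: read one table entry (a pointer to the next
 * level, or the ECC page number at the last level). */
extern uint64_t read_entry(uint64_t table, unsigned index);

/* 3-level walk: PA -> ECC address (EA). log2_scale encodes the
 * data-to-T2EC size ratio, e.g., 5 for 2B of T2EC per 64B line. */
uint64_t walk_ecc_table(uint64_t pa, unsigned log2_scale)
{
    unsigned l1  = (pa >> 39) & 0x1ff;   /* assumed 9-bit indices   */
    unsigned l2  = (pa >> 30) & 0x1ff;
    unsigned l3  = (pa >> 21) & 0x1ff;
    uint64_t off = pa & 0x1fffff;        /* assumed offset width    */

    uint64_t lvl2     = read_entry(ecc_table_base, l1);
    uint64_t lvl3     = read_entry(lvl2, l2);
    uint64_t ecc_page = read_entry(lvl3, l3);

    /* ECC page offset = data page offset >> log2(T2EC scale) */
    return (ecc_page << 21) | (off >> log2_scale);
}
```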

Accelerating Translation
ECC address translation unit: caches PA-to-EA translations, like a TLB, with two levels of hierarchical caching
- The 1st-level EA cache is kept consistent with the TLB; the 2nd level acts as a victim cache
- Read-triggered translation (only occurs with non-ECC DIMMs): 100% hit, because the L1 EA cache is consistent with the TLB
- Write-triggered translation: usually hits, since the L2 EA cache can be relatively large
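
A sketch of the lookup order in C, using the L1/L2 sizes given on the "Possible Impacts" slide later in the deck; the lookup functions themselves are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probes into the two EA cache levels (the evaluated
 * configuration: a 16-entry fully associative L1, kept consistent
 * with the TLB, and a 4K-entry 8-way L2 victim cache). */
extern bool     l1_ea_lookup(uint64_t pa, uint64_t *ea);
extern bool     l2_ea_lookup(uint64_t pa, uint64_t *ea);
extern uint64_t walk_ecc_table_for(uint64_t pa);  /* slow external walk */

uint64_t translate_pa_to_ea(uint64_t pa)
{
    uint64_t ea;
    if (l1_ea_lookup(pa, &ea)) return ea;  /* reads always hit here */
    if (l2_ea_lookup(pa, &ea)) return ea;  /* most writes hit here  */
    return walk_ecc_table_for(pa);         /* miss: external walk   */
}
```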

ECC Address Translation Unit
[Figure] The TLB and the ECC address translation unit sit side by side: control logic keeps the L1 EA cache consistent with the TLB, a PA probes the 2-level EA cache (L1, then L2) to produce an EA, and an MSHR handles external EA translations on a miss.

Possible Impacts
- TLB miss penalty: a VA-to-PA translation is followed by a PA-to-EA translation; this seems negligible, since the evaluation already assumed a doubled TLB miss penalty
- Design alternative: translate VA to EA directly; this needs a per-process translation structure, but potentially impacts the TLB miss penalty less
- EA cache misses per 1000 instructions (16-entry fully associative L1 EA cache, 4K-entry 8-way L2 EA cache): ~3 in omnetpp and canneal, ~12 in GUPS, less than 1 in the other applications
- Things might get messier with a software TLB handler

Chip-Kill-Correct
- Single device-error correct, double device-error detect; other names: DRAM RAID, Extended ECC, Advanced ECC, ...
- Can tolerate a DRAM device failure
- Using x1 DRAMs (64 data bits + 8 ECC bits), SEC-DED effectively provides chip-kill-correct, but there are no x1 DRAMs any more (really?)

Interleaved SEC-DED
- 4 interleaved (72,64) SEC-DED codes provide x4 chip-kill at 256-bit data width: 64 data DRAMs + 8 ECC DRAMs
- Works with old DRAMs, but modern DRAMs use burst access, so the granularity becomes 128B in DDR2 and 256B in DDR3

[Figure] DIMM layouts for the V-ECC configurations: in the ECC x4 configuration (one x4 ECC DIMM + one x4 non-ECC DIMM) and the ECC x8 configuration (two x8 ECC DIMMs), a burst-4 access delivers data plus T1EC, while the T2EC is virtualized in memory.

Why is x8 chipkill impractical?
- With the same access granularity: higher redundancy overhead, 128-bit data + 24-bit ECC (18.75%), and custom-designed DIMMs are needed
- Using standard ECC DIMMs: a wider data path, 256-bit data + 24-bit ECC (9.375%), which increases access granularity to 128B in DDR2 and 256B in DDR3
- x8 DRAM is preferable, since it consumes 30% less power than x4 DRAM at the same total capacity; but chipkill-correct with x8 DRAMs either requires custom-designed DIMMs (to keep the access granularity) or increases the access granularity (to use commodity DIMMs)

DRAM Modules
- Non-ECC DIMMs: 64-bit wide data path
- ECC DIMMs: additional DRAMs dedicated to storing ECC and additional pins to transfer it; the DIMM provides only storage and pins, while the actual ECC encoding/decoding happens in the memory controller, so system designers can choose the protection mechanism
- Typical protection with 72-bit wide ECC DIMMs is SEC-DED (single-bit error correction, double-bit error detection): each 64-bit data word is protected by an 8-bit SEC-DED code

Standard DIMM structures
[Figure] 64-bit x4 and x8 non-ECC DIMMs, and 72-bit x4 and x8 ECC DIMMs. Standard DIMMs have a 64-bit-wide data path: 16 x4 DRAMs per rank in an x4 DIMM, 8 x8 DRAMs per rank in an x8 DIMM, and 4 x16 DRAMs per rank in an x16 DIMM.

High-end Servers
- Need BOTH reliability and energy efficiency, i.e., chipkill-correct
- But chipkill requires x4 configurations; using the more energy-efficient x8 configurations is impractical with chipkill

High-level Memory Models
[Figure] Conventional architecture vs. the Virtualized ECC architecture. Conventionally, VM translates a program's VA into a PA that points to both data and ECC. In V-ECC, the PA points only to data and T1EC; reads need not access T2EC, but when an error is detected or a write is performed, the T2EC must also be accessed via its ECC address (EA). PA-to-EA translation works like VA-to-PA translation and can be managed by the OS.

Example
[Figure] Two applications' VA spaces map through VA-to-PA mappings into DRAM holding data and T1EC; a separate PA-to-EA mapping locates each frame's T2EC.

Standard DIMMs
- x4 non-ECC DIMM: 16 x4 DRAMs per rank, 64-bit-wide data bus
- x4 ECC DIMM: 18 x4 DRAMs per rank, 72-bit-wide data bus

Standard DIMMs – Cont'd
- x8 non-ECC DIMM: 8 x8 DRAMs per rank, 64-bit-wide data bus
- x8 ECC DIMM: 9 x8 DRAMs per rank, 72-bit-wide data bus
- x8 DRAM consumes 30% less power than x4

Standard DIMMs – Cont'd
- x16 non-ECC DIMM: 4 x16 DRAMs per rank, 64-bit-wide data bus; more power efficient than x8 DRAMs
- There is NO x16 ECC DIMM (ECC DIMMs have a 72-bit-wide data path, 8 bits of which carry ECC)

Configurations
- Baseline x4: traditional uniform chipkill (note: x8 chipkill is not practical)
- Virtualized ECC: ECC x4 saves 8 data pins; ECC x8 uses the more energy-efficient x8 DRAM

  Configuration  Bus contents               DIMMs
  Baseline x4    128-bit data + 16-bit ECC  two x4 ECC DIMMs
  ECC x4         128-bit data + 8-bit ECC   one x4 ECC DIMM + one x4 non-ECC DIMM
  ECC x8         128-bit data + 16-bit ECC  two x8 ECC DIMMs

Symbol-based error codes
- b-bit symbols, GF(2^b)-based arithmetic
- Simple rules:
  - 1 check symbol: 1 symbol error detect
  - 2 check symbols: 1 symbol error correct, or 2 symbol error detect
  - 3 check symbols: 1 symbol error correct + 2 symbol error detect, or 3 symbol error detect
  - 4 check symbols: 2 symbol error correct + 2 symbol error detect, or 4 symbol error detect
- A 3-check-symbol error code provides chip-kill-correct
- Max codeword length: 2^b + 2 symbols
  - b=4: 60-bit data + 12-bit ECC
  - b=8: 2008-bit data + 24-bit ECC
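
The check-symbol rules above rest on GF(2^b) arithmetic. Below is a minimal C sketch for b=8; the reduction polynomial 0x11D is an assumption (a conventional Reed-Solomon choice), since the slides do not name one.

```c
#include <stdint.h>

/* Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D). */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;          /* addition in GF(2^8) is XOR */
        b >>= 1;
        /* double a, folding the overflow back in mod the polynomial */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0x00));
    }
    return p;
}

/* The simplest rule in action: one check symbol that detects a
 * single symbol error is plain XOR parity over the data symbols;
 * the extra check symbols used for correction are weighted sums
 * computed with gf256_mul. */
static uint8_t parity_check_symbol(const uint8_t *sym, int n)
{
    uint8_t c = 0;
    for (int i = 0; i < n; i++)
        c ^= sym[i];
    return c;
}
```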