ECE 411: DRAM & Storage
Acknowledgement: Many slides were adapted from Prof. Hsien-Hsin Lee's ECE4100/6100 Advanced Computer Architecture course at Georgia Inst. of Tech. with his generous permission.
The DRAM Cell
Why DRAMs: higher density than SRAMs.
Disadvantages: longer access times; leaky, so it needs to be refreshed; cannot be easily integrated with CMOS logic.
Stacked capacitor (vs. trench capacitor).
[Figure: 1T1C DRAM cell, with the word line (control), storage capacitor, and bit line (information). Source: Memory Arch Course, INSA Toulouse.]
One DRAM Bank
[Figure: internal organization of a single DRAM bank.]
Example: 512 Mb, 4-Bank DRAM (x4)
Each bank is a 16384 x 2048 x 4 array: 16K rows by 2K columns, 4 bits per column.
Address multiplexing: the 14-bit row address A[13:0] and the 11-bit column address A[10:0] share the same pins; BA[1:0] selects one of the four banks.
A DRAM page (one open row) = 2K x 4 bits = 1 KB.
[Figure: row decoder, column decoder, sense amps, and I/O gating around Bank 0 of the x4 DRAM chip; data out on D[3:0].]
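As a quick sanity check on these numbers, here is a small C sketch (the parameter names are mine, not from the slide) that recomputes the device capacity and the page size from the stated geometry:

    #include <stdio.h>

    int main(void) {
        /* Geometry from the slide: 4 banks, 16384 rows, 2048 columns, x4 output */
        const long banks = 4, rows = 16384, cols = 2048, bits_per_col = 4;

        long capacity_bits = banks * rows * cols * bits_per_col;   /* 512 Mb */
        long page_bytes    = cols * bits_per_col / 8;              /* 1 KB per open row */

        printf("capacity  = %ld Mb\n", capacity_bits / (1024L * 1024L));  /* 512  */
        printf("page size = %ld bytes\n", page_bytes);                    /* 1024 */
        return 0;
    }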
DRAM Cell Array
[Figure: cell array with wordlines 0 through 1023 crossing bitlines 0 through 15, one cell at each intersection.]
DRAM Sensing (Open Bitline Array)
[Figure: two DRAM subarrays (WL0-WL127 and WL128-WL255) sharing the row of sense amplifiers between them.]
Basic DRAM Operations
Write '1': the write driver pulls the bitline to Vdd; with the wordline raised, the cell charges up to Vdd - Vth.
Read '1': the bitline is precharged to Vdd/2; raising the wordline shares charge between the cell capacitor (Cm) and the bitline capacitance (C_BL), producing a small +ΔV signal on the bitline. The sense amplifier amplifies this ΔV and, in doing so, refreshes the full value back into the cell.
[Figure: sense amplifier, wordline, and bitline during the write '1' and read '1' operations.]
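The size of that read signal follows from the usual charge-sharing model (this relation is implied by the Cm and C_BL labels on the slide, not stated there explicitly):

    \Delta V = \left( V_{cell} - \frac{V_{dd}}{2} \right) \cdot \frac{C_m}{C_m + C_{BL}}

For a stored '1' at V_cell = Vdd - Vth this gives a small positive ΔV of (Vdd/2 - Vth) scaled by the capacitance ratio, which is why the cell capacitor cannot be made too small relative to the bitline.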
DRAM Basics
Address multiplexing: send the row address when RAS is asserted, the column address when CAS is asserted.
DRAM reads are self-destructive: the data must be rewritten after a read.
Memory array: all bits within an array work in unison.
Memory bank: different banks can operate independently.
DRAM rank: chips inside the same rank are accessed simultaneously.
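To make the multiplexing concrete, here is a hypothetical address-mapping sketch in C. The field widths match the 512 Mb x4 example above, but the ordering of the fields is my own choice; real controllers interleave row, bank, and column bits in many different ways.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical mapping: | row (14b) | bank (2b) | column (11b) |. */
    typedef struct { unsigned row, bank, col; } dram_addr_t;

    static dram_addr_t split_addr(uint32_t a) {
        dram_addr_t d;
        d.col  =  a        & 0x7FF;    /* 11-bit column, sent with CAS */
        d.bank = (a >> 11) & 0x3;      /* 2-bit bank select            */
        d.row  = (a >> 13) & 0x3FFF;   /* 14-bit row, sent with RAS    */
        return d;
    }

    int main(void) {
        dram_addr_t d = split_addr(0x01234567u & 0x07FFFFFFu);
        printf("row=%u bank=%u col=%u\n", d.row, d.bank, d.col);
        return 0;
    }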
Examples of DRAM DIMM Standards
x64 (no ECC): eight x8 chips supply the data bits D0-D63.
x72 (ECC): nine x8 chips; eight supply D0-D63 and one supplies the check bits CB0-CB7.
DRAM Ranks
[Figure: a memory controller driving two ranks of x8 chips over a shared 64-bit data bus (D0-D63); chip selects CS0 and CS1 choose Rank 0 or Rank 1.]
DRAM Ranks
[Figure: a single rank of x8 chips forming a 64b bus, a single rank of x4 chips forming a 64b bus, and a dual-rank module of x8 chips sharing the 64b bus.]
DRAM Organization
[Figure. Source: Memory Systems Architecture Course, B. Jacobs, University of Maryland.]
Organization of DRAM Modules
[Figure: a memory controller connected to multi-banked DRAM chips over a channel made up of an address/command bus and a data bus. Source: Memory Systems Architecture Course, Bruce Jacobs, University of Maryland.]
DRAM Configuration Example
[Figure. Source: Micron DDR3 DRAM.]
DRAM Access (Non-Nibble Mode)
The memory controller drives the DRAM module over an address bus plus the RAS, CAS, and WE control signals.
Sequence: assert RAS with the row address to open the row; then assert CAS with a column address, after which the data appears on the data bus; further CAS assertions with new column addresses return more data from the open row.
[Timing diagram: RAS, CAS, ADDR (row addr, col addr, col addr), DATA (data, data).]
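The same row-then-column protocol can be written down as controller pseudocode in C; the function and the printed command names here are illustrative, not the actual JEDEC command encoding.

    #include <stdio.h>

    /* Illustrative only: emit the command sequence for reading several
     * columns from one row, mirroring the timing diagram above. */
    static void dram_read_burst(unsigned row, const unsigned *cols, int n) {
        printf("assert RAS, row address 0x%x (row opened)\n", row);
        for (int i = 0; i < n; i++)
            printf("assert CAS, column address 0x%x -> data on data bus\n", cols[i]);
        printf("precharge (close the row)\n");
    }

    int main(void) {
        unsigned cols[] = { 0x1A, 0x1B };
        dram_read_burst(0x2F3, cols, 2);
        return 0;
    }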
DRAM Refresh
The storage cells are leaky, so a periodic refresh must sweep across the DRAM rows: read each row and write the same data back. Rows are inaccessible while being refreshed.
Example: 4K rows, 100 ns read cycle, data decays in 64 ms.
4096 x 100 ns = 409.6 µs to refresh every row once.
409.6 µs / 64 ms = 0.64% unavailability.
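A small C check of the overhead arithmetic, using only the constants given on the slide (the distributed-refresh interval computed here anticipates the next slide):

    #include <stdio.h>

    int main(void) {
        const double rows = 4096, t_row_ns = 100, retention_ms = 64;

        double burst_us    = rows * t_row_ns / 1000.0;                 /* 409.6 us  */
        double overhead    = burst_us / (retention_ms * 1000.0) * 100; /* 0.64 %    */
        double interval_us = retention_ms * 1000.0 / rows;             /* 15.625 us */

        printf("%.1f us per full refresh, %.2f%% unavailability, "
               "%.3f us between distributed refreshes\n",
               burst_us, overhead, interval_us);
        return 0;
    }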
DRAM Refresh Styles
Bursty: refresh all 4096 rows back to back, spending about 410 µs (100 ns x 4096) out of every 64 ms window.
Distributed: spread the work out, refreshing one row (100 ns) every 15.6 µs across the 64 ms window.
Types of DRAM and DIMMs
Synchronous DRAM: adds predictability to DRAM operation.
DDRx: transfers data on both edges of the clock.
GDDRx: graphics. HBM: graphics and HPC. HMC: HPC.
DIMMs:
FB-DIMM: DIMMs connected by point-to-point links instead of a bus, allowing more DIMMs in server systems.
R-DIMM: buffers only the address/command bus.
LR-DIMM: buffers both the address/command and data buses.
Memory Module Types
[Figure: UDIMM, RDIMM, DDR3 LRDIMM, and DDR4 LRDIMM buffering arrangements.]
Memory: Fundamental Trade-offs
Overcoming the capacity and bandwidth limits of DDRx DIMMs:
Capacity: use LR-DIMMs or NVM-based DIMMs (e.g., 3D XPoint), at the cost of increased latency.
Bandwidth: use HBM or HMC, at the cost of limited capacity and/or increased latency.
[Figure: latency, capacity, and bandwidth axes locating DDRx, HBM/HMC, and 3D XPoint.]
Latency/Capacity Trade-off
Larger capacity (higher density) comes with longer latency.
Example 1: DRAM. A larger mat size means more cells share the peripheral circuitry, which yields higher density but a longer row cycle time (tRC).
[Figure annotations: 7% area overhead vs. 37% tRC reduction.]
Latency/Capacity Trade-off (continued)
Example 2: Phase-Change Memory (PCM). A single physical cell can store one bit (single-level cell, SLC) or multiple bits (multi-level cell, MLC). MLC PCM offers higher density but longer latency.
Bandwidth Scalability with a Parallel Interface
Go wider? More pins are required on the processor side (130+ pins per channel); package pin counts do not scale, and electrical wire-length matching becomes harder.
Go faster? Cross-talk and attenuation, bit-wise synchronization to the clock signal, clock-to-data and data-to-data skews, and a high power cost.
I/O power = energy per bit transfer (pJ/bit) × bandwidth (Gb/s).
Memory I/O Interface Power
Energy efficiency (pJ/bit) has been improving, driven by architecture, circuit, and process advances, so I/O power has been kept at a low level.
But in the future, over-extending channel speed requires power-hungry I/O circuitry, causing super-linear degradation in energy efficiency.
Example: 100 GB/s = 800 Gb/s, so at 20 pJ/bit the I/O power alone is 800 Gb/s × 20 pJ/bit = 16 W.
Source: Intel.
High-Speed Serial Interface (HSI)
High bit rate: up to 20 Gb/s per pin vs. 1.6 Gb/s per pin for DDR3-1600, so fewer pins are needed for the same bandwidth and more channels are possible.
High energy efficiency (low pJ/bit): as low as sub-1 pJ/bit at 15-20 Gb/s, giving low package power and low energy consumption, with better scalability that avoids the super-linear degradation above.
Drawbacks: longer latency and higher static power/energy consumption.
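A rough pin-count comparison in C. The per-pin rates come from the slide; the 12.8 GB/s target bandwidth is my own example (it matches the peak of one 64-bit DDR3-1600 channel), so the absolute pin counts are illustrative only.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const double target_gbps   = 12.8 * 8.0;  /* 12.8 GB/s = 102.4 Gb/s */
        const double ddr3_pin_gbps = 1.6, hsi_pin_gbps = 20.0;

        printf("DDR3-1600 data pins needed: %.0f\n",
               ceil(target_gbps / ddr3_pin_gbps));   /* 64 */
        printf("HSI data pins needed:       %.0f\n",
               ceil(target_gbps / hsi_pin_gbps));    /* 6  */
        return 0;
    }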
Memory Scheduling
FR-FCFS: First-Ready, First-Come First-Served. Requests that hit the currently open row are prioritized; ties are broken by arrival order.
Open-page policy: keep a row open after an access, betting on row-buffer locality.
Close-page policy: precharge the row immediately after the access.
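A minimal sketch of FR-FCFS request selection in C. The request-queue layout and field names are my own illustrative assumptions; a real scheduler would also respect per-bank timing constraints.

    /* Illustrative request descriptor for FR-FCFS selection. */
    typedef struct { unsigned bank, row; unsigned long arrival; } req_t;

    /* open_row[b] holds the currently open row in bank b, or -1 if closed. */
    int pick_fr_fcfs(const req_t *q, int n, const long *open_row) {
        int best = -1, best_hit = 0;
        for (int i = 0; i < n; i++) {
            int hit = (open_row[q[i].bank] == (long)q[i].row);  /* "first-ready" */
            /* Prefer row-buffer hits; among equals, prefer the oldest request. */
            if (best < 0 || hit > best_hit ||
                (hit == best_hit && q[i].arrival < q[best].arrival)) {
                best = i;
                best_hit = hit;
            }
        }
        return best;  /* index of the request to issue next, or -1 if empty */
    }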
Disk Storage
Disk Organization
Platters: 1 to 12 per drive, spinning at 3,600 to 15,000 RPM.
Tracks: 5,000 to 30,000 per surface.
Sectors: 100 to 500 per track, 512 bytes each.
Cylinder: the set of tracks at the same arm position across all surfaces.
Disk Organization
[Figure: arm and read/write head flying tens of nanometers above the magnetic surface.]
Disk Access Time
Seek time: move the arm to the desired track; 5 ms to 12 ms.
Rotational latency: on average half a revolution; for a 10,000 RPM disk the average is 0.5 / (10,000/60) = 3 ms.
Data transfer latency (throughput): tens to hundreds of MB per second; e.g., a Seagate Cheetah 15K.6 sustains 164 MB/s.
Disk controller overhead: a disk cache (cache buffer) of 4 to 32 MB today, built into the HDD's embedded controller, exploits locality.
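Putting the components together, a back-of-the-envelope estimate in C. The 4 KB request size, the 8 ms seek (inside the 5-12 ms range), and the 0.2 ms controller overhead are my assumptions; the other values come from the slide.

    #include <stdio.h>

    int main(void) {
        const double seek_ms       = 8.0;     /* assumed, within 5-12 ms      */
        const double rpm           = 10000.0;
        const double xfer_MB_per_s = 164.0;   /* Cheetah 15K.6 sustained rate */
        const double ctrl_ms       = 0.2;     /* assumed controller overhead  */
        const double req_KB        = 4.0;     /* assumed request size         */

        double rot_ms  = 0.5 / (rpm / 60.0) * 1000.0;              /* 3.0 ms   */
        double xfer_ms = req_KB / 1024.0 / xfer_MB_per_s * 1000.0; /* ~0.02 ms */

        printf("seek %.1f + rotation %.1f + transfer %.3f + controller %.1f "
               "= %.2f ms\n", seek_ms, rot_ms, xfer_ms, ctrl_ms,
               seek_ms + rot_ms + xfer_ms + ctrl_ms);
        return 0;
    }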
Reliability, Availability, Dependability
Program faults:
Static (permanent) faults: design flaws (e.g., the Pentium FDIV bug, roughly $500 million) and manufacturing defects (stuck-at faults, process variability).
Dynamic faults: soft errors (noise-induced) and wear-out.
Solution Space
DRAM / SRAM: use ECC (SECDED).
Disks: use redundancy (user backups, disk arrays).
RAID
Redundant Array of Inexpensive Disks: both a reliability and a performance consideration.
Combine multiple small, inexpensive disk drives and break the array into "reliability groups"; data are divided and replicated across multiple drives (RAID-0 to RAID-5).
Hardware RAID uses a dedicated hardware controller; software RAID is implemented in the OS.
Basic Principles
Data mirroring, data striping, and error-correcting codes.
RAID-1
Mirrored disks: the most expensive option (100% overhead). Every write to the data disk is also written to the check disk. With a sufficient number of controllers, reads and seeks can be served from either copy, improving performance.
[Figure: blocks A0-A4 duplicated on Disk 0 (data disk) and Disk 1 (check disk).]
RAID-10
Combines data striping on top of RAID-1 mirrored pairs.
[Figure: six disks arranged as three mirrored pairs, with the stripes of blocks A, B, and C spread across the pairs.]
RAID-2
Bit-interleaved striping; uses a Hamming code (e.g., Hamming(7,4)) to generate ECC stored on dedicated check disks.
Space overhead: 4 data disks need 3 check disks (75%), 10 data disks need 4 (40%), 25 data disks need 5 (20%).
The CPU needs more compute power to generate a Hamming code than simple parity, and the controller is complex. Not really used today.
[Figure: data bits A-D striped across Data Disks 0-3 with their ECC bits on Check Disks 0-2.]
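Those check-disk counts follow the single-error-correcting Hamming bound 2^r >= d + r + 1, with r check disks protecting d data disks. A small C check of the three cases quoted on the slide:

    #include <stdio.h>

    /* Smallest r such that 2^r >= d + r + 1. */
    static int check_disks(int d) {
        int r = 1;
        while ((1 << r) < d + r + 1)
            r++;
        return r;
    }

    int main(void) {
        int data[] = { 4, 10, 25 };
        for (int i = 0; i < 3; i++) {
            int r = check_disks(data[i]);
            printf("%2d data disks -> %d check disks (%.0f%% overhead)\n",
                   data[i], r, 100.0 * r / data[i]);
        }
        return 0;
    }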
RAID-3
Byte-level striping; XOR parity is generated and stored on a single check disk.
At least 3 disks: 2 data disks + 1 check disk.
[Figure: bytes A-D striped across Data Disks 0-3, one transfer unit spanning all data disks, and the parity ECCa-ECCd on Check Disk 0.]
RAID-4
Block-level striping: each individually accessed unit is kept on one disk, so small transfers do not touch all disks, improving parallelism.
XOR parity is generated and stored on the check disk; the check information is calculated over a piece of each transfer unit.
Small read: one read on one disk.
Small write: two reads and two writes (data disk and check disk), since new parity = (old data XOR new data) XOR old parity; there is no need to read B0, C0, and D0 when read-modify-writing A0.
Writes are the bottleneck, because every write accesses the single check disk.
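A minimal C sketch of that small-write parity update. The 8-byte block size and the variable names are illustrative only; real arrays operate on whole sectors or stripes.

    #include <stdint.h>
    #include <stdio.h>

    #define BLK 8  /* illustrative block size in bytes */

    /* Update one data block and the parity block without reading the
     * other data disks: new P = (old D ^ new D) ^ old P. */
    static void small_write(uint8_t *data, uint8_t *parity, const uint8_t *new_data) {
        for (int i = 0; i < BLK; i++) {
            parity[i] ^= data[i] ^ new_data[i];
            data[i]    = new_data[i];
        }
    }

    int main(void) {
        uint8_t d0[BLK] = {1,2,3,4,5,6,7,8}, d1[BLK] = {9,9,9,9,9,9,9,9}, p[BLK];
        for (int i = 0; i < BLK; i++) p[i] = d0[i] ^ d1[i];   /* initial parity */

        uint8_t nd0[BLK] = {42,0,42,0,42,0,42,0};
        small_write(d0, p, nd0);

        /* The parity still equals d0 ^ d1, as a full recomputation would give. */
        for (int i = 0; i < BLK; i++)
            if (p[i] != (uint8_t)(d0[i] ^ d1[i])) { puts("mismatch"); return 1; }
        puts("parity consistent after small write");
        return 0;
    }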
RAID-4
[Figure: blocks A0-A3 on Data Disk 0, B0-B3 on Data Disk 1, C0-C3 on Data Disk 2, D0-D3 on Data Disk 3, and the parity blocks ECC0-ECC3 on Check Disk 0.]
RAID-5
Block-level striping with distributed parity, which removes the bottleneck of a single parity disk and enables write parallelism.
Example: a write to "sector A" and a write to "sector B" can be performed simultaneously because their parity blocks reside on different disks.
[Figure: stripes A-E with their parity blocks ECC0-ECC4 rotated across Data Disks 0-4.]
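One way to express the rotating parity placement is a simple layout function; this left-symmetric-style rotation is an illustrative choice and may not match the exact arrangement drawn on the slide.

    #include <stdio.h>

    /* For an N-disk RAID-5, return the disk holding the parity block of a
     * given stripe, rotating the parity position one disk per stripe. */
    static int parity_disk(int stripe, int ndisks) {
        return ndisks - 1 - (stripe % ndisks);
    }

    int main(void) {
        const int N = 5;
        for (int s = 0; s < 5; s++)
            printf("stripe %d: parity on disk %d\n", s, parity_disk(s, N));
        return 0;
    }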
RAID-6
Similar to RAID-5 but with "dual distributed parity": for each stripe, ECC_p = XOR(A0, B0, C0) and ECC_q = Code(A0, B0, C0, ECC_p).
Sustains 2 drive failures with no data loss.
Minimum requirement: 4 disks, 2 for data striping and 2 for the dual parity.
[Figure: stripes A-E with their P and Q parity blocks (ECCnp, ECCnq) rotated across Data Disks 0-4.]
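The slide leaves "Code" unspecified; one common concrete choice (an assumption here, in the style of Reed-Solomon-based RAID-6 implementations) computes the two syndromes over the data blocks D_0 ... D_{n-1} as

    P = D_0 \oplus D_1 \oplus \cdots \oplus D_{n-1}
    Q = g^{0} \cdot D_0 \oplus g^{1} \cdot D_1 \oplus \cdots \oplus g^{n-1} \cdot D_{n-1}

where g is a generator of GF(2^8) and the multiplications are carried out in that field. With two independent syndromes, any two lost blocks (data or parity) can be solved for, which is what allows two drive failures without data loss.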