ECE 411: Computer Organization & Design
DRAM & Storage
Acknowledgement: Many slides were adapted from Prof. Hsien-Hsin Lee's ECE4100/6100 Advanced Computer Architecture course at Georgia Inst. of Tech. with his generous permission.

The DRAM Cell
Why DRAM?
- Higher density than SRAM
Disadvantages:
- Longer access times
- Leaky; must be refreshed periodically
- Cannot be easily integrated with CMOS logic
Cell structure: stacked capacitor (vs. trench capacitor)
[Figure: 1T1C DRAM cell, with the word line as control, the bit line carrying information, and a storage capacitor. Source: Memory Architecture Course, INSA Toulouse]

One DRAM Bank
[Figure: organization of a single DRAM bank]

Example: 512Mb 4-Bank DRAM (x4)
- Address multiplexing: the address bus A[13:0] carries the 14-bit row address, then A[10:0] carries the column address; BA[1:0] selects the bank
- Each bank is 16384 rows × 2048 columns × 4 bits
- A DRAM page = 2K × 4 bits = 1KB
[Figure: per-bank row decoder, sense amps, I/O gating, and column decoder; data out on D[3:0]]
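To make the address multiplexing concrete, below is a minimal C sketch that splits a flat word address into row, bank, and column fields for a part like the one above. The field widths follow the slide (14 row bits, 2 bank bits, 11 column bits), but the bit placement is just one illustrative choice; real controllers pick mappings to maximize bank and row locality.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t row, bank, col; } dram_addr_t;

/* Split a flat word address: low 11 bits = column, next 2 = bank,
 * next 14 = row (16K rows x 4 banks x 2K columns). */
static dram_addr_t split_addr(uint32_t word_addr) {
    dram_addr_t a;
    a.col  =  word_addr        & 0x7FF;   /* 11 column bits */
    a.bank = (word_addr >> 11) & 0x3;     /*  2 bank bits   */
    a.row  = (word_addr >> 13) & 0x3FFF;  /* 14 row bits    */
    return a;
}

int main(void) {
    dram_addr_t a = split_addr(0x00ABCDE);
    printf("row=%u bank=%u col=%u\n", a.row, a.bank, a.col);
    return 0;
}
```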

DRAM Cell Array
[Figure: array of 1T1C cells at the intersections of wordlines 0-1023 and bitlines 0-15]

DRAM Sensing (Open Bitline Array)
[Figure: two DRAM subarrays (WL0-WL127 and WL128-WL255) sharing a row of sense amplifiers between them]

Basic DRAM Operations
- Write '1': with the bitline driven to Vdd and the wordline raised, the cell charges through the access transistor up to Vdd - Vth
- Read '1': the bitline is precharged to Vdd/2; charge sharing between the cell capacitance Cm and the bitline capacitance C_BL produces a small ΔV on the bitline, which the sense amplifier amplifies and writes back into the cell (a refresh)
[Figure: sense amplifier, wordline (WL), and bitline (BL) for the write and read cases]

DRAM Basics
- Address multiplexing: the row address is sent while RAS is asserted, the column address while CAS is asserted
- DRAM reads are self-destructive: the data must be rewritten after a read
- Memory array: all bits within an array work in unison
- Memory bank: different banks can operate independently
- DRAM rank: chips within the same rank are accessed simultaneously

Examples of DRAM DIMM Standards
- x64 (no ECC): eight x8 chips together supply data bits D0-D63
- x72 (ECC): nine x8 chips; eight supply D0-D63 and the ninth supplies check bits CB0-CB7

DRAM Ranks
[Figure: a memory controller driving two ranks of x8 chips spanning D0-D63; chip selects CS0 and CS1 choose Rank 0 or Rank 1]

DRAM Ranks
[Figure: a 64-bit channel built three ways: a single rank of x8 chips, a single rank of x4 chips, or a dual-rank module of x8 chips]

DRAM Organization
[Figure. Source: Memory Systems Architecture Course, Bruce Jacob, University of Maryland]

Organization of DRAM Modules
[Figure: a memory controller connected over a channel (address/command bus plus data bus) to multi-banked DRAM chips. Source: Memory Systems Architecture Course, Bruce Jacob, University of Maryland]

DRAM Configuration Example
[Figure. Source: Micron DDR3 DRAM]

DRAM Access (Non-Nibble Mode)
- The memory controller asserts RAS while driving the row address on the address bus; the row is opened
- It then asserts CAS while driving the column address; the data appears on the data bus
- Further accesses to the open row repeat only the column-address/data step
[Figure: timing diagram of RAS, CAS, WE, ADDR (row addr, col addr, col addr), and DATA between the memory controller and the DRAM module]

DRAM Refresh
- Storage is leaky, so every DRAM row must be refreshed periodically
- A row is inaccessible while it is being refreshed
- A refresh simply reads a row and writes the same data back
Example:
- 4K rows in a DRAM, 100ns read cycle, data decays in 64ms
- 4096 × 100ns = 410µs to refresh the whole device once
- 410µs / 64ms ≈ 0.64% unavailability
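A few lines of C reproduce the slide's arithmetic, along with the 15.6µs distributed-refresh interval that appears on the next slide:

```c
#include <stdio.h>

int main(void) {
    double rows     = 4096;     /* rows in the device           */
    double t_row_ns = 100;      /* read (refresh) cycle per row */
    double t_ret_ms = 64;       /* retention / refresh period   */

    double burst_us = rows * t_row_ns / 1000.0;        /* 409.6 us  */
    double overhead = burst_us / (t_ret_ms * 1000.0);  /* 0.0064    */
    double interval = t_ret_ms * 1000.0 / rows;        /* 15.625 us */

    printf("full refresh sweep: %.1f us\n", burst_us);
    printf("unavailability:     %.2f%%\n", overhead * 100.0);
    printf("distributed refresh interval: %.1f us\n", interval);
    return 0;
}
```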

DRAM Refresh Styles
- Bursty: all 4096 rows are refreshed back to back (410µs = 100ns × 4096) once per 64ms interval
- Distributed: one 100ns row refresh every 15.6µs (= 64ms / 4096 rows)

Types of DRAM and DIMMs
Synchronous DRAM adds predictability to DRAM operation.
- DDRx: transfers data on both edges of the clock
- GDDRx: graphics
- HBM: graphics and HPC
- HMC: HPC
DIMMs:
- FB-DIMM: DIMMs connected by point-to-point links instead of a shared bus, allowing more DIMMs in server systems
- R-DIMM: buffers only the address/command bus
- LR-DIMM: buffers both the address/command and data buses

Memory Module Types
[Figure: UDIMM, RDIMM, DDR3 LRDIMM, and DDR4 LRDIMM modules]

Memory: Fundamental Trade-offs
Overcoming the capacity and bandwidth limits of DDRx DIMMs:
- Capacity: LR-DIMMs or NVM-based DIMMs, at the cost of increased latency
- Bandwidth: HBM or HMC, at the cost of limited capacity and/or increased latency
[Figure: latency vs. capacity vs. bandwidth design space, placing DDRx, HBM/HMC, and 3D XPoint]

Latency/Capacity Trade-off
Larger capacity (higher density) comes with longer latency.
Example 1: DRAM
- A larger mat size means more cells share the same peripheral circuitry: higher density, but a longer row cycle time (tRC)
[Figure: mat-size comparison, showing a 37% tRC reduction bought with a 7% area overhead]

Latency/Capacity Trade-off
Larger capacity (higher density) comes with longer latency.
Example 2: Phase-Change Memory (PCM)
- A single physical cell can store one or multiple bits: single-level cell (SLC) vs. multi-level cell (MLC)
- MLC PCM: higher density, longer latency

Bandwidth Scalability with a Parallel Interface
Go wider?
- More pins required on the processor side (130+ pins per channel), and package pins do not scale
- Greater difficulty matching electrical wire lengths
Go faster?
- Cross-talk and attenuation
- Bit-wise synchronization to the clock signal: clock-to-data and data-to-data skews
- High power cost: I/O power = energy per bit transfer (pJ/bit) × bandwidth (Gbps)

Memory I/O Interface Power
- Energy efficiency (pJ/bit) has been improving, driven by architecture, circuit, and process improvements, which has kept I/O power low
- But in the future, energy efficiency degrades super-linearly: over-extending channel speed requires power-hungry I/O circuitry
- 100 GB/s × 20 pJ/bit = 16 W!
Source: Intel
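As a sanity check on the 16 W figure (note that GB/s counts bytes while pJ/bit counts bits):

```c
#include <stdio.h>

int main(void) {
    double gb_per_s   = 100.0;                      /* 100 GB/s      */
    double pj_per_bit = 20.0;                       /* 20 pJ/bit     */
    double bits_per_s = gb_per_s * 1e9 * 8.0;       /* 800 Gbit/s    */
    double watts      = bits_per_s * pj_per_bit * 1e-12;
    printf("I/O power = %.1f W\n", watts);          /* prints 16.0 W */
    return 0;
}
```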

High-Speed Serial Interface (HSI)
High bit rate:
- Up to 20 Gbps/pin, vs. 1.6 Gbps/pin for DDR3-1600
- Fewer pins needed for the same bandwidth, so more channels are possible
High energy efficiency (low pJ/bit):
- As low as sub-1 pJ/bit at 15-20 Gbps
- Low package power; low energy consumption
- Better scalability, avoiding the super-linear degradation
Drawbacks:
- Longer latency
- Higher static power/energy consumption

Memory Scheduling
FR-FCFS: First-Ready, First-Come First-Served. Requests that hit in the currently open row ("first-ready") are prioritized; ties are broken by arrival order (see the sketch below). It can be paired with either row-buffer policy:
- Open-page policy: leave the row open after an access, betting on row-buffer locality
- Close-page policy: precharge the row after each access
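Below is a minimal C sketch of FR-FCFS arbitration for a single bank. The request-queue layout and field names are hypothetical, not from the slides; the point is the priority order: row-buffer hits first, then oldest-first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t row;      /* row this request targets     */
    uint64_t arrival;  /* arrival time, for FCFS order */
    bool     valid;
} request_t;

/* Pick the next request to issue: any request hitting the open row
 * beats all misses ("first-ready"); ties fall back to arrival order.
 * Returns an index into q[], or -1 if the queue is empty. */
int frfcfs_pick(const request_t *q, size_t n,
                uint32_t open_row, bool row_is_open) {
    int  best = -1;
    bool best_hit = false;
    for (size_t i = 0; i < n; i++) {
        if (!q[i].valid) continue;
        bool hit = row_is_open && q[i].row == open_row;
        if (best < 0 || (hit && !best_hit) ||
            (hit == best_hit && q[i].arrival < q[best].arrival)) {
            best = (int)i;
            best_hit = hit;
        }
    }
    return best;
}
```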

Disk Storage

Disk Organization
- Platters (1 to 12), each surface carrying concentric tracks (5,000 to 30,000)
- Each track is divided into sectors (100 to 500 per track) of 512 bytes
- A cylinder: the set of tracks under all heads at one arm position
- Spindle speeds of 3600 RPM and up
[Figure: platter, track, sector, and cylinder geometry]
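Multiplying out the geometry gives a ballpark capacity. The sketch below takes the largest values from the slide and assumes two recording surfaces per platter (an assumption, not stated on the slide):

```c
#include <stdio.h>

/* Ballpark disk capacity from the slide's upper-bound geometry. */
int main(void) {
    long long platters = 12, surfaces = 2;   /* 2 surfaces assumed */
    long long tracks_per_surface = 30000;
    long long sectors_per_track  = 500;
    long long bytes_per_sector   = 512;

    long long bytes = platters * surfaces * tracks_per_surface
                    * sectors_per_track * bytes_per_sector;
    printf("capacity = %.1f GB\n", bytes / 1e9);  /* ~184 GB */
    return 0;
}
```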

Disk Organization
- The read/write head flies tens of nanometers above the magnetic surface
- The arm positions the head over the desired track
[Figure: disk arm and read/write head]

Disk Access Time
- Seek time: moving the arm to the desired track; 5ms to 12ms
- Rotational latency: on average half a revolution; e.g., a 10,000 RPM disk averages 3ms (= 0.5 / (10,000/60))
- Data transfer latency (or throughput): some tens to hundreds of MB per second; e.g., the Seagate Cheetah 15K.6 sustains 164MB/s
- Disk controller overhead
- A disk cache (or cache buffer), 4 to 32MB today and built into the HDD's embedded controller, exploits locality
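The pieces add up as in the classic access-time model. The sketch below uses the slide's numbers plus an assumed 0.2ms controller overhead (illustrative, not from the slide):

```c
#include <stdio.h>

int main(void) {
    double seek_ms  = 5.0;                          /* best-case seek from slide */
    double rpm      = 10000.0;
    double rot_ms   = 0.5 / (rpm / 60.0) * 1000.0;  /* half a revolution: 3.0 ms */
    double mb_per_s = 164.0;                        /* sustained transfer rate   */
    double kb       = 0.5;                          /* one 512-byte sector       */
    double xfer_ms  = kb / 1024.0 / mb_per_s * 1000.0;
    double ctrl_ms  = 0.2;                          /* assumed controller overhead */

    printf("access time ~= %.2f ms\n",
           seek_ms + rot_ms + xfer_ms + ctrl_ms);   /* ~8.20 ms */
    return 0;
}
```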

Reliability, Availability, Dependability
Fault classes:
- Program faults
- Static (permanent) faults:
  - Design flaws, e.g., the Pentium FDIV bug (~$500 million)
  - Manufacturing defects: stuck-at faults, process variability
- Dynamic faults:
  - Soft errors (noise-induced)
  - Wear-out

Solution Space
- DRAM / SRAM: use ECC (SECDED)
- Disks: use redundancy, via user backups or disk arrays

RAID
- Redundant Array of Inexpensive Disks, motivated by both reliability and performance
- Combines multiple small, inexpensive disk drives, broken into "reliability groups"
- Data are divided and replicated across multiple disk drives
- Levels RAID-0 through RAID-5
- Hardware RAID: a dedicated hardware controller
- Software RAID: implemented in the OS

Basic Principles
- Data mirroring
- Data striping
- Error-correcting codes

RAID-1
- Mirrored disks: every write to the data disk is also performed on the check disk
- Most expensive scheme (100% capacity overhead)
- Can improve read/seek performance given a sufficient number of controllers
[Figure: Disk 0 (data) and Disk 1 (check), each holding identical blocks A0-A4]

RAID-10
- Combines data striping on top of RAID-1 mirroring
[Figure: six disks forming three mirrored pairs, with blocks A0-A3, B0-B5, and C0 striped across the pairs]

RAID-2
- Bit-interleaved striping
- Uses a Hamming code to generate ECC stored on dedicated check disks, e.g., Hamming(7,4), as sketched below
- Space overhead falls with scale: 4 data disks need 3 check disks (75%), 10 need 4 (40%), 25 need 5 (20%)
- The CPU needs more compute power to generate a Hamming code than simple parity
- Complex controller; not really used today
[Figure: four data disks (blocks A-D) plus three check disks holding the per-stripe ECC bits]
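For reference, a tiny Hamming(7,4) encoder of the kind RAID-2 relies on. The bit layout (p1 p2 d1 p3 d2 d3 d4, parities at power-of-two positions) follows the textbook convention and is one illustrative choice:

```c
#include <stdint.h>
#include <stdio.h>

/* Encode 4 data bits into a 7-bit Hamming(7,4) codeword.
 * Codeword positions 1..7 map to bits 0..6 of the return value:
 * c1=p1, c2=p2, c3=d1, c4=p3, c5=d2, c6=d3, c7=d4. */
static uint8_t hamming74_encode(uint8_t d) {
    uint8_t d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    uint8_t p1 = d1 ^ d2 ^ d4;   /* covers positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* covers positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;   /* covers positions 4,5,6,7 */
    return (uint8_t)(p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) |
                     (d2 << 4) | (d3 << 5) | (d4 << 6));
}

int main(void) {
    for (uint8_t d = 0; d < 16; d++)
        printf("data %x -> codeword %02x\n", d, hamming74_encode(d));
    return 0;
}
```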

RAID-3
- Byte-level striping
- XOR parity is generated and stored on a single check disk
- Minimum of 3 disks: 2 data disks + 1 check disk
[Figure: four data disks plus one check disk; one transfer unit spans all the data disks]
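A minimal sketch of the XOR parity computation shared by RAID-3/4/5, over one stripe of hypothetical in-memory buffers:

```c
#include <stdint.h>
#include <stddef.h>

/* XOR the corresponding bytes of n_data disk buffers into parity[].
 * If any single buffer is lost, XOR-ing parity with the survivors
 * reconstructs the missing bytes. */
void xor_parity(const uint8_t *const *disks, size_t n_data,
                uint8_t *parity, size_t len) {
    for (size_t i = 0; i < len; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < n_data; d++)
            p ^= disks[d][i];
        parity[i] = p;
    }
}
```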

RAID-4
- Block-level striping: each individually accessed unit lives on one disk, so small transfers do not touch all disks, improving parallelism
- XOR parity is generated and stored on a dedicated check disk; check info is calculated over a piece of each transfer unit
- Small read: one read on one disk
- Small write: two reads and two writes (to the data disk and the check disk), as sketched below
  - New parity = (old data ⊕ new data) ⊕ old parity
  - No need to read B0, C0, and D0 when read-modify-writing A0
- Writes are the bottleneck, since every write accesses the single check disk
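The small-write path in C, per the list above. The read_block()/write_block() helpers are hypothetical stand-ins for the disk I/O layer:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical disk I/O helpers, assumed to exist elsewhere. */
extern void read_block(int disk, uint64_t lba, uint8_t *buf, size_t len);
extern void write_block(int disk, uint64_t lba, const uint8_t *buf, size_t len);

/* RAID-4/5 small write: 2 reads + 2 writes, touching only the target
 * data disk and the parity disk. */
void small_write(int data_disk, int parity_disk, uint64_t lba,
                 const uint8_t *new_data,
                 uint8_t *old_data, uint8_t *parity, size_t len) {
    read_block(data_disk,   lba, old_data, len);   /* read old data    */
    read_block(parity_disk, lba, parity,   len);   /* read old parity  */
    for (size_t i = 0; i < len; i++)               /* new parity =     */
        parity[i] ^= old_data[i] ^ new_data[i];    /* (old^new)^oldP   */
    write_block(data_disk,   lba, new_data, len);  /* write new data   */
    write_block(parity_disk, lba, parity,   len);  /* write new parity */
}
```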

RAID-4
[Figure: four data disks holding blocks A0-A3, B0-B3, C0-C3, and D0-D3, plus one check disk holding ECC0-ECC3]

RAID-5
- Block-level striping with distributed parity: rotating the parity across the disks removes the bottleneck of a single parity disk and enables write parallelism
- Example: a write to "sector A" and a write to "sector B" can be performed simultaneously, since their parity blocks live on different disks
[Figure: five disks holding data blocks A0-E3 with parity blocks ECC0-ECC4 rotated across them]
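One way to rotate the parity, as a sketch: the "left-symmetric" layout used by many implementations. This is an illustrative choice; actual layouts vary.

```c
#include <stdio.h>

/* Left-symmetric RAID-5 layout over n disks: the parity block walks
 * backwards one disk per stripe, and data blocks wrap around after it. */
int parity_disk(int stripe, int n) { return (n - 1) - (stripe % n); }

int data_disk(int stripe, int block, int n) {
    return (parity_disk(stripe, n) + 1 + block) % n;
}

int main(void) {
    for (int s = 0; s < 5; s++)
        printf("stripe %d: parity on disk %d\n", s, parity_disk(s, 5));
    return 0;
}
```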

RAID-6
- Similar to RAID-5, but with "dual distributed parity": ECC_p = XOR(A0, B0, C0); ECC_q = Code(A0, B0, C0, ECC_p)
- Sustains 2 drive failures with no data loss
- Minimum requirement: 4 disks (2 for data striping, 2 for the dual parity)
[Figure: five disks holding data blocks with the rotated P and Q parity blocks (ECCp/ECCq) distributed across them]
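The slide leaves Code() abstract. One common concrete choice (used by Linux md, for instance) computes Q as a Reed-Solomon-style sum over GF(2^8) of the data blocks alone; a sketch under that assumption:

```c
#include <stdint.h>
#include <stddef.h>

/* Multiply by the generator g = 2 in GF(2^8) with the RAID-6
 * polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D). */
static uint8_t gf_mul2(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1D : 0));
}

/* Compute P (plain XOR) and Q (sum of g^d * data[d]) for one stripe,
 * using Horner's rule across the disks. Any two losses among
 * {data disks, P, Q} are then recoverable. */
void raid6_pq(const uint8_t *const *data, size_t n_data,
              uint8_t *p, uint8_t *q, size_t len) {
    for (size_t i = 0; i < len; i++) {
        uint8_t pv = 0, qv = 0;
        for (size_t d = n_data; d-- > 0; ) {
            pv ^= data[d][i];
            qv = gf_mul2(qv) ^ data[d][i];
        }
        p[i] = pv;
        q[i] = qv;
    }
}
```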