Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 25 Memory Hierarchy Design (Storage Technologies Trends and Caching) Prof. Dr. M. Ashraf Chughtai
Today’s Topics Recap: Processor Performance What is outside the processor Storage Technologies RAM and Enhanced DRAM Disk Storage Summary MAC/VU-Advanced Computer Architecture2 Lec. 25 – Memory Hierarchy Design (1)
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 3 Data path and control design of processor Hardware-based and software-based techniques to expose the ILP Recap: Processor Performance
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 4 performance of a computer What is outside processor?
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 5 –what is outside the processor? –how the outside systems influence the performance of processors? Whatever is outside the processor is referred to as the I/O system The I/O systems include: –Memory System –Buses and controllers What is Outside the Processor? … Cont’d
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 6 Memory system design for high performance computers Design goal of memory system for high performance computers is to present the user with as much memory as is available Memory System: An introduction
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 7 Fast memory is expensive and cheap memory is slow, therefore a memory hierarchy is organized into several levels – cost as low as of the cheapest memory and – speed as fast as of the fastest memory Memory Hierarchy: An introduction
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 8 Memory Hierarchy System Control Datapath Memory Processor Memory Fastest Slowest Smallest Biggest Highest Lowest Speed: Size: Cost:
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 9
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 10 “The principle of locality”, where each type of the module is located in the memory system? Principle of Memory hierarchy
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 11 Advantage of the principle of locality An average access speed that is very close to the speed that is offered by the fastest technology and –Cost that is very close to the cheapest technology Principle of Locality
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 12 The semiconductor memories such as registers, Static RAM and Dynamic RAM fast in access speed but are expensive Used in small size and placed either inside or closest to the processor Secondary storage devices which offer huge storage at lowest cost per bit are placed farthest away from the processor Storage Type in Memory System
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 13 Levels of the Memory Hierarchy 100s Bytes, <1 ns/ $.1-.01/bit K Bytes, 1-10 ns $ /bit M Bytes, 10ns- 100ns $ G Bytes, 1-10ms cents Capacity, Access Time, Cost Infinite, sec C -6 Registers Cache Main Memory Disk Tape Instr. Operands Blocks Pages Files Staging Transfer Unit prog./compiler 1-8 bytes cache control bytes OS 512-4K bytes user/operator Mbytes Upper Level Lower Level faster Larger
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 14
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 15
Memory Hierarchy Pyramid registers on-chip L1 cache (SRAM) main memory (DRAM) local secondary storage (local disks) Larger, slower, and cheaper (per byte) storage devices remote secondary storage (distributed file systems, Web servers) Local disks hold files retrieved from disks on remote network servers. Main memory holds disk blocks retrieved from local disks. off-chip L2 cache (SRAM) L1 cache holds cache lines retrieved from the L2 cache memory. CPU registers hold words retrieved from L1 cache. L2 cache holds cache lines retrieved from main memory. L0: L1: L2: L3: L4: L5: Smaller, faster, and costlier (per byte) storage devices MAC/VU-Advanced Computer Architecture16 Lec. 25 – Memory Hierarchy Design (1)
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 17
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 18 Classification of memory systems based on different attributes and their design Attributes: –Material – Semiconductor, magnetic, optical –Accessing – Random, Sequential, Hybrid –Store/Retrieve – ROM, RMM and RWM Storage Systems
Random-Access Memory (RAM) Key features –RAM is packaged as a chip. –Basic storage unit is a cell (one bit per cell). –Multiple RAM chips form a memory Types of RAM –Static RAM (SRAM) –Dynamic RAM (DRAM) MAC/VU-Advanced Computer Architecture19 Lec. 25 – Memory Hierarchy Design (1)
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 20 Static RAM: Basic Cell 6-Transistor SRAM Cell bit word (row select) bit word Write: 1. Drive bit lines (bit=1, bit=0) 2.. Select row Read: 1. Pre-charge bit and bit to Vdd 2. Select row 3. Cell pulls one line low 4. Sense amp on column detects difference between bit and bit replaced with pullup to save area 10 01
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 21
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 22
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 23
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 24 SRAM Organization: 16-word x 4-bit
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 25
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 26
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 27
Dynamic Random Access Memory (DRAM ) –Each cell stores bit with a capacitor and transistor. –Value must be refreshed every ms. –Sensitive to disturbances. –Slower and cheaper than SRAM. MAC/VU-Advanced Computer Architecture28 Lec. 25 – Memory Hierarchy Design (1)
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 29 Basic DRAM Cell Write: –Drive bit line and Select row Read: –Pre-charge bit line to Vdd and Select row –Cell and bit line share charges Here, very small voltage changes occurs on the bit line therefore Sense amplifier is used to detect changes –Apply Write to restore the value Refresh –Just do a dummy read to every cell. row select bit
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 30
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 31
DRAM Organization 16 words x 8bit 16 super cells (words) of size 8 bits internal row buffer cols rows x 8 DRAM chip addr data supercell (2,1) 2 bits / 8 bits / memory controller (to CPU) MAC/VU-Advanced Computer Architecture32 Lec. 25 – Memory Hierarchy Design (1)
Reading DRAM Supercell (2,1) Step 1(a): Row access strobe (RAS) selects row 2. Step 1(b): Row 2 copied from DRAM array to row buffer. MAC/VU-Advanced Computer Architecture33 Lec. 25 – Memory Hierarchy Design (1) cols rows RAS = x 8 DRAM chip 3 addr data 2/2/ 8/8/ memory controller internal row buffer
Reading DRAM Supercell (2,1) Step 2(a): Column access strobe (CAS) selects column 1. Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU. internal buffer cols rows internal row buffer 16 x 8 DRAM chip CAS = 1 addr data 2/2/ 8/8/ memory controller supercell (2,1) supercell (2,1) To CPU MAC/VU-Advanced Computer Architecture34 Lec. 25 – Memory Hierarchy Design (1)
64 MB DRAM Memory Module 64 MB DRAM Memory Module [8x8MB DRAM Chips] : supercell (i,j) addr (row = i, col = j) Memory controller DRAM 7 DRAM bit double word at main memory address A bits 0-7 bits 8-15 bits bits bits bits bits bits bit doubleword MAC/VU-Advanced Computer Architecture35 Lec. 25 – Memory Hierarchy Design (1)
Enhanced DRAMs 1:Fast Page Mode DRAM (FPM DRAM) –In normal DRAM, we can only read and write M- bit at time because only one row and one column is selected at any time by the row and column address –In other words, for each M-bit memory access, we have to provide a row address followed by a column address. MAC/VU-Advanced Computer Architecture36 Lec. 25 – Memory Hierarchy Design (1)
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 37 Page Mode DRAM: Motivation Regular DRAM Organization: – –N rows x N column x M-bit – –Read & Write M-bit at a time – –Each M-bit access requires a RAS / CAS cycle Fast Page Mode DRAM – –N x M “register” to save a row ARow AddressJunk CAS_L RAS_L Col AddressRow AddressJunkCol Address 1st M-bit Access2nd M-bit Access N rows N cols DRAM M bits Row Address Column Address M-bit Output
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 38 Fast Page Mode DRAM.. Cont’d -one RAS and 3 CAS: [RAS, CAS, CAS, CAS, CAS] -instead of with 4 RAS and 4 CAS: [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)]
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 39 Fast Page Mode Operation Fast Page Mode DRAM –N x M “SRAM” to save a row After a row is read into the register –Only CAS is needed to access other M-bit blocks on that row –RAS_L remains asserted while CAS_L is toggled ARow Address CAS_L RAS_L Col Address 1st M-bit Access N rows N cols DRAM Column Address M-bit Output M bits N x M “SRAM” Row Address Col Address 2nd M-bit3rd M-bit4th M-bit
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 40
MAC/VU-Advanced Computer Architecture Lec. 25 – Memory Hierarchy Design (1) 41
Enhanced DRAMs 2: Extended data out DRAM (EDO DRAM) – It is the Enhanced FPM DRAM with more closely spaced CAS signals 3: Synchronous DRAM (SDRAM) – It is driven with rising clock edge instead of asynchronous control signals. MAC/VU-Advanced Computer Architecture42 Lec. 25 – Memory Hierarchy Design (1)
Enhanced DRAMs 4:Double Data Rate synchronous DRAM (DDR SDRAM) – It is Enhancement of SDRAM that uses both clock edges as control signals. 5: Video RAM (VRAM) –It is like FPM DRAM, but output is produced by shifting row buffer; and is –Dual ported to allow concurrent reads and writes MAC/VU-Advanced Computer Architecture43 Lec. 25 – Memory Hierarchy Design (1)
Nonvolatile Memories DRAM and SRAM are volatile memories –Lose information if powered off. Nonvolatile memories retain value even if powered off. –Generic name is read-only memory (ROM). –Misleading because some ROMs can be read and modified. MAC/VU-Advanced Computer Architecture44 Lec. 25 – Memory Hierarchy Design (1)
Nonvolatile Memories Types of ROMs –Programmable ROM (PROM) –Erasable programmable ROM (EPROM) –Electrically erasable PROM (EEPROM) –Flash memory Firmware –Program stored in a ROM Boot time code, BIOS (basic input/output system) graphics cards, disk controllers. MAC/VU-Advanced Computer Architecture45 Lec. 25 – Memory Hierarchy Design (1)
Disk Geometry spindle surface tracks track k sectors gaps MAC/VU-Advanced Computer Architecture46 Lec. 25 – Memory Hierarchy Design (1)
Disk Geometry (Muliple-Platter View) Aligned tracks form a cylinder. Aligned tracks form a cylinder. surface 0 surface 1 surface 2 surface 3 surface 4 surface 5 cylinder k spindle platter 0 platter 1 platter 2 MAC/VU-Advanced Computer Architecture47 Lec. 25 – Memory Hierarchy Design (1)
Disk Capacity Capacity: maximum number of bits that can be stored. –Recording density (bits/in): number of bits that can be squeezed into a 1 inch segment of a track. –Track density (tracks/in): number of tracks that can be squeezed into a 1 inch radial segment. –Arial density (bits/in2): product of recording and track density. MAC/VU-Advanced Computer Architecture48 Lec. 25 – Memory Hierarchy Design (1)
Disk Operation (Multi-Platter View) arm read/write heads move in unison from cylinder to cylinder spindle MAC/VU-Advanced Computer Architecture49 Lec. 25 – Memory Hierarchy Design (1)
Disk Access Time Average time to access some target sector approximated by : –T access = Tavg seek + Tavg rotation + Tavg transfer Seek time (Tavg seek ) –Time to position heads over cylinder containing target sector –Typical Tavg seek = 9 ms MAC/VU-Advanced Computer Architecture50 Lec. 25 – Memory Hierarchy Design (1)
Disk Access Time Rotational latency (Tavg rotation ) –Time waiting for first bit of target sector to pass under r/w head. –Tavg rotation = 1/2 x 1/RPMs x 60 sec/1 min Transfer time (Tavg transfer ) –Time to read the bits in the target sector. –Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 secs/1 min. MAC/VU-Advanced Computer Architecture51 Lec. 25 – Memory Hierarchy Design (1)
Disk Access Time Example Given: –Rotational rate = 7,200 RPM –Average seek time = 9 ms. –Avg # sectors/track = 400. Derived: –Tavg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec = 4 ms. –Tavg transfer = 60/7200 RPM x 1/400 secs/track x 1000 ms/sec = 0.02 ms –Taccess = 9 ms + 4 ms ms Important points: –Access time dominated by seek time and rotational latency. –First bit in a sector is the most expensive, the rest are free. –SRAM access time is about 4 ns/doubleword, DRAM about 60 ns Disk is about 40,000 times slower than SRAM, 2,500 times slower then DRAM. MAC/VU-Advanced Computer Architecture52 Lec. 25 – Memory Hierarchy Design (1)
Logical Disk Blocks –The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2,...) Mapping between logical blocks and actual (physical) sectors –Maintained by hardware/firmware device called disk controller. –Converts requests for logical blocks into (surface, track, sector) triples. MAC/VU-Advanced Computer Architecture53 Lec. 25 – Memory Hierarchy Design (1)
CPU-Memory Gap MAC/VU-Advanced Computer Architecture54 Lecture 26 Memory Hierarchy (2)
Summary Memory hierarchy organization Design of basic memory modules of DRAM and SRAM Design and working of disk storages Gap between the speed of processor and the storage devices - DRAM, SRAM and Disk is increasing with time MAC/VU-Advanced Computer Architecture55 Lec. 25 – Memory Hierarchy Design (1)
MAC/VU-Advanced Computer Architecture56 Lec. 25 – Memory Hierarchy Design (1)