Princess Sumaya Univ. Computer Engineering Dept. Chapter 5:
Princess Sumaya University – Computer Arch. & Org (2) Computer Engineering Dept.

Memory Hierarchy
Principle of Locality
● Temporal Locality (Locality in Time)
● Spatial Locality (Locality in Space)
Speed & Size
Technology       Access Time       Relative Cost/GB
SRAM             0.5 ns            10,000
DRAM             50 ns             100
Magnetic Disk    5,000,000 ns      1

Memory Hierarchy
[Figure: CPU, Cache, Main Memory, Magnetic Disks]

Cache Memory
High Speed (Towards CPU)
● Conceals Slow Memory
Small Size (Low Cost)
[Figure: CPU, Cache (Fast), Main Memory (Slow). Hit: Access = Cache. Miss: Access = Cache + Mem. 95% hit ratio]

Cache Memory
CPU – Main Memory Address
● Cache Size < Main Memory Size
[Figure: CPU, Cache (1 MB), Main Memory (4 GB). The CPU issues a 32-bit Address; the cache needs only 20 bits !!!]
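
The address widths on this slide follow directly from the memory sizes. A minimal sketch, assuming byte-addressable memories whose sizes are powers of two; the helper name `address_bits` is illustrative and not from the slides:

```python
def address_bits(size_in_bytes: int) -> int:
    """Number of address bits needed to select one byte out of size_in_bytes."""
    # Assumption: the size is a power of two, so the bit count is exact.
    if size_in_bytes <= 0 or size_in_bytes & (size_in_bytes - 1):
        raise ValueError("size must be a positive power of two")
    return size_in_bytes.bit_length() - 1

MB = 2**20
GB = 2**30

cache_bits = address_bits(1 * MB)    # 1 MB cache
memory_bits = address_bits(4 * GB)   # 4 GB main memory
print(cache_bits, memory_bits)       # 20 32
```

The 12-bit difference between the two widths is what the cache has to account for when it maps memory addresses onto its own locations.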

Cache Memory
[Figure: Cache (addresses up to FFFFF) and Main Memory (addresses up to FFFFFFFF)]
Address Mapping !!!

Associative Memory
[Figure: Cache (locations up to FFFFF) and Main Memory (addresses up to FFFFFFFF). Each Cache Location stores an Address (Key) together with its Data]

Associative Memory
[Figure: Cache entries with a Key field (Address) and 8 Bits (Data); the Address is the search key and the matching entry returns its Data]
Can have any number of locations

Associative Memory
[Figure: Cache entries with a Key field and 8 Bits (Data); the Address is compared (= ?) with the Key of every entry]
How many comparators?

Associative Memory
[Figure: Cache entries with a Key field and 8 Bits (Data); the Address is compared (= ?) with the Key of every entry, and each entry has a Valid Bit]

Associative Memory
[Figure: Cache entries with a Key field and 32 Bits (Data); the Address is the search key and 32 Bits of Data are returned]
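
The associative lookup on these slides can be sketched in software. This is an illustrative model only: the class and field names are assumptions, and the loop stands in for hardware that compares all keys at once, using one comparator per entry.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entry:
    valid: bool = False   # valid bit: the entry holds real data
    key: int = 0          # full memory address used as the search key
    data: int = 0         # cached data

class AssociativeCache:
    def __init__(self, num_entries: int) -> None:
        self.entries: List[Entry] = [Entry() for _ in range(num_entries)]

    def lookup(self, address: int) -> Optional[int]:
        # Hardware compares every key in parallel (one comparator per entry);
        # this loop models those comparisons one after another.
        for entry in self.entries:
            if entry.valid and entry.key == address:
                return entry.data   # hit
        return None                 # miss

    def fill(self, address: int, data: int) -> None:
        # Use the first invalid entry; if none is free, overwrite entry 0
        # (a placeholder policy chosen only to keep the sketch short).
        for entry in self.entries:
            if not entry.valid:
                entry.valid, entry.key, entry.data = True, address, data
                return
        self.entries[0] = Entry(True, address, data)

cache = AssociativeCache(num_entries=4)
miss_result = cache.lookup(0x00000004)   # nothing cached yet
cache.fill(0x00000004, 0xAB)
hit_result = cache.lookup(0x00000004)
```

Without the valid bit, a lookup of address 0 would falsely match the empty entries, whose key field happens to be 0.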

Direct Mapping Cache
[Figure: the Address is split into 12 Bits (Tag) and 20 Bits (Index). The Index (up to FFFFF) selects a cache location holding a Tag and 8 Bits (Data); the stored Tag is compared with the address Tag, giving Match or No match]
What happens when Address =

Direct Mapping Cache
[Figure: the Address is split into 12 Bits (Tag), 18 Bits (Index), and a Select field. The Index selects a cache location holding a Tag and 32 Bits (Data); Compare gives Match or No match]
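
The field split used by the direct mapping cache can be sketched as bit operations. The function name is illustrative, and the 2-bit width of the Select field is an assumption inferred from 32 − 12 − 18 = 2; the slides do not state it.

```python
def split_address(address: int, index_bits: int, select_bits: int = 0):
    """Split an address into (tag, index, select) fields, tag in the high bits."""
    select = address & ((1 << select_bits) - 1)
    index = (address >> select_bits) & ((1 << index_bits) - 1)
    tag = address >> (select_bits + index_bits)
    return tag, index, select

# 8-bit data layout: 12-bit tag, 20-bit index, no select field
tag8, index8, _ = split_address(0x12345678, index_bits=20)

# 32-bit data layout: 12-bit tag, 18-bit index, assumed 2-bit select
tag32, index32, select32 = split_address(0x12345678, index_bits=18, select_bits=2)

print(hex(tag8), hex(index8))                    # 0x123 0x45678
print(hex(tag32), hex(index32), select32)        # 0x123 0x1159e 0
```

Two addresses with the same index but different tags, such as 0x12345678 and 0x00045678 in the first layout, compete for the same cache location.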

Set Associative Cache
2-Way Set Associative
[Figure: 12 Bits (Tag), 20 Bits (Index), 8 Bits (Data). The Index (up to FFFFF) selects a set with two Tag and Data pairs; both stored Tags are compared with the address Tag, giving Match or No match]

Cache Size
Example:
Number of Blocks = 4 K
Block Size = 4 Words
Word Size = 32 bits
Address Size = 32 bits
Tag Bits (Direct Mapping Cache) =
Tag Bits (2-Way Set Associative) =
Tag Bits (4-Way Set Associative) =
Tag Bits (Associative Cache) =
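
One way to work the tag widths in this example is sketched below. It assumes byte addressing, so a 4-word block of 32-bit words needs 4 offset bits, and it treats the associative cache as a single set holding every block; the function name is illustrative.

```python
def tag_bits(address_bits, num_blocks, words_per_block, bytes_per_word, ways):
    """Tag width for a cache with the given geometry (byte addressing assumed)."""
    block_bytes = words_per_block * bytes_per_word
    offset_bits = block_bytes.bit_length() - 1    # selects a byte within the block
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1        # selects the set
    return address_bits - index_bits - offset_bits

geometry = dict(address_bits=32, num_blocks=4 * 1024,
                words_per_block=4, bytes_per_word=4)

direct = tag_bits(ways=1, **geometry)
two_way = tag_bits(ways=2, **geometry)
four_way = tag_bits(ways=4, **geometry)
fully = tag_bits(ways=4 * 1024, **geometry)   # one set holding every block
print(direct, two_way, four_way, fully)       # 16 17 18 28
```

Each doubling of associativity halves the number of sets, so one bit moves from the index to the tag.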

Block Size
Increasing Block Size
● Utilizes Spatial Locality
● Reduces the Number of Blocks (for a fixed cache size)

Cache Performance
Example:
CPU CPI = 2 clocks/instruction
Load & Store Instructions = 36%
Instruction Cache = 2% miss rate
Data Cache = 4% miss rate
Memory Miss Penalty = 100 clocks
Instruction Penalties =
Data Penalties =
CPI (with penalties) =
Perfect Cache Speedup =
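
The blanks in this example can be worked as a short calculation. The sketch assumes every instruction is fetched through the instruction cache and only loads and stores access the data cache; the variable names are illustrative.

```python
base_cpi = 2.0
load_store_fraction = 0.36
icache_miss_rate = 0.02
dcache_miss_rate = 0.04
miss_penalty = 100  # clocks

# Stall clocks per instruction caused by instruction fetch misses
instruction_stalls = icache_miss_rate * miss_penalty
# Stall clocks per instruction caused by load/store misses
data_stalls = load_store_fraction * dcache_miss_rate * miss_penalty

cpi_with_stalls = base_cpi + instruction_stalls + data_stalls
speedup_perfect = cpi_with_stalls / base_cpi

print(f"{instruction_stalls:.2f} {data_stalls:.2f} "
      f"{cpi_with_stalls:.2f} {speedup_perfect:.2f}")
```

Under these assumptions the stalls add 2.00 and 1.44 clocks per instruction, giving a CPI of 5.44, so a perfect cache would be 2.72 times faster.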

Cache Performance
Average Memory Access Time (AMAT)
● AMAT = Time for a Hit + Miss Rate × Miss Penalty
Example:
Clock Cycle = 1 ns
Cache Access Time (Hit) = 1 ns
Cache Miss Penalty = 20 Clocks
Miss Rate = 5%
AMAT =
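
The AMAT formula applied to the example values, as a minimal sketch; the function name `amat` is illustrative, and the hit time of 1 ns is expressed as 1 clock because the clock cycle is 1 ns.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time plus the average miss cost."""
    return hit_time + miss_rate * miss_penalty

clock_ns = 1.0
amat_clocks = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)  # in clocks
amat_ns = amat_clocks * clock_ns
print(f"{amat_clocks:.1f} clocks = {amat_ns:.1f} ns")
```

With these numbers the misses add one clock on average, so the AMAT is 2 clocks, or 2 ns.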

Cache Misses
Example:
Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 4 blocks (Direct Mapping)
[Table: CPU Reference, Hit or Miss, and the Tag held in cache blocks 0, 1, 2, 3 after each reference]

Cache Misses
Example:
Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 2 blocks (2-Way Set Associative)
[Table: CPU Reference, Hit or Miss, and the Tags held in cache set 0 after each reference]

Cache Misses
Example:
Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 4 blocks (Associative)
[Table: CPU Reference, Hit or Miss, and the cache contents after each reference]
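
The three Cache Misses examples can be replayed with a small simulator. This is a sketch under stated assumptions: the cache starts empty, the set index is the block address modulo the number of sets, and replacement within a set is LRU (the examples do not name a policy). The function name is illustrative.

```python
from collections import OrderedDict

def simulate(block_addresses, num_blocks, ways):
    """Return 'Hit' or 'Miss' per reference for a set-associative LRU cache."""
    num_sets = num_blocks // ways
    # One OrderedDict per set; order tracks recency (least recent first).
    # The full block address is stored in place of a separate tag field.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for block in block_addresses:
        lines = sets[block % num_sets]      # set index = block address mod sets
        if block in lines:
            lines.move_to_end(block)        # mark as most recently used
            outcomes.append("Hit")
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used block
            lines[block] = True
            outcomes.append("Miss")
    return outcomes

sequence = [0, 8, 0, 6, 8]
direct = simulate(sequence, num_blocks=4, ways=1)    # direct mapping
two_way = simulate(sequence, num_blocks=2, ways=2)   # 2-way set associative
fully = simulate(sequence, num_blocks=4, ways=4)     # associative
print(direct.count("Miss"), two_way.count("Miss"), fully.count("Miss"))  # 5 4 3
```

Under these assumptions, blocks 0 and 8 keep evicting each other in the direct mapping cache, while the associative cache can hold 0, 8, and 6 at the same time.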

Instruction Cache
Cache Miss
● Send original PC value to memory
● Perform a read operation
● Wait for the cache to receive the instruction
● Restart instruction execution

Data Cache Writes
Write-Through
● Consistent Copies
● Slow
[Figure: CPU, Cache, Mem]

Data Cache Writes
Write-Through
● Consistent Copies
● Slow
Example:
CPI without misses = 1
Memory delay = 100 clocks
10% of memory references are writes
Overall CPI =
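
A sketch of the write-through calculation. It rests on an assumption the slide does not state: the 10% figure is taken per instruction, so one instruction in ten writes to memory and each such write stalls the CPU for the full memory delay.

```python
base_cpi = 1.0
write_fraction = 0.10    # assumed: writes per instruction
memory_delay = 100       # clocks spent on each write that goes to memory

overall_cpi = base_cpi + write_fraction * memory_delay
print(f"{overall_cpi:.1f}")
```

Under that assumption the overall CPI is 11, which is why plain write-through is marked as slow.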

Data Cache Writes
Write-Through with Write Buffer
● Buffer size
● Fill-Rate and Mem-Rate (Possible Stall when the buffer fills faster than memory drains it)
[Figure: CPU, Cache, Mem]

Data Cache Writes
Write-Back
● Fast
● Complex & inconsistent copies
[Figure: CPU, Cache, Mem; Block Replacement]

Data Cache Writes
Write-Back
● Fast
● Complex & inconsistent copies
[Figure: CPU, Cache, Mem; Miss, Block Replacement]
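
The difference between the two write policies can be sketched by counting how many writes reach main memory. The model is illustrative only: a direct-mapped cache, a stream consisting solely of stores, and allocation of the block on a write miss are all assumptions, as is the function name.

```python
def memory_writes(write_blocks, num_blocks, policy):
    """Count writes reaching main memory for a direct-mapped cache.

    policy is "write-through" or "write-back". Assumes every access is a
    store and that a missing block is allocated in the cache on a write.
    """
    lines = [None] * num_blocks        # block address held by each line
    dirty = [False] * num_blocks       # dirty bits (used by write-back only)
    writes = 0
    for block in write_blocks:
        i = block % num_blocks
        if lines[i] != block:          # miss: the resident block is replaced
            if policy == "write-back" and dirty[i]:
                writes += 1            # write the modified block back first
            lines[i] = block
            dirty[i] = False
        if policy == "write-through":
            writes += 1                # every store also updates memory
        else:
            dirty[i] = True            # the memory copy is now stale
    return writes

stores = [0, 0, 0, 4, 4, 0]
through = memory_writes(stores, num_blocks=4, policy="write-through")
back = memory_writes(stores, num_blocks=4, policy="write-back")
print(through, back)   # 6 2
```

At the end of the write-back run the last block is still dirty, so memory holds an older value than the cache: the inconsistent copies named on the slide.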

Data Cache Writes
Write-Back with Buffer
● Reduces the “Miss” penalty by 50%
[Figure: CPU, Cache, Mem; Miss, Block Replacement]

Cache Replacement Policies
First In First Out (FIFO)
● Simple
● May replace a frequently used block, leading to a later miss
Least Recently Used (LRU)
● More complex
● Better Hit Rate
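
The two policies can be compared on a small reference sequence. This is a sketch with a hypothetical sequence chosen to show the FIFO weakness; the cache is fully associative and the function name is illustrative.

```python
from collections import OrderedDict

def count_hits(references, capacity, policy):
    """Hits for a fully associative cache using "FIFO" or "LRU" replacement."""
    blocks = OrderedDict()                  # entry at the front is evicted next
    hits = 0
    for block in references:
        if block in blocks:
            hits += 1
            if policy == "LRU":
                blocks.move_to_end(block)   # a hit refreshes recency under LRU
            # FIFO ignores hits: the order reflects arrival time only.
        else:
            if len(blocks) == capacity:
                blocks.popitem(last=False)  # evict the entry at the front
            blocks[block] = True
    return hits

references = [1, 2, 3, 1, 4, 1]             # hypothetical sequence; block 1 is reused
fifo_hits = count_hits(references, capacity=3, policy="FIFO")
lru_hits = count_hits(references, capacity=3, policy="LRU")
print(fifo_hits, lru_hits)                  # 1 2
```

FIFO evicts block 1 when block 4 arrives because block 1 entered first, even though it was just used; LRU evicts block 2 instead and keeps the reused block.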

Multilevel Cache
Example:
CPU CPI =
Clock = 4 GHz (0.25 ns Clock)
Primary Cache Miss Rate = 2%
Memory Access Time = 100 ns (400 Clocks)
CPI (Single Level Cache) =
Total Miss Rate = 0.5%
Secondary Cache Access Time = 5 ns (20 Clocks)
CPI (2-Level Cache) =
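
A sketch of the multilevel calculation. Two points are assumptions rather than values read from the slide: the base CPI is taken as 1.0 because its value is not legible in the slide text, and the 0.5% total miss rate is read as the fraction of references that miss in both cache levels.

```python
base_cpi = 1.0                # ASSUMED: the slide's CPI value is not legible
clock_ns = 0.25               # 4 GHz clock
memory_penalty = round(100 / clock_ns)   # 100 ns = 400 clocks
l2_penalty = round(5 / clock_ns)         # 5 ns = 20 clocks
l1_miss_rate = 0.02
total_miss_rate = 0.005       # assumed to mean: misses in both levels

# Single level: every primary miss goes to main memory.
cpi_single = base_cpi + l1_miss_rate * memory_penalty

# Two levels: every primary miss pays the secondary access time,
# and only the references missing in both levels pay for main memory.
cpi_two_level = (base_cpi
                 + l1_miss_rate * l2_penalty
                 + total_miss_rate * memory_penalty)

print(f"{cpi_single:.1f} {cpi_two_level:.1f}")
```

With a base CPI of 1.0 this gives 9.0 for the single-level design and 3.4 with the secondary cache; a different base CPI shifts both results by the same amount.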

Main Memory
Latency & Bandwidth
● Address (Selection of row & column)
● Data Transfer (Number of bits)
Example:
Send Address = 1 clock
Memory Access = 15 clocks
Transfer a 32-bit Word = 1 clock
Cache Block = 4 words
Cache Miss Memory Bandwidth =

Main Memory
Example:
[Figure: three organizations of CPU, Cache, and Mem. Simple Design: 1 Word wide throughout. Wide Bus: 2 Words wide between Cache and Mem, 1 Word to the CPU. Interleaved: 1 Word bus shared by several Mem banks]
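
The miss penalty and bandwidth of the three organizations can be estimated from the timing values in the example (1 clock to send the address, 15 clocks per memory access, 1 clock per word transfer, 4-word blocks). The sketch assumes 4 interleaved banks, a number the slides do not give; the function names are illustrative.

```python
ADDRESS, ACCESS, TRANSFER = 1, 15, 1   # clocks, from the example
BLOCK_WORDS, WORD_BYTES = 4, 4         # 4 words of 32 bits per block

def simple_penalty():
    # One word wide: each word needs its own access and its own transfer.
    return ADDRESS + BLOCK_WORDS * ACCESS + BLOCK_WORDS * TRANSFER

def wide_penalty(bus_words):
    # Memory and bus are bus_words wide: fewer, wider accesses and transfers.
    accesses = BLOCK_WORDS // bus_words
    return ADDRESS + accesses * ACCESS + accesses * TRANSFER

def interleaved_penalty(banks):
    # Banks are accessed in parallel; words still cross a one-word bus.
    rounds = BLOCK_WORDS // banks
    return ADDRESS + rounds * ACCESS + BLOCK_WORDS * TRANSFER

def bandwidth(penalty):
    return BLOCK_WORDS * WORD_BYTES / penalty   # bytes per clock

simple = simple_penalty()
wide = wide_penalty(bus_words=2)
interleaved = interleaved_penalty(banks=4)      # ASSUMED bank count
for name, clocks in [("simple", simple), ("wide", wide), ("interleaved", interleaved)]:
    print(f"{name}: {clocks} clocks, {bandwidth(clocks):.2f} bytes/clock")
```

Under these assumptions the miss penalty drops from 65 clocks (about 0.25 bytes per clock) to 33 clocks with the 2-word bus and 20 clocks with 4 banks, without widening the bus in the interleaved case.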

DRAM Technology
Year   Chip Size   $ per GB     Total Access Time   Column Access Time
?      ? Kbit      $1,500,?     ? ns                150 ns
?      ? Kbit      $500,?       ? ns                100 ns
?      ? Mbit      $200,?       ? ns                40 ns
?      ? Mbit      $50,?        ? ns                40 ns
?      ? Mbit      $15,000      90 ns               30 ns
?      ? Mbit      $10,000      60 ns               12 ns
?      ? Mbit      $4,000       60 ns               10 ns
?      ? Mbit      $1,000       55 ns               7 ns
?      ? Mbit      $?           ? ns                5 ns
?      ? Gbit      $50          40 ns               1.25 ns

Virtual Memory
Allow Efficient & Safe Sharing of Memory
● Memory Protection
● Program Relocatability
Remove Programming Burdens of Small Memory
● Much Larger Memory Space
● Reuse Physical Memory

Virtual Memory Segmentation
Segments
● Variable Size
● Two-Part Address
[Figure: Segments (Segment 0, Segment 1, ...) placed in physical memory. The address consists of a Segment Number and an Offset; Translation maps the Segment # to a location in physical memory]

Virtual Memory Paging
Virtual Memory
● Pages
● Stored on Disk
● Virtual Address
Physical Memory
● Frames
● Stored in RAM
● Physical Address
Page Faults
[Figure: Pages (Page 0, Page 1, ...) mapped to Frames (Frame 0, Frame 1, ...)]

Virtual Memory Paging
Address Translation
[Figure: the Virtual Address consists of a Virtual Page Number and a Page Offset. Translation replaces the Virtual Page Number with a Physical Page #; the Page Offset is unchanged in the Physical Address]

Paging Table
Page Table
● Virtual to Physical Page Number Translation
● Stored in RAM
● Page Table Register
[Figure: the Page Table Register points to the page table; each entry, selected by the Virtual Page number, holds a Valid bit and a Physical Page number]
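
Address translation through a page table can be sketched as follows. The 4 KB page size, the table contents, and all names here are hypothetical values chosen for illustration.

```python
PAGE_SIZE = 4096     # ASSUMED 4 KB pages
OFFSET_BITS = 12     # log2(PAGE_SIZE)

class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""

# Hypothetical page table: virtual page number -> (valid bit, physical page number)
page_table = {0: (True, 5), 1: (False, None), 2: (True, 9)}

def translate(virtual_address: int) -> int:
    vpn = virtual_address >> OFFSET_BITS           # virtual page number
    offset = virtual_address & (PAGE_SIZE - 1)     # page offset, passed through unchanged
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise PageFault(f"virtual page {vpn} is not in physical memory")
    return (ppn << OFFSET_BITS) | offset

physical = translate(0x2ABC)   # virtual page 2, offset 0xABC
print(hex(physical))           # 0x9abc
```

Translating an address in virtual page 1, such as `translate(0x1000)`, raises `PageFault` because that entry's valid bit is 0.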

Paging Table
Page Faults
● Swap Space
  ♦ Space reserved for the full virtual memory space of a process
  ♦ Stored on Disk
● Page Table
● LRU Replacement Scheme
[Figure: Page Table indexed by Virtual Page Number; each entry has a Valid bit and a Pointer. Entries with Valid = 1 point to Physical Memory, entries with Valid = 0 point to Disk Storage]

Paging Table
Page Table Size
Example:
Virtual Address: 32 bits
Page Size: 4 KB
Page Table: 4 Bytes/Entry
Number of Pages =
Page Table Size =
[Figure: Page Table indexed by Virtual Page Number; each entry has a Valid bit and a Pointer. Entries with Valid = 1 point to Physical Memory, entries with Valid = 0 point to Disk Storage]
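
The page table size in this example can be worked out as below. The sketch assumes a single-level page table with one entry for every virtual page; the variable names are illustrative.

```python
virtual_address_bits = 32
page_size = 4 * 1024          # 4 KB
entry_bytes = 4

offset_bits = page_size.bit_length() - 1                 # 12 bits of page offset
num_pages = 2 ** (virtual_address_bits - offset_bits)    # 2^20 virtual pages
table_bytes = num_pages * entry_bytes                    # one entry per page

print(num_pages, table_bytes // 2**20, "MB")             # 1048576 4 MB
```

Under that assumption each process needs a 4 MB table for its 1 M virtual pages.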

Translation-Lookaside Buffer (TLB)
Address Translation Cache
[Figure: the Virtual Page Number is looked up in the TLB, whose entries hold Valid, Dirty, Ref, Tag, and Physical Page. The Page Table entries hold Valid, Dirty, Ref, and Physical Page, and point to Physical Memory or Disk Storage]

Virtual Memory Misses
TLB Miss
Page Fault
Cache Miss
[Figure: flow of a memory access. The Virtual Address goes to the TLB. On a TLB Hit the access continues to the Cache (Hit or Miss). On a TLB Miss the Page Table is consulted: either a Page Fault occurs or the TLB is updated]
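
The access flow on this slide can be sketched with the TLB as a small cache in front of the page table. Assumptions in this sketch: 4 KB pages, a page table that lists only the pages resident in memory, and a cache that is accessed with the physical address after translation. All names and table contents are hypothetical.

```python
OFFSET_BITS = 12    # ASSUMED 4 KB pages

class PageFault(Exception):
    """Raised when the page is in neither the TLB nor physical memory."""

def access(virtual_address, tlb, page_table, log):
    """Translate one address, recording which events occurred in log."""
    vpn = virtual_address >> OFFSET_BITS
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    if vpn in tlb:
        log.append("TLB hit")
    else:
        log.append("TLB miss")
        if vpn not in page_table:          # page is not resident in memory
            log.append("page fault")
            raise PageFault(vpn)
        tlb[vpn] = page_table[vpn]         # update the TLB from the page table
    # The physical address would now be presented to the cache (hit or miss).
    return (tlb[vpn] << OFFSET_BITS) | offset

tlb = {}
page_table = {3: 7}    # hypothetical: virtual page 3 -> physical page 7
log = []
first = access(0x3010, tlb, page_table, log)
second = access(0x3020, tlb, page_table, log)
print(hex(first), hex(second), log)
```

The first access misses in the TLB and loads the translation from the page table; the second access to the same page hits in the TLB.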

Memory Hierarchy Misses
Compulsory Miss
Capacity Miss
Conflict Miss
Design                    Miss Rate                      Performance
Increase Cache Size       Decrease Capacity Misses       May Increase Access Time
Increase Associativity    Decrease Conflict Misses       May Increase Access Time
Increase Block Size       Decrease Compulsory Misses     Increases Miss Penalty

Parallelism & Cache Coherence
Coherence
● What values can be returned by a read
Consistency
● When a written value will be returned by a read
[Figure: two Processors, each with its own Cache, sharing Main Memory; the copies of one location hold different values (0, 0, 1)]

Cache Coherence Enforcement
Migration (of Data to Local Caches)
● Reduces access latency and the bandwidth demand on shared memory
Replication (of Read-shared Data)
● Reduces access latency and contention for the shared data

Cache Coherence Protocol
Snooping
● Each cache monitors bus reads/writes.
● Processors exchange full blocks.
● Large block sizes may lead to false sharing.
[Figure: two Processors, each with its own Cache, sharing Main Memory; a write of 1 sends Invalidate to the cache holding the old value 0]
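
A write-invalidate snooping scheme can be sketched with caches that listen on a shared bus. This model is a simplification and not a full protocol: it assumes write-through caches, one value per address instead of multi-word blocks, and a single bus; all class and method names are illustrative.

```python
class Bus:
    """Shared bus connecting the caches to main memory."""
    def __init__(self):
        self.caches = []
        self.memory = {}               # address -> value

    def broadcast_invalidate(self, sender, address):
        # Every other cache snoops the write and drops its copy.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(address)

class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                # address -> value (valid copies only)
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:  # miss: fetch the value from memory
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        self.bus.broadcast_invalidate(self, address)   # other copies become invalid
        self.lines[address] = value
        self.bus.memory[address] = value               # write-through keeps memory current

    def snoop_invalidate(self, address):
        self.lines.pop(address, None)  # drop the stale copy, if any

bus = Bus()
cache_a, cache_b = SnoopingCache(bus), SnoopingCache(bus)
before = (cache_a.read(0x40), cache_b.read(0x40))   # both caches hold the value 0
cache_a.write(0x40, 1)                              # invalidates cache_b's copy
after = cache_b.read(0x40)                          # cache_b misses and re-reads
print(before, after)                                # (0, 0) 1
```

Without the invalidate broadcast, `cache_b` would keep returning the stale value 0 after `cache_a` wrote 1.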

Cache Coherence Protocol
Directory-based protocols
● Caches and memory record sharing status of blocks in a directory.