Published by Bryan Dickerson. Modified over 9 years ago.
1
Princess Sumaya Univ. Computer Engineering Dept. Chapter 5:
2
Princess Sumaya University 22540 – Computer Arch. & Org (2) Computer Engineering Dept.
Memory Hierarchy
Principle of Locality
●Temporal Locality (Locality in Time)
●Spatial Locality (Locality in Space)
Speed & Size
Technology      Access Time     Relative Cost/GB
SRAM            0.5 ns          10,000
DRAM            50 ns           100
Magnetic Disk   5,000,000 ns    1
3
Memory Hierarchy
[Figure: CPU, Cache, Main Memory, Magnetic Disks]
4
Cache Memory
High Speed (Towards CPU)
●Conceals Slow Memory
Small Size (Low Cost)
[Figure: CPU, Cache (Fast), Main Memory (Slow); Hit and Miss paths; 95% hit ratio; Access = Cache + Mem]
5
Cache Memory
CPU – Main Memory Address
●Cache Size < Main Memory Size
[Figure: CPU, Cache (1 MB), Main Memory (4 GB); the CPU issues a 32-bit Address, the cache needs only 20 bits]
6
Cache Memory
[Figure: Main Memory addresses 00000000, 00000001 through 3FFFFFFF; Cache addresses 00000, 00001 through FFFFF; Address Mapping between them]
7
Associative Memory
[Figure: Main Memory (00000000 through 3FFFFFFF) and Cache (00000 through FFFFF). Main memory addresses 00012000, 15000000 and 08000000 are held in the cache; each cache location stores the Address (Key) and the Data]
8
Associative Memory
Cache
Key (32 Bits)   Data (8 Bits)
00012000        48
15000000        13
08000000        63
Address 00012000 → Data 48
●Can have any number of locations
9
Associative Memory
[Figure: the same cache (32 Bits Key, 8 Bits Data); the Address is compared (= ?) against each stored Key]
●How many comparators?
10
Associative Memory
[Figure: the same cache (32 Bits Key, 8 Bits Data) with comparators (= ?); each entry also carries a Valid Bit, shown set to 1]
11
Associative Memory
[Figure: cache with 30 Bits (Key) and 32 Bits (Data); stored data words 48541762, 13762468, 63448219. A 32-bit Address selects data word 48541762; Data output 48]
12
Direct Mapping Cache
12 Bits (Tag), 20 Bits (Index, 00000 to FFFFF), 8 Bits (Data)
Index   Tag   Data
00040   000   16
04000   150   7C
00800   080   05
Address 000 00040: Tag 000 → Compare (Match / No match) → Data 16
●What happens when Address = 100 00040?
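The lookup on this slide can be sketched in a few lines. This is only an illustration: the pairing of the second and third index/tag/data rows is an assumption read off the flattened figure, and `split_address` and `lookup` are hypothetical names.

```python
TAG_BITS = 12     # upper bits of the 32-bit address (from the slide)
INDEX_BITS = 20   # lower bits select the cache line (from the slide)

def split_address(addr):
    """Split a 32-bit address into (tag, index)."""
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    return tag, index

# index -> (tag, data); row pairing assumed from the figure
cache = {
    0x00040: (0x000, 0x16),
    0x04000: (0x150, 0x7C),
    0x00800: (0x080, 0x05),
}

def lookup(addr):
    """Return the cached data on a tag match, or None on a miss."""
    tag, index = split_address(addr)
    entry = cache.get(index)
    if entry is not None and entry[0] == tag:
        return entry[1]
    return None

print(lookup(0x00000040))  # address 000 00040: tag matches
print(lookup(0x10000040))  # address 100 00040: same index, different tag
```

Under these assumptions, address 100 00040 selects the same line as 000 00040 but fails the tag compare, so it is a miss.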
13
Direct Mapping Cache
12 Bits (Tag), 18 Bits (Index, 00000 to 3FFFF), 32 Bits (Data)
Index   Tag   Data
00010   000   48541762
01000   150   13762468
00200   080   63448219
Address 0000 0000 0000 0000 0000 0000 0100 0000: Tag 000 → Compare (Match / No match); data word 48541762 → Select → Data 48
14
Set Associative Cache
2-Way Set Associative
12 Bits (Tag), 20 Bits (Index, 00000 to FFFFF), 8 Bits (Data)
Index   Tag   Data   Tag   Data
00040   000   16     030   49
04000   150   7C     090   31
00800   080   05     070   20
[Figure: the Address tag is compared against both entries of the indexed set; one Compare gives No match, the other gives Match (Tag 030, Data 49)]
15
Cache Size
Example:
Number of Blocks = 4 K
Block Size = 4 Words
Word Size = 32 bits
Address Size = 32 bits
Tag Bits (Direct Mapping Cache) =
Tag Bits (2-Way Set Associative) =
Tag Bits (4-Way Set Associative) =
Tag Bits (Associative Cache) =
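One way to work the tag-bit exercise is sketched below. It assumes byte addressing (so a 4-word block of 32-bit words needs 4 offset bits), which the slide does not state; `tag_bits` is a hypothetical helper.

```python
def log2_exact(n):
    """Integer log2 for a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1

def tag_bits(address_bits, num_blocks, words_per_block, bytes_per_word, ways):
    """Tag width = address bits - index bits - block offset bits."""
    offset_bits = log2_exact(words_per_block * bytes_per_word)  # byte addressing assumed
    index_bits = log2_exact(num_blocks // ways)                 # one index per set
    return address_bits - index_bits - offset_bits

NUM_BLOCKS = 4 * 1024
for label, ways in [("Direct Mapping", 1), ("2-Way", 2), ("4-Way", 4),
                    ("Associative", NUM_BLOCKS)]:
    print(label, tag_bits(32, NUM_BLOCKS, 4, 4, ways))
```

Each doubling of associativity halves the number of sets, so one bit moves from the index to the tag.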
16
Block Size
Increasing Block Size
●Utilizes Spatial Locality
●Reduces the Number of Blocks
17
Cache Performance
Example:
CPU CPI = 2 clocks/instruction
Loads & Stores instructions = 36%
Instruction Cache = 2% miss rate
Data Cache = 4% miss rate
Memory Miss Penalty = 100 clocks
Instructions Penalties =
Data Penalties =
CPI (with penalties) =
Perfect Cache Speedup =
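The blanks in this example follow from adding the average stall cycles per instruction to the base CPI. A minimal sketch, assuming every instruction is fetched through the instruction cache and only loads and stores touch the data cache (`cpi_with_stalls` is a hypothetical name):

```python
def cpi_with_stalls(base_cpi, mem_ref_fraction, i_miss_rate, d_miss_rate, miss_penalty):
    """Return (instruction stalls, data stalls, total CPI), all per instruction."""
    instruction_stalls = i_miss_rate * miss_penalty                # every fetch can miss
    data_stalls = mem_ref_fraction * d_miss_rate * miss_penalty    # loads/stores only
    return instruction_stalls, data_stalls, base_cpi + instruction_stalls + data_stalls

BASE_CPI = 2
i_stalls, d_stalls, cpi = cpi_with_stalls(BASE_CPI, 0.36, 0.02, 0.04, 100)
speedup = cpi / BASE_CPI   # how much faster a perfect (never-missing) cache would be

print(i_stalls, d_stalls, cpi, speedup)
```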
18
Cache Performance
Average Memory Access Time (AMAT)
●AMAT = Time for a Hit + Miss Rate × Miss Penalty
Example:
Clock Cycle = 1 ns
Cache Access Time (Hit) = 1 ns
Cache Miss Penalty = 20 Clocks
Miss Rate = 5%
AMAT =
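The AMAT formula above can be evaluated directly. In this sketch the miss penalty is converted from clocks to nanoseconds using the given clock cycle; `amat_ns` is a hypothetical helper name.

```python
def amat_ns(hit_time_ns, miss_rate, miss_penalty_clocks, clock_ns):
    """AMAT = time for a hit + miss rate x miss penalty, in nanoseconds."""
    return hit_time_ns + miss_rate * miss_penalty_clocks * clock_ns

amat = amat_ns(hit_time_ns=1.0, miss_rate=0.05, miss_penalty_clocks=20, clock_ns=1.0)
print(amat)  # 1 ns hit time plus 5% of a 20 ns penalty
```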
19
Cache Misses
Example:
Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 4 blocks (Direct Mapping)
[Figure: CPU Reference 0, 8, 0, 6, 8; cache blocks 0 to 3 with Tag column; outcome label shown: Miss]
20
Cache Misses
Example:
Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 2 blocks (2-Way Set Associative)
[Figure: CPU Reference 0, 8, 0, 6, 8; cache rows 0 and 1 with Tag column; outcome labels shown: Miss, Hit, Miss]
21
Cache Misses
Example:
Block Address Sequence = 0, 8, 0, 6, 8
Cache Size = 4 blocks (Associative)
[Figure: CPU Reference 0, 8, 0, 6, 8; outcome labels shown: Miss, Hit, Miss, Hit]
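The three Cache Misses examples can be replayed with a small simulator. This is a sketch under two assumptions the slides do not state: a block maps to set (block address mod number of sets), and replacement within a set is LRU. `simulate` is a hypothetical name.

```python
from collections import OrderedDict

def simulate(block_addresses, num_blocks, ways):
    """Return a list of 'hit'/'miss' outcomes for a set-associative cache.

    ways=1 gives direct mapping; ways=num_blocks gives a fully associative cache.
    """
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # each ordered oldest-use first
    outcomes = []
    for block in block_addresses:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)        # mark as most recently used
            outcomes.append("hit")
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used block
            lines[block] = True
            outcomes.append("miss")
    return outcomes

sequence = [0, 8, 0, 6, 8]
print(simulate(sequence, num_blocks=4, ways=1))  # direct mapping
print(simulate(sequence, num_blocks=2, ways=2))  # 2-way, sizes as given on the slide
print(simulate(sequence, num_blocks=4, ways=4))  # associative
```

With this sequence, blocks 0 and 8 collide in the direct-mapped cache on every reference, while the associative cache keeps both resident.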
22
Instruction Cache
Cache Miss
●Send original PC value to memory
●Perform a read operation
●Wait for the cache to receive the instruction
●Restart instruction execution
23
Data Cache Writes
Write-Through
●Consistent Copies
●Slow
[Figure: CPU, Cache, Mem]
24
Data Cache Writes
Write-Through
●Consistent Copies
●Slow
Example:
CPI without miss = 1
Memory delays = 100 clocks
10% of memory references are writes
Overall CPI =
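A rough way to estimate the overall CPI is sketched below. It assumes that 10% of instructions perform a write and that every write stalls the CPU for the full memory delay (no write buffer); the slide gives the write fraction per memory reference, so this reading is an assumption. `write_through_cpi` is a hypothetical name.

```python
def write_through_cpi(base_cpi, write_fraction, memory_delay_clocks):
    """CPI when every write waits for main memory (no write buffer)."""
    return base_cpi + write_fraction * memory_delay_clocks

overall_cpi = write_through_cpi(base_cpi=1, write_fraction=0.10, memory_delay_clocks=100)
print(overall_cpi)
```

Under that assumption the write stalls dominate the base CPI, which is the sense in which write-through is slow.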
25
Data Cache Writes
Write-Through with Write Buffer
●Buffer size
●Fill-Rate and Mem-Rate (Possible Stall)
[Figure: CPU, Cache, Mem]
26
Data Cache Writes
Write-Back
●Fast
●Complex & inconsistent copies
[Figure: CPU, Cache, Mem; Block Replacement]
27
Data Cache Writes
Write-Back
●Fast
●Complex & inconsistent copies
[Figure: CPU, Cache, Mem; Miss; Block Replacement]
28
Data Cache Writes
Write-Back with Buffer
●Reduces the “Miss” penalty by 50%
[Figure: CPU, Cache, Mem; Miss; Block Replacement]
29
Cache Replacement Policies
First In First Out (FIFO)
●Simple
●May replace a block which is used more, leading to a miss
Least Recently Used (LRU)
●More complex
●Better Hit Rate
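The difference between the two policies shows up on a reference stream that reuses one block heavily. The sketch below models a small fully associative cache; the reference sequence, the capacity of 3 blocks and the name `count_hits` are illustrative assumptions, not taken from the slides.

```python
from collections import OrderedDict

def count_hits(refs, capacity, policy):
    """Count hits in a fully associative cache using 'FIFO' or 'LRU' replacement."""
    cache = OrderedDict()   # front of the dict = next block to evict
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(block)   # LRU refreshes on use; FIFO does not
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the oldest (FIFO) or least recent (LRU)
            cache[block] = True
    return hits

refs = [1, 2, 3, 1, 4, 1]   # block 1 is reused often
print(count_hits(refs, 3, "FIFO"), count_hits(refs, 3, "LRU"))
```

Here FIFO evicts block 1 when block 4 arrives because it was loaded first, even though it was just used; LRU evicts block 2 instead and keeps the hot block.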
30
Multilevel Cache
Example:
CPU CPI = 1 @ 4 GHz (0.25 ns Clock)
Primary Cache Miss Rate = 2%
Memory Access Time = 100 ns (400 Clocks)
CPI (Single Level Cache) =
Total Miss Rate = 0.5%
Secondary Cache Access Time = 5 ns (20 Clocks)
CPI (2-Level Cache) =
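Both CPI values in this example can be computed from the stall cycles per instruction. The sketch assumes the miss rates are per instruction, that every primary miss pays the secondary cache access time, and that the 0.5% total miss rate is the fraction of instructions that go all the way to main memory. Variable names are illustrative.

```python
CLOCK_NS = 0.25          # 4 GHz clock
BASE_CPI = 1

memory_penalty = 100 / CLOCK_NS      # main memory access in clocks
secondary_penalty = 5 / CLOCK_NS     # secondary cache access in clocks

# Single level: every primary miss goes to main memory.
cpi_single = BASE_CPI + 0.02 * memory_penalty

# Two levels: primary misses pay the secondary access; only 0.5% also pay main memory.
cpi_two_level = BASE_CPI + 0.02 * secondary_penalty + 0.005 * memory_penalty

print(cpi_single, cpi_two_level, cpi_single / cpi_two_level)
```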
31
Main Memory
Latency & Bandwidth
●Address (Selection of row & column)
●Data Transfer (Number of bits)
Example:
Send Address = 1 clock
Memory Access = 15 clocks
Transfer a 32-bit Word = 1 clock
Cache Block = 4 words
Cache Miss Memory Bandwidth =
32
Main Memory
Example:
●Simple Design [Figure: CPU, Cache, Mem; 1 Word path]
●Wide Bus [Figure: CPU, Cache, Mem; 1 Word and 2 Words paths]
●Interleaved [Figure: CPU, Cache, several Mem banks; 1 Word path]
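The miss penalty of the three organizations can be compared using the timing figures of the Main Memory example (1 clock to send the address, 15 clocks per memory access, 1 clock per word transfer, 4-word block). This is a sketch: the 4-bank interleaving, the assumption that banks overlap their accesses, and the function names are not from the slides.

```python
ADDRESS_CLOCKS = 1
ACCESS_CLOCKS = 15
TRANSFER_CLOCKS = 1
BLOCK_WORDS = 4
BYTES_PER_WORD = 4

def simple_design():
    """One word per access and per transfer."""
    return ADDRESS_CLOCKS + BLOCK_WORDS * (ACCESS_CLOCKS + TRANSFER_CLOCKS)

def wide_bus(width_words):
    """Memory and bus move width_words at a time (must divide the block size)."""
    trips = BLOCK_WORDS // width_words
    return ADDRESS_CLOCKS + trips * (ACCESS_CLOCKS + TRANSFER_CLOCKS)

def interleaved(banks):
    """Banks access in parallel; words still cross a one-word bus one by one."""
    rounds = BLOCK_WORDS // banks
    return ADDRESS_CLOCKS + rounds * ACCESS_CLOCKS + BLOCK_WORDS * TRANSFER_CLOCKS

def bandwidth_bytes_per_clock(penalty_clocks):
    """Bytes delivered per clock during a block fill."""
    return BLOCK_WORDS * BYTES_PER_WORD / penalty_clocks

for name, penalty in [("simple", simple_design()),
                      ("wide, 2 words", wide_bus(2)),
                      ("interleaved, 4 banks", interleaved(4))]:
    print(name, penalty, bandwidth_bytes_per_clock(penalty))
```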
33
DRAM Technology
Year   Chip Size   $ per GB      Total Access Time   Column Access Time
1980   64 Kbit     $ 1,500,000   250 ns              150 ns
1983   256 Kbit    $ 500,000     185 ns              100 ns
1985   1 Mbit      $ 200,000     135 ns              40 ns
1989   4 Mbit      $ 50,000      110 ns              40 ns
1992   16 Mbit     $ 15,000      90 ns               30 ns
1996   64 Mbit     $ 10,000      60 ns               12 ns
1998   128 Mbit    $ 4,000       60 ns               10 ns
2000   256 Mbit    $ 1,000       55 ns               7 ns
2004   512 Mbit    $ 250         50 ns               5 ns
2007   1 Gbit      $ 50          40 ns               1.25 ns
34
Virtual Memory
Allow Efficient & Safe Sharing of Memory
●Memory Protection
●Program Relocatability
Remove Programming Burdens of Small Memory
●Much Larger Memory Space
●Reuse Physical Memory
35
Virtual Memory
Segmentation
Segments
●Variable Size
●Two-Part Address
[Figure: Segment 0 and Segment 1 mapped to Frame 0 and Frame 1. Translation maps a virtual address (bits 31 to 0: Segment Number, Offset; field boundary marked "??") to a physical address (bits 23 to 0: Segment #, Offset)]
36
Virtual Memory
Paging
Virtual Memory
●Pages
●Stored on Disk
●Virtual Address
Physical Memory
●Frames
●Stored in RAM
●Physical Address
Page Faults
[Figure: Page 0 and Page 1 mapped to Frame 0 and Frame 1]
37
Virtual Memory
Paging
Address Translation
Virtual Address: Virtual Page Number (bits 31 to 12), Page Offset (bits 11 to 0)
Physical Address: Physical Page # (bits 23 to 12), Page Offset (bits 11 to 0)
[Figure: Page 0 and Page 1 mapped to Frame 0 and Frame 1 through Translation]
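The field layout on this slide (12-bit page offset, 20-bit virtual page number, 12-bit physical page number) translates into simple shifts and masks. A minimal sketch; `split_virtual` and `make_physical` are hypothetical names and the example addresses are made up.

```python
PAGE_OFFSET_BITS = 12                      # bits 11..0 on the slide
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1

def split_virtual(vaddr):
    """Split a 32-bit virtual address into (virtual page number, page offset)."""
    return vaddr >> PAGE_OFFSET_BITS, vaddr & OFFSET_MASK

def make_physical(physical_page, offset):
    """Combine a physical page number with the unchanged page offset."""
    return (physical_page << PAGE_OFFSET_BITS) | offset

vpn, offset = split_virtual(0x12345ABC)    # illustrative address
print(hex(vpn), hex(offset))
print(hex(make_physical(0x7FF, offset)))   # 0x7FF is an illustrative physical page
```

Only the page number is translated; the offset is copied through unchanged.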
38
Paging Table
Page Table
●Virtual to Physical Page Number Translation
●Stored in RAM
●Page Table Register
[Figure: the Page Table Register points to a table with columns Valid, Virtual Page, Physical Page; rows as extracted: 152, 0349, 1125]
39
Paging Table
Page Faults
●Swap Space
♦ Reserved space for full virtual memory space for a process
♦ Stored on Disk
●Page Table
●LRU Replacement Scheme
[Figure: Page Table indexed by Virtual Page Number, with columns Valid and Pointer; entries point to Physical Memory or Disk Storage]
40
Paging Table
Page Table Size
Example:
Virtual Address: 32 bits
Page Size: 4 KB
Page Table: 4 Bytes/Entry
Number of Pages =
Page Table Size =
[Figure: Page Table indexed by Virtual Page Number, with columns Valid and Pointer; entries point to Physical Memory or Disk Storage]
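The two blanks follow from the address and page sizes. A short sketch, assuming a single-level page table with one entry per virtual page (variable names are illustrative):

```python
VIRTUAL_ADDRESS_BITS = 32
PAGE_SIZE_BYTES = 4 * 1024
ENTRY_BYTES = 4

# One page-table entry per virtual page (single-level table assumed).
number_of_pages = 2 ** VIRTUAL_ADDRESS_BITS // PAGE_SIZE_BYTES
page_table_bytes = number_of_pages * ENTRY_BYTES

print(number_of_pages)                    # 2**20 pages
print(page_table_bytes // (1024 * 1024))  # size in MB
```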
41
Translation-Lookaside Buffer (TLB)
Address Translation Cache
[Figure: the Virtual Page Number is looked up in the TLB (columns Valid, Dirty, Ref, Tag, Physical Page) and in the Page Table (columns Valid, Dirty, Ref, Physical Page); entries point to Physical Memory or Disk Storage]
42
Virtual Memory Misses
●TLB Miss
●Page Fault
●Cache Miss
[Figure: flowchart. Virtual Address → TLB (Hit / Miss); on a TLB Miss → Page Table (Update TLB, or Page Fault); then Cache (Hit / Miss)]
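The translation part of this flow can be sketched as follows. The TLB and page table are modeled as plain dictionaries from virtual page number to physical page number, a page missing from the table stands for a page on disk, and the 4 KB page size is carried over from the paging slides. All names (`translate`, `PageFault`) are hypothetical, and valid, dirty and reference bits are left out.

```python
PAGE_OFFSET_BITS = 12   # 4 KB pages assumed

class PageFault(Exception):
    """Raised when the page is not in physical memory."""

def translate(vaddr, tlb, page_table):
    """Return (physical address, 'TLB hit' or 'TLB miss'); raise PageFault if absent."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:
        physical_page, outcome = tlb[vpn], "TLB hit"
    else:
        if vpn not in page_table:
            raise PageFault(vpn)            # the OS would load the page from disk
        physical_page = page_table[vpn]
        tlb[vpn] = physical_page            # update the TLB for the next access
        outcome = "TLB miss"
    return (physical_page << PAGE_OFFSET_BITS) | offset, outcome

tlb, page_table = {}, {2: 7}                # illustrative mapping: page 2 -> frame 7
print(translate(0x2010, tlb, page_table))   # first access walks the page table
print(translate(0x2014, tlb, page_table))   # second access is served by the TLB
```

The resulting physical address would then go to the cache, where it can hit or miss independently of the TLB outcome.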
43
Memory Hierarchy Misses
●Compulsory Miss
●Capacity Miss
●Conflict Miss
Design                   Miss Rate                    Performance
Increase Cache Size      Decrease Capacity Misses     May Increase Access Time
Increase Associativity   Decrease Conflict Misses     May Increase Access Time
Increase Block Size      Decrease Compulsory Misses   Increases Miss Penalty
44
Parallelism & Cache Coherence
Coherence
●What values can be returned by a read
Consistency
●When a written value will be returned by a read
[Figure: two Processors, each with its own Cache, sharing Main Memory; values shown for one location: 0, 0, 1]
45
Cache Coherence Enforcement
Migration (of Data to Local Caches)
●Reduces access latency & bandwidth demand on shared memory.
Replication (of Read-shared Data)
●Reduces latency & contention for access.
46
Cache Coherence Protocol
Snooping
●Each cache monitors bus reads/writes.
●Processors exchange full blocks.
●Large block sizes may lead to false sharing.
[Figure: two Processors, each with its own Cache, sharing Main Memory; a write triggers Invalidate of the other cache's copy; values shown: 0, 0, 1, then 0, 1]
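The invalidate step in the figure can be illustrated with a toy write-invalidate model. This is a heavily simplified sketch, not a full protocol: caches hold single values rather than blocks, writes go through to memory immediately, and the classes `Bus` and `SnoopingCache` are hypothetical.

```python
class Bus:
    """Shared bus that every cache snoops."""
    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)   # drop any stale copy

class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus, self.memory, self.lines = bus, memory, {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)   # other copies become invalid
        self.lines[addr] = value
        self.memory[addr] = value             # write-through, for simplicity

memory = {0x10: 0}
bus = Bus()
cache_a, cache_b = SnoopingCache(bus, memory), SnoopingCache(bus, memory)
cache_a.read(0x10); cache_b.read(0x10)        # both caches hold 0
cache_a.write(0x10, 1)                        # invalidates cache_b's copy
print(cache_b.read(0x10))                     # cache_b misses and re-reads memory
```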
47
Cache Coherence Protocol
Directory-based protocols
●Caches and memory record sharing status of blocks in a directory.
48
Chapter 5