Advanced Computer Architecture, CSE 8383, Session 2, January 24, 2008
Computer Science and Engineering, Copyright by Hesham El-Rewini
Contents (Memory)
- Memory Hierarchy
- Cache Memory
- Placement Policies: Direct Mapping, Fully Associative, Set Associative
- Replacement Policies: FIFO, Random, Optimal, LRU, MRU
- Cache Write Policies
Memory Hierarchy
CPU Registers -> Cache -> Main Memory -> Secondary Storage
Moving down the hierarchy, latency increases while bandwidth, speed, and cost per bit decrease.
Sequence of Events
1. The processor makes a request for X.
2. X is sought in the cache.
3. If it is found: a hit (hit ratio h).
4. Otherwise: a miss (miss ratio m = 1 - h).
5. On a miss, X is sought in main memory.
6. The scheme generalizes to more levels.
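The hit and miss ratios above determine the average access time of the two-level hierarchy. A minimal sketch, assuming a miss pays for the failed cache lookup plus the main-memory access (the timing numbers are illustrative, not from the slides):

```python
# Effective (average) memory-access time for a two-level hierarchy.
def effective_access_time(h, t_cache, t_main):
    """h: hit ratio; on a miss (ratio 1 - h) the word is fetched
    from main memory after the failed cache lookup."""
    return h * t_cache + (1 - h) * (t_cache + t_main)

# Example: 95% hit ratio, 1 ns cache, 100 ns main memory.
print(effective_access_time(0.95, 1.0, 100.0))  # ~6.0 ns
```

Even a modest miss ratio dominates the average when main memory is two orders of magnitude slower than the cache.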
Cache Memory
The idea is to keep the information expected to be used most frequently in the cache.
- Locality of Reference: Temporal Locality, Spatial Locality
- Placement Policies
- Replacement Policies
Placement Policies
How to map memory blocks (lines) to cache block frames (line frames): memory is divided into blocks, and the cache into block frames.
Placement Policies
- Direct Mapping
- Fully Associative
- Set Associative
Direct Mapping
- The simplest policy: a memory block is mapped to a fixed cache block frame (a many-to-one mapping).
- J = I mod N, where J is the cache block frame number, I is the memory block number, and N is the number of cache block frames.
Address Format
Memory: M blocks; block size: B words; cache: N block frames.
Address size = log2(M * B) bits, split as:
  Tag (remaining bits = log2(M/N)) | Block frame (log2 N bits) | Word (log2 B bits)
Example
Memory: 4K blocks; block size: 16 words; address size = log2(4K * 16) = 16 bits; cache: 128 block frames.
  Tag (5 bits) | Block frame (7 bits) | Word (4 bits)
Example (cont.)
[Diagram: memory blocks 0, 128, 256, ..., 3968 all map to cache block frame 0, blocks 1, 129, ... to frame 1, and so on up to frame 127; the 5-bit tag (values 0-31) records which of the 32 competing blocks currently occupies a frame.]
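The mapping in the example can be sketched directly from J = I mod N:

```python
# Direct mapping for the example above: 4K memory blocks and
# 128 cache block frames, so frame J = I mod 128 and the 5-bit
# tag is I // 128 (which of the 32 colliding blocks is resident).
N = 128          # cache block frames
M = 4096         # memory blocks

def direct_map(block):
    frame = block % N       # J = I mod N
    tag = block // N        # remaining high-order bits
    return frame, tag

# Blocks 0, 128, and 3968 all compete for frame 0:
assert direct_map(0) == (0, 0)
assert direct_map(128) == (0, 1)
assert direct_map(3968) == (0, 31)
```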
Fully Associative
- The most flexible policy: a memory block may be mapped to any available cache block frame (a many-to-many mapping).
- Requires an associative search of the whole cache.
Address Format
Memory: M blocks; block size: B words; cache: N block frames.
Address size = log2(M * B) bits, split as:
  Tag (remaining bits = log2 M) | Word (log2 B bits)
Example
Memory: 4K blocks; block size: 16 words; address size = log2(4K * 16) = 16 bits; cache: 128 block frames.
  Tag (12 bits) | Word (4 bits)
Example (cont.)
[Diagram: any of the 4096 memory blocks can occupy any of the 128 cache block frames; each frame carries a 12-bit tag naming the resident block.]
Set Associative
- A compromise between the other two: the cache is divided into sets, each containing a fixed number of block frames.
- A memory block is mapped to a specific set and may occupy any available block frame within that set.
- Requires an associative search only within a set.
Address Format
Memory: M blocks; block size: B words; cache: N block frames; number of sets S = N / (number of blocks per set).
Address size = log2(M * B) bits, split as:
  Tag (remaining bits = log2(M/S)) | Set (log2 S bits) | Word (log2 B bits)
Example
Memory: 4K blocks; block size: 16 words; address size = log2(4K * 16) = 16 bits; cache: 128 blocks; 4 blocks per set, so 32 sets.
  Tag (7 bits) | Set (5 bits) | Word (4 bits)
Example (cont.)
[Diagram: the 128 cache block frames are grouped into 32 sets of 4 (frames 0-3 form set 0, ..., frames 124-127 form set 31); memory blocks 0, 32, 64, ..., 4064 map to set 0, and each frame carries a 7-bit tag.]
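The three address formats can be checked against the running example in a few lines:

```python
from math import log2

# Field widths for the three placement policies, using the running
# example: M = 4K memory blocks, B = 16 words/block, N = 128 frames.
M, B, N = 4096, 16, 128
addr = int(log2(M * B))            # 16-bit address

word = int(log2(B))                # 4 bits in every scheme

# Direct mapping: tag | frame | word
frame = int(log2(N))               # 7 bits
tag_direct = addr - frame - word   # 5 bits

# Fully associative: tag | word
tag_full = addr - word             # 12 bits

# Set associative, 4 blocks per set: tag | set | word
S = N // 4                         # 32 sets
set_bits = int(log2(S))            # 5 bits
tag_set = addr - set_bits - word   # 7 bits

assert (tag_direct, frame, word) == (5, 7, 4)
assert (tag_full, word) == (12, 4)
assert (tag_set, set_bits, word) == (7, 5, 4)
```

Note how the tag shrinks as associativity falls: more of the address is pinned down by where the block is allowed to live.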
Comparison
The three policies trade off:
- Simplicity: direct mapping is the simplest.
- Associative search: none for direct mapping, the whole cache for fully associative, one set for set associative.
- Cache utilization: higher with more associativity.
- Replacement: direct mapping needs no replacement policy; the other two do.
Group Exercise
The instruction set for your architecture has 40-bit addresses, with each addressable item being a byte. You elect to design a four-way set-associative cache, with each of the four blocks in a set containing 64 bytes. Assume that you have 256 sets in the cache. Show the format of the address.
Group Exercise (cont.)
Address size = 40 bits; block size = 64 bytes; 4 blocks per set; 256 sets; cache capacity = 256 * 4 blocks.
  Tag (26 bits) | Set (8 bits) | Word (6 bits)
Group Exercise (cont.)
Consider the following sequence of addresses (all are hex numbers):
  0E1B01AA05
  0E1B01AA07
  0E1B2FE305
  0E1B4FFD8F
  0E1B01AA0E
In your cache, what will be the tags in the set(s) that contain these references at the end of the sequence? Assume that the cache is initially flushed (empty).
Group Exercise (cont.)
Write the low-order 16 bits of each address in binary and split off the 6-bit word and 8-bit set fields; the tag is the remaining 26 high-order bits.
  0E1B01AA05 -> 0E1B01 | 10 | 10101000 | 000101: tag = 0E1B01 || 10, set = 10101000 (A8), word = 000101
  0E1B01AA07 -> same tag and set, word = 000111
  0E1B2FE305 -> 0E1B2F | 11 | 10001100 | 000101: tag = 0E1B2F || 11, set = 10001100 (8C), word = 000101
  0E1B4FFD8F -> 0E1B4F | 11 | 11110110 | 001111: tag = 0E1B4F || 11, set = 11110110 (F6), word = 001111
  0E1B01AA0E -> same tag and set as the first reference, word = 001110
Three sets are touched: set A8 ends up holding tag 0E1B01 || 10, set 8C holds 0E1B2F || 11, and set F6 holds 0E1B4F || 11.
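The field extraction above can be done with shifts and masks; this sketch checks the worked answers:

```python
# Decomposing the exercise addresses: 40-bit byte address,
# 64-byte blocks (6-bit word field), 256 sets (8-bit set field),
# and a 26-bit tag made of the remaining high-order bits.
def decompose(addr):
    word = addr & 0x3F            # low 6 bits
    set_idx = (addr >> 6) & 0xFF  # next 8 bits
    tag = addr >> 14              # top 26 bits
    return tag, set_idx, word

refs = [0x0E1B01AA05, 0x0E1B01AA07, 0x0E1B2FE305,
        0x0E1B4FFD8F, 0x0E1B01AA0E]

# Three distinct (tag, set) pairs; the first, second, and last
# references fall in the same block.
assert decompose(0x0E1B01AA05) == (0x386C06, 0xA8, 0x05)
assert decompose(0x0E1B2FE305) == (0x386CBF, 0x8C, 0x05)
assert decompose(0x0E1B4FFD8F) == (0x386D3F, 0xF6, 0x0F)
assert {decompose(r)[:2] for r in refs} == {
    (0x386C06, 0xA8), (0x386CBF, 0x8C), (0x386D3F, 0xF6)}
```

The tags printed in hex (e.g. 0x386C06) are the same 26-bit values as the concatenations on the slide, just written as single numbers.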
Replacement Techniques
- FIFO
- LRU
- MRU
- Random
- Optimal
Group Exercise
Suppose that your cache can hold only three blocks and the block requests are as follows:
  7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
Show the contents of the cache if the replacement policy is a) LRU, b) FIFO, c) Optimal.
Group Exercise (cont.)
FIFO (cache contents after each request; 15 misses):
  Request: 7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
  Frame 1: 7  7  7  2  2  2  2  4  4  4  0  0  0  0  0  0  0  7  7  7
  Frame 2: -  0  0  0  0  3  3  3  2  2  2  2  2  1  1  1  1  1  0  0
  Frame 3: -  -  1  1  1  1  0  0  0  3  3  3  3  3  2  2  2  2  2  1
[The slide also tabulates the corresponding MRU trace.]
Group Exercise (cont.)
The OPT and LRU traces begin the same way, with the three cold misses on 7, 0, 1. Working the full sequence through gives 12 misses for LRU and 9 misses for OPT (evict the block whose next use lies farthest in the future), compared with 15 for FIFO.
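The exercise is small enough to simulate; a sketch of all three policies on a 3-frame cache (the miss counts in the assertions come from running this simulation):

```python
# Replacement-policy simulation for the exercise: 3 frames,
# reference string as on the slide.
refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]

def simulate(refs, frames, policy):
    cache, queue, misses = set(), [], 0
    for i, r in enumerate(refs):
        if r in cache:
            if policy == 'LRU':          # a hit refreshes recency
                queue.remove(r)
                queue.append(r)
            continue
        misses += 1
        if len(cache) == frames:
            if policy == 'OPT':          # evict the farthest next use
                future = refs[i + 1:]
                victim = max(cache, key=lambda b: future.index(b)
                             if b in future else len(future) + 1)
            else:                        # FIFO and LRU evict the queue head
                victim = queue[0]
            cache.discard(victim)
            queue.remove(victim)
        cache.add(r)
        queue.append(r)
    return misses

assert simulate(refs, 3, 'FIFO') == 15
assert simulate(refs, 3, 'LRU') == 12
assert simulate(refs, 3, 'OPT') == 9
```

The same `queue` doubles as insertion order (FIFO) and recency order (LRU, since hits move a block to the back); OPT ignores it except for bookkeeping.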
Cache Write Policies
- Cache hit: Write Through or Write Back
- Cache miss: Write-allocate or Write-no-allocate
Read Policy on a Cache Miss
- The missed block is brought into the cache, and the required word is forwarded to the CPU as soon as it arrives.
- Alternatively, the missed block is stored entirely in the cache first, and the required word is then forwarded to the CPU.
Pentium IV Two-Level Cache
Processor -> Cache Level 1 (L1) -> Cache Level 2 (L2) -> Main Memory
Cache L1
  Cache organization:  Set-associative
  Block size:          64 bytes
  L1 cache size:       8 KB
  Blocks per set:      four
  CPU addressing:      byte addressable
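From these parameters the set count and address split follow the set-associative format from the earlier slides (a sketch; the derived numbers are computed, not stated on the slide):

```python
from math import log2

# L1 parameters from the table above.
cache_size = 8 * 1024      # 8 KB
block_size = 64            # bytes per block
ways = 4                   # blocks per set

sets = cache_size // (block_size * ways)   # number of sets
word_bits = int(log2(block_size))          # byte-offset field
set_bits = int(log2(sets))                 # set-index field

# 8 KB / (64 B * 4 ways) = 32 sets -> 6-bit offset, 5-bit set index.
assert (sets, word_bits, set_bits) == (32, 6, 5)
```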
CPU and Memory Interface
[Diagram: the CPU's MAR drives n address lines and its MDR drives b data lines to a main memory of 2^n words of b bits each, with an R/W control line.]
Pipelining
Contents
- Introduction
- Linear Pipelines
- Nonlinear Pipelines
Basic Idea
- Like an assembly line: divide the execution of a task among a number of stages.
- A task is divided into subtasks to be executed in sequence.
- Performance improves compared to sequential execution.
Pipeline
[Diagram: a task split into n sub-tasks (1, 2, ..., n) flows through an n-stage pipeline fed by a stream of tasks.]
5 Tasks on a 4-Stage Pipeline
[Space-time diagram: Task 1 enters at time 1 and finishes at time 4; each subsequent task finishes one time unit later, so all five tasks complete by time 8.]
Speedup
With n stages of delay t each and a stream of m tasks:
  T(Seq) = n * m * t
  T(Pipe) = n * t + (m - 1) * t = (n + m - 1) * t
  Speedup = T(Seq) / T(Pipe) = n * m / (n + m - 1)
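The speedup formula can be checked against the earlier space-time diagram:

```python
# Pipeline speedup from the formulas above:
# T_seq = n*m*t, T_pipe = (n + m - 1)*t, so speedup = n*m / (n + m - 1).
def speedup(n, m):
    return n * m / (n + m - 1)

# 5 tasks on 4 stages finish in 4 + 5 - 1 = 8 cycles
# instead of 20, a speedup of 2.5.
assert speedup(4, 5) == 2.5

# As the task stream grows, the speedup approaches n (here 4):
assert abs(speedup(4, 10**6) - 4) < 0.001
```

The limit behavior is the usual argument for pipelining: with a long enough stream, an n-stage pipeline approaches an n-fold throughput gain.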
Linear Pipeline
- Processing stages are linearly connected and perform a fixed function.
- Synchronous pipeline: clocked latches between stage i and stage i+1, with equal delays in all stages.
- Asynchronous pipeline: stages communicate by handshaking.
Latches
S1 -> L1 -> S2 -> L2 -> S3
The clock period must cover the stage delays, so the slowest stage determines the clock period.
Reservation Table
One task on a linear 4-stage pipeline occupies one stage per time unit:
  Time:  1  2  3  4
  S1:    X
  S2:       X
  S3:          X
  S4:             X
5 Tasks on 4 Stages
  Time:  1  2  3  4  5  6  7  8
  S1:    X  X  X  X  X
  S2:       X  X  X  X  X
  S3:          X  X  X  X  X
  S4:             X  X  X  X  X
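The staggered table above follows one rule: task j occupies stage i at time i + j. A small sketch that generates it for any n and m:

```python
# Reservation table for m tasks on an n-stage linear pipeline:
# task j (0-based) occupies stage i (0-based) at time slot i + j.
def reservation_table(n_stages, m_tasks):
    total = n_stages + m_tasks - 1          # total time slots
    table = [['.'] * total for _ in range(n_stages)]
    for stage in range(n_stages):
        for task in range(m_tasks):
            table[stage][stage + task] = 'X'
    return table

table = reservation_table(4, 5)
for row in table:
    print(''.join(row))

# S1 is busy in slots 1-5, S4 in slots 4-8; the last task leaves
# the pipeline at slot n + m - 1 = 8, matching T(Pipe) above.
assert len(table[0]) == 8
assert table[0][:5] == ['X'] * 5 and table[3][3:] == ['X'] * 5
```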
Nonlinear Pipelines
- Variable functions
- Feed-forward connections
- Feedback connections
3 Stages and 2 Functions
[Diagram: a three-stage pipeline (S1, S2, S3) with feed-forward and feedback connections, able to evaluate two functions X and Y.]
Reservation Tables for X and Y
Function X (8 cycles):
  Time:  1  2  3  4  5  6  7  8
  S1:    X              X     X
  S2:       X     X
  S3:          X     X     X
Function Y (6 cycles):
  Time:  1  2  3  4  5  6
  S1:    Y           Y
  S2:       Y
  S3:          Y  Y     Y
Linear Instruction Pipelines
Assume the following instruction execution phases:
- Fetch (F)
- Decode (D)
- Operand Fetch (O)
- Execute (E)
- Write results (W)
Pipeline Instruction Execution
  Time:  1   2   3   4   5   6   7
  F:     I1  I2  I3
  D:         I1  I2  I3
  O:             I1  I2  I3
  E:                 I1  I2  I3
  W:                     I1  I2  I3
Dependencies
- Data dependency (an operand is not ready yet)
- Instruction dependency (branching)
Will these cause a problem?
Data Dependency
  I1: Add R1, R2, R3
  I2: Sub R4, R1, R5
I2 needs R1 during its operand-fetch phase (cycle 4), but I1 does not write R1 until its write phase (cycle 5), so I2 would read a stale value.
Solutions
- Stall
- Forwarding
- Write and read in one cycle
- ...
Instruction Dependency
  I1: Branch
  I2: ...
The branch outcome is not known until I1 executes, so the instructions fetched after it (starting with I2) may be the wrong ones.
Solutions
- Stall
- Predict branch taken
- Predict branch not taken
- ...
Floating-Point Multiplication
Inputs: (Mantissa 1, Exponent 1), (Mantissa 2, Exponent 2)
1. Add the two exponents to form the output exponent.
2. Multiply the two mantissas.
3. Normalize the mantissa and adjust the exponent.
4. Round the product mantissa to a single-length mantissa; this may require adjusting the exponent again.
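The four steps can be sketched in software, holding numbers as (mantissa, exponent) pairs with the mantissa normalized to [1, 2). This is illustrative only; real hardware works on fixed-width bit fields:

```python
# Software sketch of the floating-point multiplication steps above.
def fp_multiply(m1, e1, m2, e2):
    e_out = e1 + e2              # step 1: add the two exponents
    m_out = m1 * m2              # step 2: multiply the mantissas
    while m_out >= 2.0:          # step 3: normalize, adjusting exponent
        m_out /= 2.0
        e_out += 1
    m_out = round(m_out, 6)      # step 4: round to a fixed length...
    if m_out >= 2.0:             # ...which may adjust the exponent again
        m_out /= 2.0
        e_out += 1
    return m_out, e_out

# (1.5 * 2^3) * (1.25 * 2^2) = 1.875 * 2^5 = 60
m, e = fp_multiply(1.5, 3, 1.25, 2)
assert (m, e) == (1.875, 5)
assert m * 2**e == 60.0
```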
Linear Pipeline for Floating-Point Multiplication
[Diagram: a first version with stages Add Exponents -> Multiply Mantissas -> Normalize -> Round, and a refined version in which the multiply stage is split into Partial Products and Accumulator stages and the round is followed by a Renormalize stage.]
Linear Pipeline for Floating-Point Addition
[Diagram: Subtract Exponents -> Partial Shift -> Add Mantissas -> Find Leading 1 -> Partial Shift -> Round -> Renormalize.]
Combined Adder and Multiplier
[Diagram: a multifunction pipeline with stages labeled A through H (Exponents Subtract/Add, Partial Shift, Add Mantissa, Find Leading 1, Partial Products, Round, Renormalize, Partial Shift); the add and multiply functions take different paths through the shared stages.]