Published by Λαμία Παπάγος. Modified over 6 years ago.
1
Lecture 9. MIPS Processor Design – Pipelined Processor Design #2
2010 R&E Computer System Education & Research
Prof. Taeweon Suh, Computer Science Education, Korea University
2
Pipelined Datapath
3
Example for lw instruction: Instruction Fetch (IF)
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
4
Example for lw instruction: Instruction Decode (ID)
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
5
Example for lw instruction: Execution (EX)
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
6
Example for lw instruction: Memory (MEM)
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
7
Example for lw instruction: Writeback (WB)
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
8
Example for sw instruction: Memory (MEM)
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
9
Example for sw instruction: Writeback (WB): do nothing
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
10
Corrected Datapath (for lw)
[Figure: corrected pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
11
Pipelining Example
[Figure: pipelined datapath with one instruction in each stage]
Instructions labeled in the figure:
  add $14, $5, $6
  lw $13, 24($1)
  add $12, $3, $4
  sub $11, $2, $3
  lw $10, 20($1)
12
Pipeline Control
Note that in this implementation, a branch instruction decides whether to branch in the MEM stage
13
Pipeline Control
We have 5 stages: IF, ID, EX, MEM, WB
What needs to be controlled in each stage?
- Instruction fetch and PC increment
- Instruction decode / operand fetch
- Execution stage: RegDst, ALUOp[1:0], ALUSrc
- Memory stage: Branch, MemRead, MemWrite
- Writeback: MemtoReg, RegWrite (note that RegWrite drives the register file, which sits in the ID stage)
14
Pipeline Control
- Extend pipeline registers to include control information (created in ID)
- Pass control signals along just like the data
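A minimal sketch of how control bits created in ID might ride along in the pipeline registers. The field widths follow the signal list for the EX, MEM, and WB stages, and the values for lw and sub are the ones shown on the "Datapath with Control" slides. The bit order inside each field and the names `CONTROL`, `EMPTY`, and `clock` are illustrative assumptions, not part of the lecture.

```python
# Control words grouped by the stage that consumes them.
# Assumed bit order: WB = RegWrite, MemtoReg; M = Branch, MemRead, MemWrite;
# EX = RegDst, ALUOp[1:0], ALUSrc.
CONTROL = {
    "lw": {"WB": "11", "M": "010", "EX": "0001"},
    "sub": {"WB": "10", "M": "000", "EX": "1100"},
}
NOP = {"WB": "00", "M": "000", "EX": "0000"}  # a bubble asserts nothing

# Pipeline registers before any instruction has been decoded.
EMPTY = {
    "ID/EX": dict(NOP),
    "EX/MEM": {"WB": "00", "M": "000"},
    "MEM/WB": {"WB": "00"},
}


def clock(regs, opcode_in_id):
    """One clock edge: each register keeps only the fields still needed downstream."""
    ctrl = CONTROL.get(opcode_in_id, NOP)
    return {
        "ID/EX": dict(ctrl),                       # all fields created in ID
        "EX/MEM": {"WB": regs["ID/EX"]["WB"],      # EX bits were consumed
                   "M": regs["ID/EX"]["M"]},
        "MEM/WB": {"WB": regs["EX/MEM"]["WB"]},    # M bits were consumed
    }


regs = clock(EMPTY, "lw")    # lw decoded
regs = clock(regs, "sub")    # sub decoded, lw moves to EX/MEM
print(regs)
```

Each register is narrower than the one before it because a stage's control bits are no longer needed once that stage has used them.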
15
Datapath with Control
16
Datapath with Control
[Figure: pipelined datapath with control]
IF: lw $10, 9($1)
17
Datapath with Control
[Figure: pipelined datapath with control]
IF: sub $11, $2, $3
ID: lw $10, 9($1)
Control generated in ID for “lw”: WB = 11, M = 010, EX = 0001
18
Datapath with Control
[Figure: pipelined datapath with control]
IF: and $12, $4, $5
ID: sub $11, $2, $3
EX: lw $10, 9($1)
Control generated in ID for “sub”: WB = 10, M = 000, EX = 1100
Control values shown with lw in EX: 11, 010, 00
19
Datapath with Control
[Figure: pipelined datapath with control]
IF: or $13, $6, $7
ID: and $12, $4, $5
EX: sub $11, $2, $3
MEM: lw $10, 9($1)
Control values shown in the figure: 10 000, 1100, 11; the control unit is labeled “and”
20
Datapath with Control
[Figure: pipelined datapath with control]
IF: add $14, $8, $9
ID: or $13, $6, $7
EX: and $12, $4, $5
MEM: sub $11, ..
WB: lw $10, 9($1)
Control values shown in the figure: 10 000, 1100; the control unit is labeled “or”
21
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: add $14, $8, $9
EX: or $13, $6, $7
MEM: and $12…
WB: sub $11, ..
Control values shown in the figure: 10 000, 1100; the control unit is labeled “add”
22
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: add $14, $8, $9
MEM: or $13, ..
WB: and $12…
Control values shown in the figure: 10 000
23
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: xxxx
MEM: add $14, ..
WB: or $13…
Control values shown in the figure: 10
24
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: xxxx
MEM: xxxx
WB: add $14..
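The cycle-by-cycle walk through the "Datapath with Control" slides can be reproduced with a small sketch. It assumes an ideal pipeline with no stalls, where instruction i enters IF in cycle i + 1; the names `STAGES`, `PROGRAM`, and `occupancy` are hypothetical, and "xxxx" marks a stage holding no instruction from this program, as on the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 9($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Map each stage to the instruction it holds in the given 1-based cycle."""
    result = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth  # instruction i reaches stage `depth` in cycle i + 1 + depth
        result[stage] = program[index] if 0 <= index < len(program) else "xxxx"
    return result


# Five instructions need 5 + 4 = 9 cycles to drain through five stages.
for cycle in range(1, len(PROGRAM) + len(STAGES)):
    print(cycle, occupancy(PROGRAM, cycle))
```

Cycle 5 is the only cycle in which all five stages hold an instruction from this program.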
25
Dependencies
Problem with starting (or executing) the next instruction before the first is finished
Dependencies incur data and control hazards
26
Data Hazard - Software Solution
Dependencies that “go backward in time”
Have compiler guarantee no hazards?
- Insert nop (no operation) instructions (“0x ” is nop in MIPS)
- Code scheduling
Where do we insert the “nops”?
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)
Problem? This really slows us down!
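A rough sketch of the software solution, assuming no forwarding hardware. `min_distance` is a hypothetical parameter: the number of instruction slots that must separate a register write from a dependent read. Under the stated assumptions it would be 3 when the register file writes in the first half of the clock and reads in the second half, and 4 when only the rising edge is used. The tuple layout `(dest, sources, text)` is also an illustration choice.

```python
NOP = (None, (), "nop")


def insert_nops(program, min_distance):
    """Insert nops so every consumer sits at least min_distance slots after its producer."""
    out = []
    for dest, srcs, text in program:
        needed = 0
        # Only the previous (min_distance - 1) slots can still cause a hazard.
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev_dest = out[-back][0]
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((dest, srcs, text))
    return out


PROGRAM = [
    ("$2", ("$1", "$3"), "sub $2, $1, $3"),
    ("$12", ("$2", "$5"), "and $12, $2, $5"),
    ("$13", ("$6", "$2"), "or $13, $6, $2"),
    ("$14", ("$2", "$2"), "add $14, $2, $2"),
    (None, ("$15", "$2"), "sw $15, 100($2)"),   # sw writes no register
]

for _, _, text in insert_nops(PROGRAM, min_distance=3):
    print(text)
```

With these assumptions the nops land between sub and and; the later readers of $2 are already far enough away.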
27
Data Hazard - Pipeline Stalls?
[Figure: pipeline diagram of the sequence starting with sub $2, $1, $3, showing a stall and the resulting bubbles]
28
Data Hazard - Forwarding
Use temporary results, don’t wait for them to be written
- Register file forwarding to handle read/write to same register
- ALU forwarding
Ok.. Then, do we have to do this forwarding?
- If you are asked to design a CPU using only the rising edge of the clock, then? Let’s stick to this for our project
- If the register file write occurs in the first half of the clock, and the read occurs in the 2nd half of the clock, then? Our textbook follows this
29
Forwarding (simplified)
[Figure labels: ID/EX, EX/MEM, MEM/WB, Register File, ALU, Data Memory, MUX]
30
Forwarding (from EX/MEM)
[Figure labels: ID/EX, EX/MEM, MEM/WB, Register File, ALU, Data Memory, three MUXes]
31
Forwarding (from MEM/WB)
[Figure labels: ID/EX, EX/MEM, MEM/WB, Register File, ALU, Data Memory, three MUXes]
32
Forwarding (operand selection)
[Figure labels: ID/EX, EX/MEM, MEM/WB, Register File, ALU, Data Memory, three MUXes, Forwarding Unit]
33
Forwarding (operand propagation)
[Figure labels: ID/EX, EX/MEM, MEM/WB, Register File, ALU, Data Memory, MUX, Forwarding Unit; register fields Rs, Rt, Rd, EX/MEM Rd, MEM/WB Rd]
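A sketch of the operand-selection decision a forwarding unit makes for one ALU input. It compares the source register held in ID/EX (Rs or Rt) against EX/MEM Rd and MEM/WB Rd. The RegWrite qualifiers, the check against register 0, and the priority of EX/MEM over MEM/WB follow the usual textbook formulation and are assumptions here, as are the function name and the string return values.

```python
def forward_select(src_reg, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Choose where one ALU operand should come from.

    src_reg is the ID/EX Rs or Rt register number of the instruction in EX.
    """
    # The most recent result wins: check EX/MEM before MEM/WB.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    return "REGFILE"  # no hazard: use the value read in ID


# sub $2, $1, $3 followed by and $12, $2, $5: when and is in EX, sub is in MEM.
print(forward_select(2, ex_mem_rd=2, ex_mem_regwrite=True,
                     mem_wb_rd=0, mem_wb_regwrite=False))
```

The same function would be evaluated twice per cycle, once for each ALU input.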
34
Forwarding
[Figure: pipelined datapath with forwarding]
35
Can't always forward
lw (load word) can still cause a hazard
- An instruction tries to read a register following a load instruction that writes to the same register
- Thus, we need a hazard detection unit to “stall” the pipeline after the load instruction
36
Stalling
We can stall the pipeline by keeping an instruction in the same stage
[Figure: pipeline diagram in which the ID and IF stages are each repeated]
37
Hazard Detection Unit
- Stall by letting an instruction that won’t write anything go forward
- Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)
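The stall condition can be written as a one-line predicate. This is a sketch: using MemRead as the indication that ID/EX holds a load is an assumption based on the memory-stage control signals, and the function and parameter names are hypothetical.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) in EX, and $4, $2, $5 in ID: rs of the and matches rt of the lw.
print(must_stall(id_ex_memread=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```

When the predicate holds, one way to realize the stall is to hold the PC and IF/ID and send all-zero control into ID/EX, so that an instruction that writes nothing goes forward.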
38
Control Hazards - Branch
When we decide to branch, other instructions are in the pipeline!
Assume: branch is not taken
- When this assumption fails, flush 3 instructions
We are predicting “branch not taken”
- Need to add hardware for flushing instructions if we are wrong
39
Alleviate Branch Hazards
- Move branch compare to ID stage of the pipeline
- Add adder to calculate branch target in ID stage
- Add IF.flush signal that zeros (squashes) the instruction in the IF/ID pipeline register
- Reduce penalty to 1 cycle
[Figure: pipeline diagram]
  beq $1,$2,L1      IF ID EX MEM WB   (in ID: actual condition is generated, taken target address is known)
  add $1,$2,$3      IF, then Bubble
  …
  L1: sub $1,$2,$3  IF ID EX MEM WB
40
Flushing Instructions
[Figure: pipelined datapath with hazard detection unit, forwarding unit, and branch comparison in ID]
41
Flushing Instructions (cycle N)
  beq $1, $3, L2
  and $12, $2, $5
  or $13, $12, $1
  …
  L2: lw $4, 40($7)
In the pipeline: IF: and $12, $2, $5; ID: beq $1, $3, L2
[Figure: pipelined datapath with IF.Flush, hazard detection unit, control, forwarding unit, pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, PC, instruction memory, registers, sign extend, shift left 2, ALU, data memory]
42
Flushing Instructions (cycle N)
  beq $1, $3, L2
  and $12, $2, $5
  or $13, $12, $1
  …
  L2: lw $4, 40($7)
In the pipeline: IF: and $12, $2, $5; ID: beq $1, $3, L2
[Figure: pipelined datapath with hazard detection and forwarding units; the figure is annotated with the label L2]
43
Flushing Instructions (cycle N+1)
  beq $1, $3, L2
  and $12, $2, $5
  or $13, $12, $1
  …
  L2: lw $4, 40($7)
In the pipeline: IF: lw $4, 40($7); ID: nop; EX: beq $1, $3, L2
[Figure: pipelined datapath with hazard detection and forwarding units]
44
Improving Performance
Try to avoid stalls! E.g., reorder these instructions:
  lw $t0, 0($t1)
  lw $t2, 4($t1)
  sw $t2, 0($t1)
  sw $t0, 4($t1)
Add a “branch delay slot”
- The next instruction after a branch is always executed
- Rely on compiler to “fill” the slot with something useful
Superscalar
- Start more than one instruction in the same cycle
- Almost all processors are now pipelined and superscalar
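One way to see the benefit of reordering is to count load-use stalls before and after. The sketch below assumes forwarding is available and that a stall occurs only when an instruction immediately follows the lw whose result it reads. The swapped order of the two stores is one possible answer to the reordering exercise, not one given on the slide; it is safe here because the stores write different addresses. The tuple layout and names are hypothetical.

```python
def load_use_stalls(program):
    """Count instructions that read a register loaded by the instruction just before them."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "lw" and prev_dest in curr_srcs:
            stalls += 1
    return stalls


# (opcode, destination register, source registers)
ORIGINAL = [
    ("lw", "$t0", ("$t1",)),
    ("lw", "$t2", ("$t1",)),
    ("sw", None, ("$t2", "$t1")),   # uses $t2 right after it is loaded
    ("sw", None, ("$t0", "$t1")),
]

# Assumed reordering: swap the two stores.
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[3], ORIGINAL[2]]

print(load_use_stalls(ORIGINAL), load_use_stalls(REORDERED))
```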
45
Dynamic Scheduling The hardware performs the “scheduling”
- Hardware tries to find instructions to execute
- Out of order (OOO) execution is possible
- Speculative execution and dynamic branch prediction
All modern processors are very complicated
- DEC Alpha 21264: 9 stage pipeline, 6 instruction issue
- PowerPC and Pentium: branch history table
- Compiler technology is important
This class has given you the background you need to learn more
46
Exceptions & Interrupts
CPU has to prepare for all possible situations it could face
“Unexpected” events require change in flow of control
Exceptions arise within the CPU
- Undefined opcode
- Arithmetic overflow in MIPS
  Some other architectures (such as x86 and ARM) do not generate an exception on arithmetic overflow; instead, they set bits of the flag register inside the CPU
Interrupts are from external I/O devices
- Keyboard, mouse, network card, etc.
Many architectures and authors do not distinguish between interrupts and exceptions
- Often use the term “interrupt” to refer to both types of events
47
Pipelined Performance Example
Ideally CPI = 1
But, need to handle stalling (caused by loads and branches)
SPECINT2000 benchmark:
- 25% loads
- 10% stores
- 11% branches
- 2% jumps
- 52% R-type
Suppose
- 40% of loads are used by next instruction
- 25% of branches are mispredicted
What is the average CPI?
48
Pipelined Performance Example
SPECINT2000 benchmark:
- 25% loads
- 10% stores
- 11% branches
- 2% jumps
- 52% R-type
If there is no stall in the pipelined MIPS, how would you calculate CPI?
  Average CPI = (0.25)(1 CPI) + (0.10)(1 CPI) + (0.11)(1 CPI) + (0.02)(1 CPI) + (0.52)(1 CPI) = 1
Suppose
- 40% of loads are used by next instruction
- 25% of branches are mispredicted
- All jumps flush next instruction
What is the average CPI?
Load/Branch CPI = 1 when no stalling, 2 when stalling. Thus
  CPIlw = 1(0.6) + 2(0.4) = 1.4
  CPIbeq = 1(0.75) + 2(0.25) = 1.25
  CPIjump = 2(1) = 2
  Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.1475 ≈ 1.15
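The weighted average above can be checked with a few lines. This sketch uses the instruction mix and stall rates stated on the slide; the function name `average_cpi` and the dictionary keys are hypothetical.

```python
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}


def average_cpi(mix, load_use_rate, mispredict_rate):
    """Weighted CPI where each stall or flush costs exactly one extra cycle."""
    cpi = {
        "load": 1 + load_use_rate,        # 1 cycle, plus 1 when the next instruction uses the load
        "store": 1.0,
        "branch": 1 + mispredict_rate,    # 1 cycle, plus 1 when mispredicted
        "jump": 2.0,                      # every jump flushes the next instruction
        "rtype": 1.0,
    }
    return sum(mix[kind] * cpi[kind] for kind in mix)


print(average_cpi(MIX, load_use_rate=0.40, mispredict_rate=0.25))
```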
49
Pipelined Performance
Critical path of the pipelined MIPS processor:
Tc = max {
  tpcq + tmem + tsetup,                             // IF stage
  2(tRFread + tmux + teq + tAND + tmux + tsetup),   // ID stage
  tpcq + tmux + tmux + tALU + tsetup,               // EX stage
  tpcq + tmemwrite + tsetup,                        // MEM stage
  2(tpcq + tmux + tRFwrite)                         // WB stage
}
Where does this “2” come from?
- If you are asked to design a CPU using only the rising edge of the clock, then? Let’s stick to this for our project
- If the register file write occurs in the first half of the clock, and the read occurs in the 2nd half of the clock, then? Our textbook follows this
50
Pipelined Performance Example
Element                 Parameter   Delay (ps)
Register clock-to-Q     tpcq_PC     30
Register setup          tsetup      20
Multiplexer             tmux        25
ALU                     tALU        200
Memory read             tmem        250
Register file read      tRFread     150
Register file setup     tRFsetup
Equality comparator     teq         40
AND gate                tAND        15
Memory write            tmemwrite   220
Register file write     tRFwrite    100

Tc = 2(tRFread + tmux + teq + tAND + tmux + tsetup) = 2[150 + 25 + 40 + 15 + 25 + 20] ps = 550 ps
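As a cross-check, the per-stage expressions from the critical-path slide can be evaluated with the delays in the table. This is a sketch: the dictionary keys are hypothetical names for the table parameters (tpcq stands for the table's tpcq_PC), and tRFsetup is left out because the stage expressions do not use it.

```python
DELAYS_PS = {
    "tpcq": 30, "tsetup": 20, "tmux": 25, "tALU": 200, "tmem": 250,
    "tRFread": 150, "teq": 40, "tAND": 15, "tmemwrite": 220, "tRFwrite": 100,
}


def stage_delays(d):
    """Delay of each pipeline stage in ps, following the Tc = max{...} expression."""
    return {
        "IF": d["tpcq"] + d["tmem"] + d["tsetup"],
        "ID": 2 * (d["tRFread"] + d["tmux"] + d["teq"] + d["tAND"] + d["tmux"] + d["tsetup"]),
        "EX": d["tpcq"] + d["tmux"] + d["tmux"] + d["tALU"] + d["tsetup"],
        "MEM": d["tpcq"] + d["tmemwrite"] + d["tsetup"],
        "WB": 2 * (d["tpcq"] + d["tmux"] + d["tRFwrite"]),
    }


stages = stage_delays(DELAYS_PS)
slowest = max(stages, key=stages.get)
print(stages, slowest, stages[slowest])
```

With these numbers the ID stage sets the cycle time, which is why the slide evaluates only that term.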
51
Pipelined Performance Example
For a program with 100 billion instructions executing on a pipelined MIPS processor,
  CPI = 1.15
  Tc = 550 ps
Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
               = (100 × 10^9)(1.15)(550 × 10^-12 s) ≈ 63 seconds

Processor      Execution Time (seconds)   Speedup (single-cycle is baseline)
Single-cycle   95                         1
Multicycle     133                        0.71
Pipelined      63                         1.51
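The execution-time product and the speedup column can be reproduced as follows. The sketch takes the single-cycle and multicycle times from the table as given; the function names are hypothetical.

```python
def execution_time_s(instructions, cpi, tc_ps):
    """Execution time in seconds from instruction count, CPI, and cycle time in ps."""
    return instructions * cpi * tc_ps * 1e-12


def speedup(baseline_s, time_s):
    """Speedup relative to the baseline: values above 1 are faster than the baseline."""
    return baseline_s / time_s


pipelined = execution_time_s(100e9, 1.15, 550)
print(round(pipelined, 2))            # unrounded product, before the table rounds to 63
print(round(speedup(95, 133), 2))     # multicycle vs. single-cycle
print(round(speedup(95, 63), 2))      # pipelined vs. single-cycle, using the rounded 63 s
```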
52
Backup Slides
53
Exception Handling in MIPS and Handler Actions
Exception handling in MIPS
Hardware (CPU)
- CPU saves PC of offending (or interrupted) instruction to the “Exception Program Counter (EPC)” register
- CPU saves indication of the problem to the “Cause” register
- Jump to handler at 0x
Exception Handler in Software
- Read cause, and transfer to relevant handler
- If restartable,
  Take corrective action
  Use EPC to return to program
- Otherwise
  Terminate program
  Report error using EPC, cause, …
54
Exceptions in a Pipeline
Another form of control hazard
Consider overflow on add in EX stage
  add $1, $2, $1
- Prevent $1 from being clobbered
- Complete previous instructions
- Flush add and subsequent instructions
- Set Cause and EPC register values
- Transfer control to handler
Similar to mispredicted branch
- Use much of the same hardware
55
Pipeline with Exceptions
56
Exception Example
Exception on add in
  40 sub $11, $2, $4
  44 and $12, $2, $5
  48 or $13, $2, $6
  4C add $1, $2, $1
  50 slt $15, $6, $7
  54 lw $16, 50($7)
  …
Handler
  sw $25, 1000($0)
  sw $26, 1004($0)
  …
57
Exception Example
58
Exception Example