Lecture 9. MIPS Processor Design – Pipelined Processor Design #2
2010 R&E Computer System Education & Research
Prof. Taeweon Suh, Computer Science Education, Korea University
Pipelined Datapath
Example for lw instruction: Instruction Fetch (IF) [datapath figure]
Example for lw instruction: Instruction Decode (ID) [datapath figure]
Example for lw instruction: Execution (EX) [datapath figure]
Example for lw instruction: Memory (MEM) [datapath figure]
Example for lw instruction: Writeback (WB) [datapath figure]
Example for sw instruction: Memory (MEM) [datapath figure]
Example for sw instruction: Writeback (WB): do nothing [datapath figure]
Corrected Datapath (for lw) [datapath figure]
Pipelining Example [datapath figure]
  Instructions shown in the pipeline:
    add $14, $5, $6
    lw $13, 24($1)
    add $12, $3, $4
    sub $11, $2, $3
    lw $10, 20($1)
Pipeline Control
  Note that in this implementation, a branch instruction decides whether to branch in the MEM stage
Pipeline Control
  We have 5 stages: IF, ID, EX, MEM, WB
  What needs to be controlled in each stage?
    Instruction fetch and PC increment
    Instruction decode / operand fetch
    Execution stage: RegDst, ALUOp[1:0], ALUSrc
    Memory stage: Branch, MemRead, MemWrite
    Writeback: MemtoReg, RegWrite (note that RegWrite is applied to the register file, which is in the ID stage)
Pipeline Control Extend pipeline registers to include control information (created in ID) Pass control signals along just like the data
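A minimal Python sketch of how the control bundle created in ID could be carried through the pipeline registers. The class and function names are hypothetical, and the bit ordering within each group (EX = RegDst, ALUOp[1:0], ALUSrc; M = Branch, MemRead, MemWrite; WB = RegWrite, MemtoReg) is an assumption chosen so that lw and R-type produce the values shown on the "Datapath with Control" slides (11 010 0001 and 10 000 1100).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    # EX-stage signals
    reg_dst: int
    alu_op: int  # 2-bit ALUOp[1:0]
    alu_src: int
    # MEM-stage signals
    branch: int
    mem_read: int
    mem_write: int
    # WB-stage signals
    reg_write: int
    mem_to_reg: int

    def ex_bits(self):
        return f"{self.reg_dst}{self.alu_op:02b}{self.alu_src}"

    def mem_bits(self):
        return f"{self.branch}{self.mem_read}{self.mem_write}"

    def wb_bits(self):
        return f"{self.reg_write}{self.mem_to_reg}"


# Only the two instruction classes whose values appear on the slides.
CONTROL_TABLE = {
    "lw": Control(0, 0b00, 1, 0, 1, 0, 1, 1),
    "r-type": Control(1, 0b10, 0, 0, 0, 0, 1, 0),
}


def decode(op):
    """ID stage: create all control groups at once (contents of ID/EX)."""
    c = CONTROL_TABLE[op]
    return {"WB": c.wb_bits(), "M": c.mem_bits(), "EX": c.ex_bits()}


def pass_along(bundle, consumed):
    """Drop the group used by the current stage; forward the rest."""
    return {k: v for k, v in bundle.items() if k != consumed}


id_ex = decode("lw")              # {'WB': '11', 'M': '010', 'EX': '0001'}
ex_mem = pass_along(id_ex, "EX")  # EX signals are used up in EX
mem_wb = pass_along(ex_mem, "M")  # M signals are used up in MEM
print(id_ex, ex_mem, mem_wb)
```

Each pipeline register only needs to hold the groups that later stages still use, which is why the bundle shrinks from stage to stage.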
Datapath with Control
Datapath with Control [figure] IF: lw $10, 9($1)
Datapath with Control [figure] IF: sub $11, $2, $3; ID: lw $10, 9($1). Control values shown: 11 010 0001 (“lw”)
Datapath with Control [figure] IF: and $12, $4, $5; ID: sub $11, $2, $3; EX: lw $10, 9($1). Control values shown: 11 010 00; 10 000 1100 (“sub”)
Datapath with Control [figure] IF: or $13, $6, $7; ID: and $12, $4, $5; EX: sub $11, $2, $3; MEM: lw $10, 9($1). Control values shown: 10 000; 1100; 11 (“and”)
Datapath with Control [figure] IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $12, $4, $5; MEM: sub $11, ..; WB: lw $10, 9($1). Control values shown: 10 000; 1100 (“or”)
Datapath with Control [figure] IF: xxxx; ID: add $14, $8, $9; EX: or $13, $6, $7; MEM: and $12…; WB: sub $11, .. Control values shown: 10 000; 1100 (“add”)
Datapath with Control [figure] IF: xxxx; ID: xxxx; EX: add $14, $8, $9; MEM: or $13, ..; WB: and $12… Control values shown: 10 000
Datapath with Control [figure] IF: xxxx; ID: xxxx; EX: xxxx; MEM: add $14, ..; WB: or $13… Control values shown: 10
Datapath with Control [figure] IF: xxxx; ID: xxxx; EX: xxxx; MEM: xxxx; WB: add $14..
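The cycle-by-cycle walk through the "Datapath with Control" slides can be reproduced with a small Python sketch. It assumes an ideal pipeline with no stalls, and it prints "xxxx" for any stage that holds no instruction from this program; the function name `occupancy` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def occupancy(program, cycle):
    """Return {stage: instruction} for a 1-based clock cycle, assuming no stalls."""
    result = {}
    for depth, stage in enumerate(STAGES):
        idx = cycle - 1 - depth  # index of the instruction sitting in this stage
        result[stage] = program[idx] if 0 <= idx < len(program) else "xxxx"
    return result


program = [
    "lw $10, 9($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]

# 5 instructions in a 5-stage pipeline need 5 + 5 - 1 = 9 cycles to drain.
for cycle in range(1, len(program) + len(STAGES)):
    print(cycle, occupancy(program, cycle))
```

In cycle 5 the pipeline is full: add is being fetched while lw is writing back.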
Dependencies
  Problem with starting (or executing) the next instruction before the first has finished
  Dependencies give rise to data and control hazards
Data Hazard - Software Solution
  Dependencies that “go backward in time”
  Have the compiler guarantee no hazards?
    Insert nop (no operation) instructions (“0x00000000” is nop in MIPS)
    Code scheduling
  Where do we insert the nops?
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
  Problem? This really slows us down!
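As a rough illustration of the nop-insertion idea, the Python sketch below pads the example sequence until no instruction reads a register written by one of the previous `gap` instructions. The function name and the tuple encoding of instructions are hypothetical. The value of `gap` is an assumption about the register file: 2 if a write in the first half of the cycle can be read in the second half, 3 if the design uses only the rising clock edge and has no forwarding.

```python
NOP = (None, (), "nop")


def insert_nops(program, gap=2):
    """Insert nops so no instruction reads a register written by any of the
    previous `gap` instruction slots. Each instruction is (dest, sources, text)."""
    out = []
    for dest, srcs, text in program:
        while True:
            recent = out[-gap:] if gap > 0 else []
            if any(d is not None and d in srcs for d, _, _ in recent):
                out.append(NOP)  # still too close to the producer
            else:
                break
        out.append((dest, srcs, text))
    return [text for _, _, text in out]


program = [
    ("$2", ("$1", "$3"), "sub $2, $1, $3"),
    ("$12", ("$2", "$5"), "and $12, $2, $5"),
    ("$13", ("$6", "$2"), "or $13, $6, $2"),
    ("$14", ("$2", "$2"), "add $14, $2, $2"),
    (None, ("$15", "$2"), "sw $15, 100($2)"),
]

for line in insert_nops(program, gap=2):
    print(line)
```

With `gap=2` the nops land between sub and and; the later readers of $2 are then already far enough from sub.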
Data Hazard - Pipeline Stalls? [pipeline diagram: the sequence starting with sub $2, $1, $3, with bubbles inserted to stall the dependent instructions]
Data Hazard - Forwarding
  Use temporary results; don’t wait for them to be written
    Register file forwarding to handle a read and a write to the same register
    ALU forwarding
  OK. Then, do we have to do this forwarding?
    If you are asked to design the CPU using only the rising edge of the clock, then? (Let’s stick to this for our project)
    If the register file write occurs in the first half of the clock cycle and the read occurs in the second half, then? (Our textbook follows this)
Forwarding (simplified) [figure: Register File, ALU, Data Memory, MUX; pipeline registers ID/EX, EX/MEM, MEM/WB]
Forwarding (from EX/MEM) [figure: Register File, ALU, Data Memory, three MUXes; pipeline registers ID/EX, EX/MEM, MEM/WB]
Forwarding (from MEM/WB) [figure: Register File, ALU, Data Memory, three MUXes; pipeline registers ID/EX, EX/MEM, MEM/WB]
Forwarding (operand selection) [figure: Register File, ALU, Data Memory, three MUXes, Forwarding Unit; pipeline registers ID/EX, EX/MEM, MEM/WB]
Forwarding (operand propagation) [figure: Register File, ALU, Data Memory, MUX, Forwarding Unit; pipeline registers ID/EX, EX/MEM, MEM/WB; Forwarding Unit inputs: Rs, Rt, Rd, EX/MEM Rd, MEM/WB Rd]
Forwarding [datapath figure]
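The forwarding unit's decision for one ALU operand can be sketched in Python as below. This is an illustration under assumptions, not the exact logic of the figures: the function and parameter names are hypothetical, and the select encoding ("10" = take the EX/MEM result, "01" = take the MEM/WB result, "00" = use the register file value) follows the usual textbook convention. Register $0 is never forwarded because it always reads as zero.

```python
def forward_select(id_ex_reg, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Return the mux select for one ALU operand (id_ex_reg is Rs or Rt in ID/EX)."""
    # The most recent result lives in EX/MEM, so it has priority.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return "10"
    # Otherwise fall back to the older result held in MEM/WB.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return "01"
    return "00"


# sub $2, $1, $3 ; and $12, $2, $5 ; or $13, $6, $2
# While "and" is in EX, "sub" is in MEM (EX/MEM.Rd = 2).
print(forward_select(2, 2, True, 0, False))   # Rs of and: forwarded from EX/MEM
print(forward_select(5, 2, True, 0, False))   # Rt of and: from the register file
# While "or" is in EX, "and" is in MEM (Rd = 12) and "sub" is in WB (Rd = 2).
print(forward_select(2, 12, True, 2, True))   # Rt of or: forwarded from MEM/WB
```

The same function is evaluated twice per cycle, once for the Rs operand and once for the Rt operand.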
Can't always forward
  lw (load word) can still cause a hazard: an instruction tries to read a register immediately after a load instruction that writes to the same register
  Thus, we need a hazard detection unit to “stall” the pipeline after the load instruction
Stalling
  We can stall the pipeline by keeping an instruction in the same stage [pipeline diagram: the stalled instructions repeat their ID and IF stages]
Hazard Detection Unit
  Stall by letting an instruction that won’t write anything go forward
  Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)
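The stall condition above maps directly onto a small Python sketch. The function name and the output signal names (`pc_write`, `if_id_write`, `bubble_into_id_ex`) are assumptions used for illustration: on a stall, the PC and IF/ID register keep their values, and an instruction that writes nothing (a bubble) is sent into ID/EX.

```python
def hazard_detect(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard check performed while the consumer is in ID."""
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "stall": stall,
        "pc_write": not stall,       # hold the PC on a stall
        "if_id_write": not stall,    # hold the instruction in IF/ID
        "bubble_into_id_ex": stall,  # zeroed control: writes nothing
    }


# lw $2, 20($1) is in EX (rt = 2); and $4, $2, $5 is in ID (rs = 2, rt = 5).
print(hazard_detect(True, 2, 2, 5))
# Same load followed by an independent add $4, $6, $7: no stall.
print(hazard_detect(True, 2, 6, 7))
```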
Control Hazards - Branch
  When we decide to branch, other instructions are already in the pipeline!
  Assume the branch is not taken (we are predicting “branch not taken”)
  When this assumption fails, flush 3 instructions
  Need to add hardware for flushing instructions if we are wrong
Alleviate Branch Hazards
  Move the branch compare to the ID stage of the pipeline
  Add an adder to calculate the branch target in the ID stage
  Add an IF.Flush signal that zeros (squashes) the instruction in the IF/ID pipeline register
  This reduces the penalty to 1 cycle
  [pipeline diagram; the actual condition is generated and the taken target address is known in the ID stage of beq]
    beq $1,$2,L1        IF ID EX MEM WB
    add $1,$2,$3        IF ID EX MEM WB (becomes a bubble)
    …
    L1: sub $1,$2,$3    IF ID EX MEM WB
Flushing Instructions [datapath figure]
Flushing Instructions (cycle N) [datapath figure with labels: IF.Flush, Hazard detection unit, Control, Forwarding unit, Registers, Instruction memory, Data memory, ALU, Sign extend, Shift left 2, =; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
  Code: beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; L2: lw $4, 40($7)
  In the pipeline: and $12, $2, $5; beq $1, $3, L2
Flushing Instructions (cycle N) [datapath figure]
  Code: beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; L2: lw $4, 40($7)
  In the pipeline: and $12, $2, $5; beq $1, $3, L2; branch target L2
Flushing Instructions (cycle N+1) [datapath figure]
  Code: beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; L2: lw $4, 40($7)
  In the pipeline: lw $4, 40($7); nop (the flushed and); beq $1, $3, L2
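The effect of IF.Flush for a branch resolved in ID can be sketched as follows. This is a simplified Python model, not the datapath itself: the function name and the byte addresses used in the example (beq at 40, L2 at 72) are hypothetical, since the slides do not give addresses for this code.

```python
NOP = "nop"


def resolve_beq_in_id(regs, rs, rt, target_pc, fall_through_pc, fetched):
    """Return (next PC, instruction latched into IF/ID) for a beq sitting in ID.

    `fetched` is the instruction fetched in the same cycle (the one after beq).
    """
    taken = regs[rs] == regs[rt]  # equality comparator in the ID stage
    if taken:
        # IF.Flush zeros the instruction just fetched; fetch continues at the target.
        return target_pc, NOP
    return fall_through_pc, fetched


regs = {1: 7, 3: 7}  # $1 == $3, so beq $1, $3, L2 is taken
# beq at 40, "and" fetched from 44, so the fall-through PC is 48.
print(resolve_beq_in_id(regs, 1, 3, 72, 48, "and $12, $2, $5"))  # (72, 'nop')
```

Only the one instruction already fetched is lost, which is the 1-cycle penalty mentioned on the "Alleviate Branch Hazards" slide.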
Improving Performance
  Try to avoid stalls! E.g., reorder these instructions:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)
  Add a “branch delay slot”
    The next instruction after a branch is always executed
    Rely on the compiler to “fill” the slot with something useful
  Superscalar
    Start more than one instruction in the same cycle
    Almost all processors are now pipelined and superscalar
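One way to check a reordering is to count load-use stalls before and after. The Python sketch below does this for the four instructions above, assuming the stall rule from the hazard detection unit (a load immediately followed by an instruction that reads the loaded register costs one stall). The swap of the two stores is one possible answer, not necessarily the one intended on the slide; it is safe here because the stores write different addresses, 0($t1) and 4($t1). The function name and tuple encoding are hypothetical.

```python
def count_load_use_stalls(program):
    """Count adjacent pairs where a lw result is read by the very next instruction.
    Each instruction is (opcode, destination register or None, source registers)."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "lw" and prev_dest in cur[2]:
            stalls += 1
    return stalls


original = [
    ("lw", "$t0", ("$t1",)),        # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),        # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),   # sw $t2, 0($t1)  <- uses $t2 right after its load
    ("sw", None, ("$t0", "$t1")),   # sw $t0, 4($t1)
]
# Swap the two stores so the use of $t2 is one instruction further from its load.
reordered = [original[0], original[1], original[3], original[2]]

print(count_load_use_stalls(original), count_load_use_stalls(reordered))  # 1 0
```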
Dynamic Scheduling
  The hardware performs the “scheduling”
    Hardware tries to find instructions to execute
    Out-of-order (OOO) execution is possible
    Speculative execution and dynamic branch prediction
  All modern processors are very complicated
    DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
    PowerPC and Pentium: branch history table
    Compiler technology is important
  This class has given you the background you need to learn more
Exceptions & Interrupts
  The CPU has to prepare for all possible situations it could face
  “Unexpected” events require a change in the flow of control
  Exceptions arise within the CPU
    Undefined opcode
    Arithmetic overflow in MIPS
      Some other architectures (such as x86 and ARM) do not generate an exception on arithmetic overflow; instead, they set bits in a flag register inside the CPU
  Interrupts come from external I/O devices
    Keyboard, mouse, network card, etc.
  Many architectures and authors do not distinguish between interrupts and exceptions
    They often use the term “interrupt” to refer to both types of events
Pipelined Performance Example
  Ideally CPI = 1
  But we need to handle stalling (caused by loads and branches)
  SPECINT2000 benchmark:
    25% loads
    10% stores
    11% branches
    2% jumps
    52% R-type
  Suppose
    40% of loads are used by the next instruction
    25% of branches are mispredicted
  What is the average CPI?
Pipelined Performance Example
  SPECINT2000 benchmark:
    25% loads
    10% stores
    11% branches
    2% jumps
    52% R-type
  If there is no stall in the pipelined MIPS, how would you calculate CPI?
    Average CPI = (0.25)(1 CPI) + (0.10)(1 CPI) + (0.11)(1 CPI) + (0.02)(1 CPI) + (0.52)(1 CPI) = 1
  Suppose
    40% of loads are used by the next instruction
    25% of branches are mispredicted
    All jumps flush the next instruction
  What is the average CPI?
    Load/Branch CPI = 1 when not stalling, 2 when stalling. Thus
      CPI_lw = 1(0.6) + 2(0.4) = 1.4
      CPI_beq = 1(0.75) + 2(0.25) = 1.25
      CPI_jump = 2(1) = 2
    Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.15
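The weighted-average CPI calculation can be checked with a few lines of Python. The dictionary names are hypothetical; the numbers are the ones given on this slide.

```python
# Instruction mix (fractions of all executed instructions).
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}

# Per-class CPI: 1 cycle normally, 2 cycles when the instruction causes a stall or flush.
cpi = {
    "load": 1 * 0.6 + 2 * 0.4,      # 40% of loads are used by the next instruction
    "store": 1,
    "branch": 1 * 0.75 + 2 * 0.25,  # 25% of branches are mispredicted
    "jump": 2,                      # every jump flushes the next instruction
    "rtype": 1,
}

average_cpi = sum(mix[k] * cpi[k] for k in mix)
print(round(average_cpi, 4))  # 1.1475, which the slide rounds to 1.15
```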
Pipelined Performance
  Critical path of the pipelined MIPS processor:
    Tc = max {
      tpcq + tmem + tsetup,                              // IF stage
      2(tRFread + tmux + teq + tAND + tmux + tsetup),    // ID stage
      tpcq + tmux + tmux + tALU + tsetup,                // EX stage
      tpcq + tmemwrite + tsetup,                         // MEM stage
      2(tpcq + tmux + tRFwrite)                          // WB stage
    }
  Where does this “2” come from?
    If you are asked to design the CPU using only the rising edge of the clock, then? (Let’s stick to this for our project)
    If the register file write occurs in the first half of the clock cycle and the read occurs in the second half, then? (Our textbook follows this)
Pipelined Performance Example
  Element               Parameter   Delay (ps)
  Register clock-to-Q   tpcq_PC     30
  Register setup        tsetup      20
  Multiplexer           tmux        25
  ALU                   tALU        200
  Memory read           tmem        250
  Register file read    tRFread     150
  Register file setup   tRFsetup
  Equality comparator   teq         40
  AND gate              tAND        15
  Memory write          tmemwrite   220
  Register file write   tRFwrite    100

  Tc = 2(tRFread + tmux + teq + tAND + tmux + tsetup)
     = 2[150 + 25 + 40 + 15 + 25 + 20] ps
     = 550 ps
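To see why the ID stage sets the cycle time, the per-stage delays can be evaluated in Python using the critical-path expression from the previous slide and the delays in the table. The dictionary keys are hypothetical names for the table parameters; tRFsetup is left out because the table gives no value for it and the expression does not use it.

```python
# Delays in picoseconds, taken from the table.
t = {
    "pcq": 30, "setup": 20, "mux": 25, "ALU": 200, "mem": 250,
    "RFread": 150, "eq": 40, "AND": 15, "memwrite": 220, "RFwrite": 100,
}

stage_delay = {
    "IF": t["pcq"] + t["mem"] + t["setup"],
    # Factor 2: register file read/write must fit in half a clock cycle.
    "ID": 2 * (t["RFread"] + t["mux"] + t["eq"] + t["AND"] + t["mux"] + t["setup"]),
    "EX": t["pcq"] + t["mux"] + t["mux"] + t["ALU"] + t["setup"],
    "MEM": t["pcq"] + t["memwrite"] + t["setup"],
    "WB": 2 * (t["pcq"] + t["mux"] + t["RFwrite"]),
}

Tc = max(stage_delay.values())
print(stage_delay, Tc)  # the ID stage is the slowest, so Tc = 550 ps
```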
Pipelined Performance Example
  For a program with 100 billion instructions executing on a pipelined MIPS processor,
    CPI = 1.15
    Tc = 550 ps
  Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
                 = (100 × 10^9)(1.15)(550 × 10^-12 s)
                 = 63 seconds

  Processor      Execution Time (seconds)   Speedup (single-cycle is baseline)
  Single-cycle   95                         1
  Multicycle     133                        0.71
  Pipelined      63                         1.51
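The execution time and the speedup column can be reproduced with a short Python check. The variable names are hypothetical; the single-cycle and multicycle times are taken from the table as given.

```python
instructions = 100e9  # 100 billion instructions
cpi = 1.15            # average cycles per instruction
tc = 550e-12          # cycle time in seconds (550 ps)

pipelined_time = instructions * cpi * tc
print(round(pipelined_time, 2))  # 63.25 seconds, reported as 63 on the slide

# Speedup relative to the single-cycle processor, using the rounded table values.
times = {"Single-cycle": 95, "Multicycle": 133, "Pipelined": 63}
speedup = {name: round(times["Single-cycle"] / secs, 2) for name, secs in times.items()}
print(speedup)
```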
Backup Slides
Exception Handling in MIPS and Handler Actions
  Exception handling in MIPS: Hardware (CPU)
    CPU saves the PC of the offending (or interrupted) instruction to the “Exception Program Counter (EPC)” register
    CPU saves an indication of the problem to the “Cause” register
    Jump to the handler at 0x80000180
  Exception Handler in Software
    Read Cause, and transfer to the relevant handler
    If restartable
      Take corrective action
      Use EPC to return to the program
    Otherwise
      Terminate the program
      Report the error using EPC, Cause, …
Exceptions in a Pipeline
  Another form of control hazard
  Consider overflow on add in the EX stage: add $1, $2, $1
    Prevent $1 from being clobbered
    Complete previous instructions
    Flush add and subsequent instructions
    Set Cause and EPC register values
    Transfer control to the handler
  Similar to a mispredicted branch
    Use much of the same hardware
Pipeline with Exceptions
Exception Example
  Exception on add in
    40  sub $11, $2, $4
    44  and $12, $2, $5
    48  or $13, $2, $6
    4C  add $1, $2, $1
    50  slt $15, $6, $7
    54  lw $16, 50($7)
    …
  Handler
    80000180  sw $25, 1000($0)
    80000184  sw $26, 1004($0)
    …
Exception Example
Exception Example