1
Lecture 5. MIPS Processor Design
COSE222, COMP212 Computer Architecture
Lecture 5. MIPS Processor Design: Pipelined MIPS #2
Prof. Taeweon Suh
Computer Science & Engineering, Korea University
2
Pipelined Datapath
3
Pipelining Example
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) executing the sequence below]
add $14, $5, $6
lw $13, 24($1)
add $12, $3, $4
sub $11, $2, $3
lw $10, 20($1)
4
lw: Instruction Fetch (IF)
[Figure: pipelined datapath with the IF stage highlighted]
lw $s0, 8($t1)
5
lw: Instruction Decode (ID)
[Figure: pipelined datapath with the ID stage highlighted]
lw $s0, 8($t1)
6
lw: Execution (EX)
[Figure: pipelined datapath with the EX stage highlighted]
lw $s0, 8($t1)
7
lw: Memory (MEM)
[Figure: pipelined datapath with the MEM stage highlighted]
lw $s0, 8($t1)
8
lw: Writeback (WB)
[Figure: pipelined datapath with the WB stage highlighted]
lw $s0, 8($t1)
9
Corrected Datapath (for lw)
[Figure: corrected pipelined datapath]
lw $s0, 8($t1)
10
sw: Memory (MEM)
[Figure: pipelined datapath with the MEM stage highlighted]
sw $1, 4($2)
11
sw: Writeback (WB): do nothing
[Figure: pipelined datapath with the WB stage highlighted]
sw $1, 4($2)
12
Pipeline Control Note that in this implementation, the branch is resolved in the MEM stage
13
Pipeline Control
What needs to be controlled in each stage (IF, ID, EX, MEM, WB)?
- IF: instruction fetch and PC increment
- ID: instruction decode and operand fetch from the register file and/or immediate
- EX: execution stage
  - RegDst
  - ALUOp[1:0]
  - ALUSrc
- MA: memory stage
  - Branch
  - MemRead
  - MemWrite
- WB: writeback
  - MemtoReg
  - RegWrite (note that this signal drives the register file, which sits in the ID stage)
14
Pipeline Control
- Extend the pipeline registers to include the control information created in the ID stage
- Pass the control signals along just like the data
15
Datapath with Control
16
Datapath with Control
[Figure: pipelined datapath with control]
IF: lw $10, 9($1)
17
Datapath with Control
[Figure: pipelined datapath with control]
IF: sub $11, $2, $3
ID: lw $10, 9($1)
Control values shown in ID ("lw"): 11 010 0001
18
Datapath with Control
[Figure: pipelined datapath with control]
IF: and $12, $4, $5
ID: sub $11, $2, $3
EX: lw $10, 9($1)
Control values shown in ID ("sub"): 10 000 1100
Control values shown for lw: 11 010 00
19
Datapath with Control
[Figure: pipelined datapath with control]
IF: or $13, $6, $7
ID: and $12, $4, $5
EX: sub $11, $2, $3
MEM: lw $10, 9($1)
Control values shown in the figure ("and" in ID): 10 000, 1100, 11
20
Datapath with Control
[Figure: pipelined datapath with control]
IF: add $14, $8, $9
ID: or $13, $6, $7
EX: and $12, $4, $5
MEM: sub $11, ..
WB: lw $10, 9($1)
Control values shown in the figure ("or" in ID): 10 000, 1100
21
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: add $14, $8, $9
EX: or $13, $6, $7
MEM: and $12…
WB: sub $11, ..
Control values shown in the figure ("add" in ID): 10 000, 1100
22
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: add $14, $8, $9
MEM: or $13, ..
WB: and $12…
Control values shown in the figure: 10 000
23
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: xxxx
MEM: add $14, ..
WB: or $13…
Control values shown in the figure: 10
24
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: xxxx
MEM: xxxx
WB: add $14..
25
Dependencies Dependencies incur data and control hazards
26
Data Hazard - Software Solution
Compiler techniques
- Insert nops (0x0000_0000) between instructions
  - Where do we insert nops in the following example?
      sub $2, $1, $3
      and $12, $2, $5
      or $13, $6, $2
      add $14, $2, $2
      sw $15, 100($2)
  - However, it really slows us down!
- Code scheduling reorganizes the code so that it relieves the dependencies between instructions
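One way to work out where the nops go is sketched below for a 5-stage pipeline without forwarding. The tuple encoding of instructions and the names `insert_nops` and `min_distance` are hypothetical. `min_distance` is the assumed minimum number of issue slots between a producer and its consumer: 3 if the register file is written in the first half of the clock and read in the second half, 4 if the write only becomes visible at the rising clock edge.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance):
    """Pad a program with nops so every consumer issues at least
    min_distance slots after the instruction that writes its source."""
    out = []
    for op, dest, srcs in program:
        # Only the last (min_distance - 1) emitted instructions can be too close.
        window = out[-(min_distance - 1):] if min_distance > 1 else []
        needed = 0
        for back, (_, prev_dest, _) in enumerate(reversed(window), start=1):
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


# (opcode, destination register, source registers) for the example above.
program = [
    ("sub", "$2", ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or", "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw", None, ("$15", "$2")),
]
print([op for op, _, _ in insert_nops(program, min_distance=3)])
```

Under these assumptions all the padding lands between sub and and; once and is far enough from sub, the later readers of $2 are as well.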
27
Data Hazard - Forwarding
- Don't wait for results to be written to the register file; use the temporary results
- OK, then do we always have to do this forwarding?
  - If the write to the register file occurs in the first half of the clock and the read occurs in the second half, then? (Our textbook follows this.)
  - If the register file is written at the rising edge of the clock, then? (Let's stick to this for our project.)
28
Forwarding
[Figure: ID, EX, MEM, WB stages with the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, ALU, data memory, and writeback MUX]
29
Forwarding (from EX/MEM)
[Figure: the same datapath with MUXes at the ALU inputs and a forwarding path from the EX/MEM register]
30
Forwarding (from MEM/WB)
[Figure: the same datapath with MUXes at the ALU inputs and a forwarding path from the MEM/WB register]
31
Forwarding (operand selection)
[Figure: the same datapath with a Forwarding Unit driving the select inputs of the ALU-input MUXes]
32
Forwarding (operand propagation)
[Figure: the same datapath; Rs, Rt, and Rd travel through ID/EX, and the Forwarding Unit compares them with EX/MEM Rd and MEM/WB Rd]
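The comparisons made by the forwarding unit can be sketched as a small function. The select encodings (00 = register file value, 10 = forward from EX/MEM, 01 = forward from MEM/WB) and the use of RegWrite and the register-0 check follow the usual textbook formulation and are assumptions here, as are the function and parameter names.

```python
def forward_select(id_ex_src, ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd):
    """Return the MUX select for one ALU operand whose source register
    number is id_ex_src (ID/EX.Rs for operand A, ID/EX.Rt for operand B)."""
    # The most recent result wins, so EX/MEM is checked first.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return 0b10  # forward the ALU result held in EX/MEM
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return 0b01  # forward the value held in MEM/WB
    return 0b00      # no hazard: use the value read from the register file


# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX: operand A ($2) is forwarded.
print(forward_select(id_ex_src=2, ex_mem_regwrite=True, ex_mem_rd=2,
                     mem_wb_regwrite=False, mem_wb_rd=0))
```

The register-0 check keeps a write to $0, which always reads as zero in MIPS, from being forwarded as if it were a real result.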
33
Forwarding
[Figure: full pipelined datapath with the forwarding unit]
34
Can't always forward
- lw (load word) can still cause a hazard
  - An instruction tries to read a register right after a load instruction that writes to the same register
- Thus, we need a hazard detection unit to "stall" the pipeline after the load instruction
35
Stalling
We can stall the pipeline by keeping an instruction in the same stage
[Figure: the instructions in the IF and ID stages are held in place]
36
Data Hazard - Load-Use Case
[Figure: pipeline contents (IF, ID, EX, MEM, WB) at cc3 and at cc4 for the sequence below; a nop follows the load down the pipeline]
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
37
Hazard Detection Unit
- Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)
- Stall by letting an instruction that won't write anything (a nop) go forward
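The stall condition can be written as a single predicate. This is a minimal sketch; using the MemRead control bit in ID/EX to recognize a load, and the function and parameter names, are assumptions.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register that the load
    currently in EX is about to write (the load-use case)."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) is in EX and and $4, $2, $5 is in ID: rs = $2 matches the load's rt.
print(must_stall(id_ex_memread=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```

When the predicate holds, the PC and IF/ID register are held for one cycle and the control bits entering ID/EX are zeroed, which is the nop that goes forward.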
38
Control Hazards - Branch
- When the branch condition is resolved, other instructions are already in the pipeline
  - It works like a "not taken" prediction
- If the branch turns out to be taken, flush those instructions
- Note that in this implementation, the branch is resolved in the MEM stage (Check out the slide #6)
39
Alleviate Branch Hazards
Reduce the penalty to 1 cycle
- Move the branch compare to the ID stage of the pipeline
- Add an adder to calculate the branch target in the ID stage
- Add the IF.flush signal that zeros (squashes) the instruction in the IF/ID pipeline register

beq $1,$2,L1        IF ID EX MEM WB
add $1,$2,$3           IF ID EX MEM WB   (bubble)
…
L1: sub $1,$2,$3          IF ID EX MEM WB

The branch is resolved, and the taken target address is known, in the ID stage of beq.
40
Flushing Instructions
[Figure: pipelined datapath with the branch comparator (=) and target adder in the ID stage, the hazard detection unit, the forwarding unit, and the IF.Flush signal]
41
Control Hazard Handling Logic
42
Flushing Instructions (cycle N)
beq $1, $3, L2
and $12, $2, $5
or $13, $12, $1
…
L2: lw $4, 40($7)

[Figure: pipelined datapath with IF.Flush, hazard detection unit, control, and forwarding unit]
IF: and $12, $2, $5
ID: beq $1, $3, L2
43
Flushing Instructions (cycle N)
beq $1, $3, L2
and $12, $2, $5
or $13, $12, $1
…
L2: lw $4, 40($7)

[Figure: pipelined datapath; the branch target L2 is selected as the next PC]
IF: and $12, $2, $5
ID: beq $1, $3, L2
44
Flushing Instructions (cycle N+1)
beq $1, $3, L2
and $12, $2, $5
or $13, $12, $1
…
L2: lw $4, 40($7)

[Figure: pipelined datapath; the instruction after the branch has been flushed]
IF: lw $4, 40($7)
ID: nop
EX: beq $1, $3, L2
45
Improving Performance
Try to avoid stalls using hardware/software techniques
- Software technique
  - Reorder instructions
  - Utilize the delay slot of the branch
- Hardware technique
  - Implement the delayed branch
46
Performance of Pipelined CPU
Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a pipelined MIPS?
CPU Time = # insts × CPI × clock cycle time (T)
         = # insts × CPI / f
47
CPI Example
Ideally CPI = 1, but we need to account for stalls (caused by loads and branches)
SPECINT2000 benchmark:
- 25% loads
- 10% stores
- 11% branches
- 2% jumps
- 52% R-type
Suppose
- 40% of loads are used by the next instruction
- 25% of branches are mispredicted
What is the average CPI?
48
CPI Example
SPECINT2000 benchmark:
- 25% loads
- 10% stores
- 11% branches
- 2% jumps
- 52% R-type
If there is no stall in the pipelined MIPS, how would you calculate CPI?
Average CPI = (0.25)(1) + (0.10)(1) + (0.11)(1) + (0.02)(1) + (0.52)(1) = 1
Suppose
- 40% of loads are used by the next instruction
- 25% of branches are mispredicted
- All jumps flush the next instruction
What is the average CPI?
Load/branch CPI = 1 when not stalling, 2 when stalling. Thus
CPI_lw = 1 (0.6) + 2 (0.4) = 1.4
CPI_beq = 1 (0.75) + 2 (0.25) = 1.25
CPI_jump = 2 (1) = 2
Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.1475 ≈ 1.15
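The weighted average can be checked with a few lines of Python. The dictionary names are hypothetical; the numbers are the ones given on the slide.

```python
# Instruction mix from the SPECINT2000 example.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "r-type": 0.52}

# Per-class CPI: 1 cycle normally, 2 cycles when the instruction costs one bubble.
cpi = {
    "load": 1 * 0.60 + 2 * 0.40,    # 40% of loads are used by the next instruction
    "store": 1.0,
    "branch": 1 * 0.75 + 2 * 0.25,  # 25% of branches are mispredicted
    "jump": 2.0,                    # every jump flushes the next instruction
    "r-type": 1.0,
}

average_cpi = sum(mix[k] * cpi[k] for k in mix)
print(round(average_cpi, 4))
```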
49
Critical Path
Critical path of the pipelined MIPS:
Tc = max {
  tpcq + tmem + tsetup,                                   // IF stage
  tpcq + tRFread + tmux + teq + tAND + tmux + tsetup,     // ID stage
  tpcq + tmux + tmux + tALU + tsetup,                     // EX stage
  tpcq + tmemread + tsetup,                               // MEM stage
  tpcq + tmux + tRFsetup                                  // WB stage
}
- If the write to the register file occurs in the first half of the clock and the read occurs in the second half, then? (Our textbook follows this.)
- If the register file is written at the rising edge of the clock, then? (Let's stick to this for our project.)
50
Example

Element                  Parameter   Delay (ps)
Register clock-to-Q      tpcq_PC     30
Register setup           tsetup      20
Multiplexer              tmux        25
ALU                      tALU        200
Memory read              tmem        250
Register file read       tRFread     150
Register file setup      tRFsetup    20
Equality comparator      teq         40
AND gate                 tAND        15
Memory write             tmemwrite   220

Tc = max {
  tpcq + tmem + tsetup,                                   IF: 300 ps
  tpcq + tRFread + tmux + teq + tAND + tmux + tsetup,     ID: 305 ps
  tpcq + tmux + tmux + tALU + tsetup,                     EX: 300 ps
  tpcq + tmemread + tsetup,                               MA: 300 ps
  tpcq + tmux + tRFsetup                                  WB: 75 ps
}
Tc = (tpcq + tRFread + tmux + teq + tAND + tmux + tsetup) = 305 ps
fc = 1/Tc = 1/0.305 ns = 3.279 GHz
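The per-stage sums can be reproduced with the sketch below. The dictionary layout and names are hypothetical. The register file setup delay is missing from the extracted table, so the 20 ps used here is an assumption inferred from the stated WB total of 75 ps (30 + 25 + 20).

```python
# Element delays in picoseconds, taken from the table above.
delay_ps = {
    "tpcq": 30, "tsetup": 20, "tmux": 25, "tALU": 200, "tmem": 250,
    "tRFread": 150, "teq": 40, "tAND": 15,
    "tRFsetup": 20,  # assumption: inferred from WB = 75 ps
}

# Terms on the path through each stage, as listed in the Tc expression.
stage_terms = {
    "IF": ["tpcq", "tmem", "tsetup"],
    "ID": ["tpcq", "tRFread", "tmux", "teq", "tAND", "tmux", "tsetup"],
    "EX": ["tpcq", "tmux", "tmux", "tALU", "tsetup"],
    "MEM": ["tpcq", "tmem", "tsetup"],
    "WB": ["tpcq", "tmux", "tRFsetup"],
}

stage_ps = {s: sum(delay_ps[t] for t in terms) for s, terms in stage_terms.items()}
critical_stage = max(stage_ps, key=stage_ps.get)
tc_ps = stage_ps[critical_stage]
fc_ghz = 1000 / tc_ps  # 1 / (Tc in ns)

print(stage_ps, critical_stage, tc_ps, round(fc_ghz, 3))
```

With these delays ID is the slowest stage by only 5 ps, so the clock is set by the register file read plus the branch comparison moved into ID.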
51
Example
Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a pipelined MIPS (CPI = 1.15, Tc = 305 ps)?
Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
               = (100 × 10^9)(1.15)(0.305 × 10^-9 s)
               = 35.075 seconds

Processor      Execution Time (seconds)   Speedup (single-cycle is baseline)
Single-cycle   95                         1
Multicycle     133                        0.71
Pipelined      35.075                     2.71
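As a quick check of the formula, the sketch below evaluates it with the values named in the question (100 billion instructions, CPI = 1.15, Tc = 305 ps) and compares against the 95 s single-cycle baseline from the table. The variable names are hypothetical.

```python
instructions = 100e9       # 100 billion instructions
cpi = 1.15                 # average CPI from the CPI example
tc_seconds = 305e-12       # Tc = 305 ps
single_cycle_seconds = 95  # baseline from the table

# Execution time = instructions x cycles/instruction x seconds/cycle
pipelined_seconds = instructions * cpi * tc_seconds
speedup = single_cycle_seconds / pipelined_seconds

print(round(pipelined_seconds, 3), round(speedup, 2))
```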
52
Exceptions & Interrupts
- The CPU has to prepare for all possible situations it could face
- "Unexpected" events require a change in the flow of control
  - A MIPS CPU jumps to 0x8000_0180 when an exception occurs
- Exceptions arise within the CPU
  - Undefined opcode
  - Divide-by-zero
  - Arithmetic overflow in MIPS
    - Some other architectures (such as x86 and ARM) do not generate an exception on arithmetic overflow. Instead, they set bits of the flag register inside the CPU
- Interrupts are from external I/O devices
  - Keyboard, mouse, network card, etc.
- Many architectures and authors do not distinguish between interrupts and exceptions
  - They often use the term "interrupt" to refer to both types of events
53
Simplified Hardware System
[Figure: block diagram. Labels: MIPS (ALU; registers EAX, R15, …, R1, R0), Interrupt Controller, Interrupt, Address Bus, Data Bus, 32-bit, Timer, UART, GPIO, 4KB SRAM (0x to 0x00000FFF)]
UART: Universal Asynchronous Receiver and Transmitter
GPIO: General Purpose Input/Output
54
Backup Slides
55
Superscalar
- Start more than one instruction in the same cycle
- Hardware performs the dynamic scheduling of instructions
  - Hardware tries to find instructions to execute
  - Out-of-order (OOO) execution is possible
  - Speculative execution
  - Dynamic branch prediction with a branch history table
- Many modern processors (including x86) are superscalars
  - Core i7: 14-stage pipeline with 4 instruction issue
  - DEC Alpha 21264: 9-stage pipeline with 6 instruction issue
- Compiler technology is important
56
Exception Handling in MIPS and Handler Actions
Exception handling in MIPS
- Hardware (CPU)
  - CPU saves the PC of the offending (or interrupted) instruction to the "Exception Program Counter (EPC)" register
  - CPU saves an indication of the problem to the "Cause" register
  - Jump to handler at 0x
- Exception handler in software
  - Read Cause, and transfer to the relevant handler
  - If restartable,
    - Take corrective action
    - Use EPC to return to the program
  - Otherwise
    - Terminate the program
    - Report the error using EPC, Cause, …
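The division of work between hardware and handler can be modeled in a few lines. This is a simplified sketch, not the MIPS register layout: the `CpuState` class and function names are hypothetical, the Cause register is modeled as holding a bare code, and the handler address 0x8000_0180 is the one given on the earlier "Exceptions & Interrupts" slide. The overflow code 12 is the MIPS32 ExcCode for arithmetic overflow.

```python
from dataclasses import dataclass

HANDLER_ADDR = 0x8000_0180  # handler address from the Exceptions & Interrupts slide
OVERFLOW = 12               # MIPS32 ExcCode for arithmetic overflow


@dataclass
class CpuState:
    pc: int
    epc: int = 0
    cause: int = 0


def raise_exception(cpu, offending_pc, cause_code):
    """Hardware side: record where and why, then jump to the handler."""
    cpu.epc = offending_pc
    cpu.cause = cause_code
    cpu.pc = HANDLER_ADDR


def handle(cpu, restartable_causes):
    """Software side: resume through EPC if restartable, otherwise report."""
    if cpu.cause in restartable_causes:
        cpu.pc = cpu.epc  # return to the program
        return "resumed"
    return f"terminated: cause={cpu.cause}, epc={cpu.epc:#x}"


cpu = CpuState(pc=0x50)
raise_exception(cpu, offending_pc=0x4C, cause_code=OVERFLOW)
print(hex(cpu.pc), handle(cpu, restartable_causes=set()))
```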
57
Exceptions in a Pipeline
- Another form of control hazard
- Consider overflow on add in the EX stage
    add $1, $2, $1
  - Prevent $1 from being clobbered
  - Complete previous instructions
  - Flush add and subsequent instructions
  - Set Cause and EPC register values
  - Transfer control to the handler
- Similar to a mispredicted branch
  - Use much of the same hardware
58
Pipeline with Exceptions
59
Exception Example
Exception on add in:
  40 sub $11, $2, $4
  44 and $12, $2, $5
  48 or $13, $2, $6
  4C add $1, $2, $1
  50 slt $15, $6, $7
  54 lw $16, 50($7)
  …
Handler:
  sw $25, 1000($0)
  sw $26, 1004($0)
  …
60
Exception Example
61
Exception Example