Pipelining in more detail

Pipelining in more detail
Controls Hazards

Single Cycle MIPS Implementation
All instructions take the same amount of time Signals propagate along longest path Lots of operations happening in parallel Increment PC ALU Branch target computation

Multicycle MIPS Implementation
Instructions take different amounts of cycles Cycles are identical in length Share resources across cycles ALU Cycles are consecutive R-type and memory-reference instructions do different things in their 4th cycles

Pipelined MIPS Implementation
Like single cycle All instructions take the same amount of cycles Lots of things going on in parallel But they belong to different instructions! Each pipeline stage has associated state Unlike multicycle Stages are fixed May do nothing in one stage MEM stage for Reg-Reg instructions

Pipelining review Time IM Reg DM IM Reg DM IM Reg DM IM Reg DM
Clock Cycle 1 Clock Cycle 2 Clock Cycle 3 Clock Cycle 4 Clock Cycle 5 Clock Cycle 6 Clock Cycle 7 Time instr1 IM Reg DM instr2 IM Reg DM IM Reg DM instr3 IM Reg DM instr4 Instructions

Pipeline Control – big picture
Control Unit ID/EX EX/Mem Mem/WB IF/ID IF ID EX Mem WB

Pipeline control Many control signals same as in multicycle datapath
No write signal for PC or pipeline registers These are written on every cycle Need to set control at each stage – extend pipeline registers to include control IF, ID: no controls to set (all instructions behave identically) EX: set ALU sources and op, select between rt/rd for result register Mem: memory read/write, PC select for branches WB: select result from memory or ALU, write enable control for reg file.

Pipeline in action IF ID EX MEM WB Ctrl Instruction memory Register
file clk ALU CLK SE PC Data Memory 0x1234 Let’s execute a program at address 0x1234: lw $1, 4($3) add $2, $3, $4 or $3, $1, $2 bne $1, $3, L Suppose: $1 = 1, $2 = 2, $3 = 1000, $4 = 6

Pipeline in action 1 IF ID EX MEM WB lw $1, 4($3) Instruction memory
Register file ALU CLK SE PC Data Memory 0x1234 Clock: 1 $1 = 1, $2 = 2, $3 = 1000, $4 = 4

Pipeline in action 2 IF ID EX MEM WB add $2, $3, $4 Instruction memory
Register file ALU CLK SE PC Data Memory 0x1238 lw $1, 4($3) Clock: 1 2 $1 = 1, $2 =2, $3 =1000, $4 = 6

Pipeline in action 3 IF ID EX MEM WB or $3, $1, $2 Instruction memory
Register file ALU CLK SE PC Data Memory 0x123C add $2, $3, $4 lw $1, 4($3) Clock: 1 2 3 $1 = 1, $2 =2, $3 =1000, $4 = 6

Pipeline in action 4 (anything wrong?)
IF ID EX MEM WB bne $1, $3, L Instruction memory Register file ALU CLK SE PC Data Memory 0x1240 or $3, $1, $2 add $2, $3, $4 lw $1, 4($3) Clock: Let’s assume memory returns 42 for lw. 1 2 3 4 $1 = 1, $2 =2, $3 =1000, $4 = 6

Data Hazards! IF ID EX MEM WB bne $1, $3, L Instruction memory
Register file ALU CLK SE PC Data Memory 0x1240 or $3, $1, $2 add $2, $3, $4 lw $1, 4($3) Clock: 1 2 3 4 $1 = 1, $2 = 2, $3 = 1000, $4 = 6

Pipeline in action 5 IF ID EX MEM WB sll $4, $3, 5 Instruction memory
Register file ALU CLK SE PC Data Memory 0x1244 bne $1, $3, L or $3, $1, $2 add $2, $3, $4 Clock: $1 and $2 both wrong for or because we got old values! lw $1, 4($3) 1 2 3 4 5 $1 = 42, $2 = 2, $3 = 1000, $4 = 6

Data Hazards Data dependence Data dependence (RAW) occurs when:
result of an operation needed before it is stored back in register file: lw $1, 4($3) add $2, $3, $4 or $3, $1, $2 nop The data hazard above is called read after write (RAW) Data dependence (RAW) occurs when: An instruction wants to read a register in stage 2, and One instruction in stage 3 or stage 4 is going to write that register Note: if the instruction writing the register is in stage 5, this is fine since we can write a register and read it in the same cycle Hazard detection unit compares register fields across stages.

Resolving data dependencies
Have the compiler generate no-ops Don’t have to deal with data hazards in hardware. Stall the pipeline, i.e., create bubbles the resulting delays are the same as for no-ops Send the result generated in stage 3 or stage 4 to the appropriate input of the ALU. This is called forwarding or bypassing. More performance at the cost of more hardware For one simple pipeline, cost is slightly more control and extra buses For several pipelines, say n, communication grows as O(n2)

Forwarding from WB to EX
IF ID EX MEM WB sll $4, $3, 5 Instruction memory Register file ALU CLK SE PC Data Memory 0x1244 bne $1, $3, L or $3, $1, $2 add $2, $3, $4 Clock: $1 fixed, $2 still wrong lw $1, 4($3) 1 2 3 4 5 $1 = 42, $2 = 2, $3 = 1000, $4 = 6

Forwarding from MEM to EX
IF ID EX MEM WB sll $4, $3, 5 Instruction memory Register file ALU CLK SE PC Data Memory 0x1244 bne $1, $3, L or $3, $1, $2 add $2, $3, $4 Clock: lw $1, 4($3) 1 2 3 4 5 $1 = 42, $2 = 2, $3 = 1000, $4 = 6

Generally, need to forward to both inputs!
IF ID EX MEM WB sll $4, $3, 5 Instruction memory Register file ALU CLK SE PC Data Memory 0x1244 or $3, $1, $2 beq $1, $3, L add $2, $3, $4 What if we had: or $3, $1, $1 Clock: lw $1, 4($3) Need to forward $1 to both ALU inputs 1 2 3 4 5 $1 = 42, $2 = 2, $3 = 1000, $4 = 6

Forwarding a register twice?
What if had: add $1, $2, $2 or $1, $3, $4 and $5, $1, $1 Where do you forward $1 from? Answer: from most recent $1 (i.e. or)

Another look at forwarding…
Clock Cycle 1 Clock Cycle 2 Clock Cycle 3 Clock Cycle 4 Clock Cycle 5 Clock Cycle 6 Clock Cycle 7 Time lw $1, 4($3) IM Reg DM Forward $1 add $2, $3, $4 IM Reg DM Forward $2 IM Reg DM or $3, $1, $2 fwd $3 IM Reg DM bne $1, $3, L Instructions

Data hazards and loads Time
Clock Cycle 1 Clock Cycle 2 Clock Cycle 3 Clock Cycle 4 Clock Cycle 5 Clock Cycle 6 Clock Cycle 7 Time Can’t do forwarding in this case: can’t go back in time. Have to insert a bubble between lw and or lw $1, 4($3) IM Reg DM ? or $3, $1, $2 IM Reg DM What if we switch these? IM Reg DM add $2, $3, $4 IM Reg DM bne $1, $3, L Instructions

Inserting a bubble Have to do it when you load into a register which is used in the next instruction Clock Cycle 1 Clock Cycle 2 Clock Cycle 3 Clock Cycle 4 Clock Cycle 5 Clock Cycle 6 Clock Cycle 7 Time lw $1, 4($3) IM Reg DM Now we can forward $1 or $3, $1, $2 IM Reg DM do nothing IM Reg DM add $2, $3, $4 IM Reg DM bne $1, $3, L Instructions …

Other hazards We’ve seen data hazards Other types of hazards:
when an instruction in the pipeline is dependent on another instruction still in the pipeline. Resolved by: Bypassing/forwarding Stalling Other types of hazards: Structural hazards where two instructions at different stages want to use the same resource. Solved by using more resources (e.g., instruction and data memory; several ALU’s). Won’t happen in our pipeline. Control hazards happens on a taken branch. Evaluation of branch condition and calculation of branch target is not completed before next instruction is fetched.

Branch path IF ID EX MEM WB instr at L Instruction memory Register
file ALU CLK SE PC sll $4, $3, 5 Data Memory L: This is wrong! (Should be instr at L) =? result of comparison or $3, $1, $2 Determine new PC add $2, $3, $4 Clock: bne $1, $3, L 1 2 3 4 5 6 $1 = 42, $2 = 1006, $3 = 1000, $4 = 6

Resolving control hazards
Stall until result of the condition & target are known too slow Reduce penalty by redesigning the pipeline: move branch calculation to ID stage move branch comparison to ID stage how to do quick compare? use both edges of the clock Delay slots specify in ISA that instruction after branch is always executed. Branch prediction in IF stage, guess branch outcome correct it if turns out to be wrong.

Pipelining in more detail

Similar presentations

Presentation on theme: "Pipelining in more detail"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Pipelining in more detail

Similar presentations

Presentation on theme: "Pipelining in more detail"— Presentation transcript:

Similar presentations

About project

Feedback