Presentation is loading. Please wait.

Presentation is loading. Please wait.

Pipelining - 1.

Similar presentations


Presentation on theme: "Pipelining - 1."— Presentation transcript:

1 Pipelining - 1

2 Pipelining: Definition
An implementation technique instructions are overlapped in execution. A key technique for improving performance in many, if not all, contemporary processors. In multiple cycle implementation, steps of instructions are executed in different stages. functional units for different purposes In pipelined datapath Some units used more efficiently Some units must be replicated

3 Laundry Analogy Four different stages Identical time (30 min)

4 Pipelined Laundry

5 Multicycle Datapath

6 Pipelined Datapath MIPS instructions classically take five steps:
IF - Fetch instruction from memory ID - Read registers while decoding the instruction EX - Execute the operation or calculate an address MEM - Access an operand in memory WB - Write result into a register

7 Times of Instruction Steps
Instruction class Inst. Fetch Register Read ALU op Data access Register write Total time lw 200 ps 100 ps 800 ps sw 700 ps R-format 600 ps Branch 500 ps

8 Pipelined vs. Nonpipelined
Program execution order 200 400 600 800 1000 1200 1400 1600 1800 Time IF ID EX MEM WB lw $t1, 100($s1) lw $t2, 200($s1) lw $t2, 300($s1) lw $t1, 100($s1) lw $t2, 200($s1) lw $t2, 300($s1) Program execution order Time 200 400 600 800 1000 1200 1400 1600 1800 IF EX MEM WB ID EX MEM WB ID IF EX MEM WB ID IF

9 Aspects of Pipelining Latency is the same (usually a little worse)
Instruction throughput increases. What is the ideal speedup due to pipelining? In practice, it is hard to achieve the theoretical speedup because of 1) imperfectly balanced instruction steps and 2) hazards. Features of MIPS that make the pipelining easy: Instructions are of the same length. Few instruction formats Load/store architecture Operands are aligned in memory

10 Pipelining Problems Structural hazards: Control hazards: Data hazards:
Hardware cannot support the combination of instructions in the same clock cycle. one memory for instruction and data. Control hazards: branch instructions change the flow of instructions Data hazards: an instruction depends on the result of a previous instruction The result may not be available when needed by the current instruction

11 Pipeline Stalls on Control Hazards
Program execution order 200 400 600 800 1000 1200 1400 1600 1800 IF ID EX MEM WB add $t1, $s1, $s2 IF ID EX MEM WB beq $t2, $t3, 40 200 ps EX MEM WB ID IF lw $t4, 300($s1) 400 ps Assume that we can resolve branch in the second step Next instruction cannot start until the end of second cycle. The pipeline stops issuing new instruction, or it stalls.

12 Predict the Outcome Program execution order 2 4 6 8 10 12 14 16 18 IF ID EX MEM WB add $t1, $s1, $s2 IF ID EX MEM WB beq $t2, $t3, 40 2 ns EX MEM WB ID IF lw $t4, 300($s1) 2 ns One simple approach is to predict that branches will always fail. When otherwise occurs (branch is indeed taken), then started instruction is discarded (flushed out) and the instruction at the branch address will start executing.

13 Delayed Branches A solution actually used by the MIPS architecture.
Delayed branches always execute next sequential instruction MIPS software places an instruction immediately after the delayed branch instruction that is not affected by the branch Compilers fill about 50% of the branch delay slots with useful instructions Program execution order 2 4 6 8 10 12 14 16 18 IF ID EX MEM WB beq $t2, $t3, 40 WB EX MEM ID IF add $t1, $s1, $s2 2 ns EX MEM WB ID IF lw $t4, 300($s1)

14 Data Hazards An instruction depends on the results of a previous instruction still in the pipeline. Example: add $s0, $t0,$t1 sub $t2, $s0, $t3 This is also known as data dependency. Normally, these types of data hazards could severely stall the pipeline We have to add three bubbles in the pipeline.

15 Forwarding First solution: compiler optimizations rearrange the instruction sequence. Second solution: forwarding (or bypassing). ALU creates the result much sooner than the result is actually written back to a register. As soon as the ALU creates it, the result immediately can be forwarded to the next instruction. Direct use of ALU results.

16 Direct use of ALU result
Forwarding Direct use of ALU result

17 Forwarding: Examples Forwarding when an R-format instruction following a load instruction.

18 Reordering Code to Avoid Pipeline Stalls
Code that causes a pipeline stall due to data hazard. lw $t0, 0($t1) lw $t2, 4($t1) addi $s0, $t2, 0x5 addi $s1, $t0, 0xA lw $t0, 0($t1) lw $t2, 4($t1) addi $s1, $t0, 0xA addi $s0, $t2, 0x5 Reordered code. No data hazard.

19 Recall: Single Cycle Datapath

20 Instruction and Data Flow
Instruction and data move generally from left to right with two exceptions: Write-back stage, which places the result back into the register file in the middle of the data path => data hazard The selection of the next value of the PC, choosing between the incremented PC and the branch address from the MEM stage => control hazard

21 Design of Pipelined Datapath
Break the datapath into smaller segments These segments can be shared by instructions Use registers between two consecutive segments of the datapath to hold the intermediate results. Pipeline registers

22 Pipelined Datapath

23 Example: IF of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X
Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory rt M U X 32 16 Sign Ext

24 Example: ID of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X
Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext rt rt rt

25 Example: EX of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X
Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext rt rt rt

26 Example: MEM of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X
Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext rt rt rt

27 Example: WB of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X
Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext rt rt rt

28 Example 2: sw, EX stage PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M
Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext

29 Example 2: MEM of sw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U
Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext

30 Oops! Nothing happening
Example 2: WB of sw M U X Oops! Nothing happening IF/ID ID/EX EX/MEM MEM/WB Add 4 Add Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext

31 Summary So Far Not all instructions require the complete datapath.
No information transfer from one pipeline stage to the next is possible except through pipeline registers Each component can be used only in a single pipeline stage Everything that happened in the previous stage will be overwritten.

32 Pipelined Control PC is written on each clock cycle (no write signal)
No write signals for pipeline registers. IF stage: no control signal since instruction is read and PC is updated each cycle ID stage: No control signals EX stage: RegDst, ALUOp, ALUSrc MEM stage: branch, MemRead, MemWrite WB stage: MemtoReg, RegWrite

33 Pipelined Datapath with Control Signals
M U X PCSrc ID/EX EX/MEM MEM/WB IF/ID Add Add Shift Left 2 4 RegWrite branch ALU MemtoReg MemWrite ALUSrc PC Registers zero M U X Instruction Memory M U X Data Memory 16 Sign Ext 32 ALU cntrl [15 0] 6 [20 16] rt M U X [15 11] rd MemRead ALUOp RegDst

34 Control Signals for Instructions
Nine control signals that start in the EX stage Main control unit generates the control signals during the ID stage. Instruction EX Stage MEM Stage WB Stage RegDest ALU Op1 ALU Op0 ALUSrc Branch Mem Read Write RegWrite Memto-Reg R-format 1 lw sw x beq

35 Control Lines for the Last Three Stages
instruction WB WB MEM MEM WB EX IF/ID ID/EX EX/MEM MEM/WB

36 Pipelined Datapath with Control Signals
PCSrc M U X ID/EX EX/MEM Control WB M WB MEM/WB IF/ID EX M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg ALU ALUSrc MemWrite PC zero Registers Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X MemRead [15 11] ALUOp RegDst

37 Example Show how these five instructions will go through the pipeline: lw $10, 20($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9

38 Clock 1 PCSrc PC IF: lw $10, 20($1) ID: before<1>
EX: before<2> MEM: before<3> WB: … PCSrc M U X ID/EX Control 00 00 EX/MEM WB 000 000 00 M WB MEM/WB 0000 EX 00 IF/ID M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg ALU ALUSrc MemWrite PC Registers zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

39 Clock 2 PC IF: sub $11, $2, $3 ID: lw $10, 20($1) EX: before<1>
MEM: before<2> WB: … PCSrc M U X ID/EX Control 11 00 EX/MEM WB 010 000 00 M WB MEM/WB IF/ID 0001 EX 00 M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg ALU ALUSrc MemWrite PC Registers zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

40 Clock 3 zero PC IF:and $12, $4, $5 ID: sub $11, $2, $3
EX: lw $10, 20($1) MEM: before<1> WB: … PCSrc M U X ID/EX Control 10 11 EX/MEM WB 000 010 00 M WB MEM/WB IF/ID 1100 EX 00 M WB Add 1 Add Shift Left 2 4 RegWrite branch MemtoReg ALUSrc MemWrite PC Registers ALU zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

41 Clock 4 1 PC IF:or $13, $6,$7 ID:and $12, $4, $5 EX: sub $11, $2, $3
MEM: lw $10, 20($1) WB: … PCSrc M U X ID/EX Control 10 10 EX/MEM WB 000 000 11 M WB 1 MEM/WB 1 1100 EX 10 IF/ID M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg ALUSrc MemWrite PC Registers ALU zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

42 Clock 5 PC IF:add $14, $8, $9 ID:or $13, $6,$7 EX:and $12, $4, $5
MEM:sub … WB: lw PCSrc M U X ID/EX Control 10 10 EX/MEM WB 000 000 10 M WB 1 MEM/WB 1100 EX 1 IF/ID 10 M WB 1 Add Add Shift Left 2 4 RegWrite branch MemtoReg ALUSrc MemWrite PC Registers ALU zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

43 Data Hazards & Forwarding
Assumption: a register can be read and written in the same clock cycle.

44 Data Hazards & Forwarding
Software solution for data hazards: sub $2, $1, $3 nop and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) But, these dependencies happen too often to rely on compilers

45 Pipelined Datapath with Control Signals
M U X PCSrc ID/EX EX/MEM MEM/WB IF/ID Add Add Shift Left 2 4 RegWrite branch ALU MemtoReg MemWrite ALUSrc PC Registers zero M U X Instruction Memory M U X Data Memory 16 Sign Ext [15 0] 32 ALU cntrl 6 [20 16] M U X [15 11] MemRead ALUOp RegDst

46 A Notation for Detecting Data Hazards
Two types of data hazards. EX/MEM.rd = ID/EX.rs EX/MEM.rd = ID/EX.rt MEM/WB.rd = ID/EX.rs MEM/WB.rd = ID/EX.rt Easy to detect data hazards using this notation sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 sub-and: EX/MEM.rd = ID/EX.rs = $2 sub-or: MEM/WB.rd = ID/EX.rt = $2

47 Forwarding

48 Pipeline Without Forwarding
Registers ALU M U X ID/EX EX/MEM Data Memory EX/MEM.rd rd rt

49 Pipeline with Forwarding Unit
Registers ALU M U X ID/EX EX/MEM Data Memory EX/MEM.rd rd rt EX/MEM.RegWrite MEM/WB.RegWrite ForwardA ForwardB rs MEM/WB.rd Forwarding Unit

50 Control Signals to Resolve Data Dependencies
EX Hazard: if (EX/MEM.RegWrite and (EX/MEM.rd  0) and (EX/MEM.rd = ID/EX.rs)) ForwardA = 10 if (EX/MEM.RegWrite and (EX/MEM.rd  0) and (EX/MEM.rd = ID/EX.rt)) ForwardB = 10 MEM Hazard: if (MEM/WB.RegWrite and (MEM/WB.rd  0) and (MEM/WB.rd = ID/EX.rs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.rd  0) and (MEM/WB.rd = ID/EX.rt)) ForwardB = 01

51 Control Values for Forwarding MUXes
MUX control Source Explanation ForwardA=00 ID/EX The first ALU operand comes from the register file ForwardA=10 EX/MEM The first ALU operand is forwarded from the prior ALU result ForwardA=01 MEM/WB The first ALU operand is forwarded from the data memory or an earlier ALU result ForwardB=00 The second ALU operand comes from the register file ForwardB=10 The second ALU operand is forwarded from the prior ALU result ForwardB=01 The second ALU operand is forwarded from the data memory or an earlier ALU result

52 Can’t Always Forward - Stalls
Load word can still cause a hazard:

53 Pipelined Datapath with Control
PCSrc M U X ID/EX EX/MEM Control WB M WB MEM/WB IF/ID EX M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg rs ALU rt ALUSrc MemWrite PC zero Registers Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] rt M U X MemRead [15 11] ALUOp RegDst

54 Hazard Detection Unit If (ID/EX.MemRead and ((ID/EX.rt = IF/ID.rs) or (ID/EX.rt = IF/ID.rt))) stall the pipeline

55 Stalling the Pipeline

56 How to Stall the Pipeline?
The instructions in the IF and ID stages must be stalled; (i.e. not allowed to make progress) IF Stage PC and IF/ID registers have write enable inputs Prevent the PC register and the IF/ID pipeline register from changing (i.e. set their write enable inputs to 0). The instruction in the IF stage continue to read instructions using the same PC.

57 How to Stall the Pipeline
ID Stage Since the data in the IF/ID register doesn’t change, the same instruction is decoded The registers in the ID stage will continue to be read using the same instruction fields in the IF/ID pipeline register. Hazard is detected in the ID stage, thus the we insert a bubble in the EX stage. Inserting a bubble means deasserting all the control signals in the ID stage in order to prevent the bubble writing into anything. Recall that control signals are generated in this stage

58 Pipelined Control

59 Pipelined Datapath with Control
M U X PCSrc ID/EX EX/MEM MEM/WB IF/ID Add Add Shift Left 2 4 RegWrite branch ALU MemtoReg MemWrite ALUSrc PC Registers zero M U X Instruction Memory M U X Data Memory [15 0] 16 Sign Ext 32 ALU cntrl 6 [20 16] M U X [15 11] MemRead ALUOp

60 Branch (or Control) Hazards
When the branch is decided to be taken, successive instructions following branch are already in the pipeline.

61 Branch (or Control) Hazards
Note that the branch instruction decides whether to branch in the MEM stage different than multicycle desing Control hazards are relatively simpler to understand than data hazards They occur less frequently. There is nothing as effective against control hazards as forwarding is for data hazards. Simpler schemes to overcome misprediction or ease of penalties.

62 Assume Branch Not Taken
“Stall until the branch is complete” is too slow. Alternative is to assume that the branch will not be taken If the branch is taken, the instructions that are being fetched and decoded must be discarded. If half the time the branches are taken, this optimization halves the cost of control hazards. To discard instructions, we merely change the original control values to 0s. Discarding instructions means we must be able to flush instructions in the IF, ID, and EX stages.

63 Reducing Branch Penalty
Moving branch execution earlier in the pipeline If the next PC for a branch is selected at the end of EX stage, then the branch penalty becomes only two clock cycles. Many MIPS implementation move the branch execution to the ID stage. Address calculation can easily be moved to ID stage since the PC and the immediate field are in the IF/ID register. Comparing can be done faster using XOR and OR gates than using subtraction

64 Flushing @ Branch Misprediction
Shift Left 2 + IF.FLUSH Hazard Detection Unit M U X ID/EX EX/MEM Control WB M U X M WB MEM/WB IF/ID EX M WB + RegWrite M U X 4 ALU = PC Registers Data Memory M U X Instruction Memory M U X Sign Ext M U X Forwarding Unit

65 Dynamic Branch Prediction
Branch prediction buffer or branch history table contains a bit for each branch about its past behavior. The buffer is indexed by the lower portion of the address of the instruction. branch target buffer 1-bit prediction scheme has performance shortcomings. Consider a loop branch that branches nine times in a row, then it is not taken once. The accuracy of this prediction scheme is?

66 2-bit Prediction Scheme
In 2-bit prediction scheme, a prediction must be wrong twice before it is changed.

67 Delayed-Branch Scheduling
(a) From before (b) From target sub $t4, $t5, $t6 add $t1, $t2, $t3 beq $t1, $zero Delay slot (c) From fall through add $t1, $t2, $t3 beq $t1, $zero or $t7, $t8, $t9 sub $t4, $t5, $t6 Delay slot add $t1, $t2, $t3 beq $t2, $zero Delay slot beq $t2, $zero becomes add $t1, $t2, $t3 beq $t1, $zero becomes becomes add $t1, $t2, $t3 beq $t2, $zero sub $t4, $t5, $t6 or $t7, $t8, $t9 add $t1, $t2, $t3 sub $t4, $t5, $t6

68 Visiting the Old Example 1/2
Delays of functional units Memory access: 200 ps, ALU op: 100 ps, Register access: 50 ps. Single cycle datapath = 600 ps SPECInt2000 Loads: 25%; Stores: 10%; Branch: 11%; Jumps: 2%; ALU Operations: 52% Multi cycle data path CPI = 525% + 410% + 311% + 32% + 452% = execution time per instruction: ?  ? = ? ps

69 Visiting the Old Example 2/2
Assumptions: 50% of loads are immediately followed by an instruction that uses the result of load Branch penalty on misprediction: one clock cycle. 25% of branches are mispredicted. Jumps: 2 clock cycles Pipelined datapath CPI = 1.525% + 110% 11% + 22% + 152% = Average time per instruction:  200 ps = ps

70 Exception in the Pipelined Datapath
0x40 sub $11, $2, $4 0x44 and $12, $2, $5 0x48 or $13, $2, $6 0x4C add $1, $2, $1 0x50 slt $15, $6, $7 0x54 lw $16, 50($7) Suppose an overflow exception occurs in the add instruction. Exception handling routine 0x sw $25, 1000($0) 0x sw $26, 1004($0) ...

71 Example: clock 6 X = causing an exception lw $16, 50($7)
slt $15, $6, $7 add $1, $2, $1 or $13, … and $12 … EX.Flush Hazard Detection Unit X M U X ID/EX M U X Control OR WB EX/MEM ID.Flush M M U X WB MEM/WB IF/ID EX M U X M WB Cause + + Except PC Shift Left 2 RegWrite M U X 4 ALU PC = Registers Data Memory M U X Instruction Memory M U X Sign Ext M U X Forwarding Unit IF.FLUSH

72 Example: clock 7 = + + sw $25, 1000($0) bubble bubble bubble or $13, …
EX.Flush Hazard Detection Unit M U X ID/EX M U X Control OR WB EX/MEM ID.Flush M M U X WB MEM/WB IF/ID EX M U X M WB Cause + + Except PC Shift Left 2 RegWrite M U X 4 ALU PC = Registers Data Memory M U X Instruction Memory M U X Sign Ext M U X Forwarding Unit IF.FLUSH


Download ppt "Pipelining - 1."

Similar presentations


Ads by Google