Published by Chad Doyle. Modified over 8 years ago.
1
L17 – Pipeline Issues, Comp 411 – Fall 2009, 11/23/09
CPU Pipelining Issues
Read Chapter 4.7-4.8
“This pipe stuff makes my head hurt!”
“What have you been beating your head against?”
2
5-Stage miniMIPS
[Figure: datapath of the 5-stage miniMIPS pipeline. Stages: Instruction Fetch, Register File, ALU, Memory, Write Back. Pipeline registers carry PC and IR between stages, plus A, B, WD, and Y where needed.]
Memory timing: the address is available right after the instruction enters the Memory stage, and the data is needed just before the rising clock edge at the end of the Write Back stage, so a memory access has almost 2 clock cycles.
The diagram omits some details and has NO bypass or interlock logic.
3
Pipelining
Improve performance by increasing instruction throughput.
The ideal speedup is the number of stages in the pipeline. Do we achieve this?
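The "ideal speedup" claim can be checked with a little arithmetic. The sketch below is not from the slides; it assumes every stage takes exactly one cycle, that the unpipelined machine spends one such cycle per stage on each instruction, and that there are no stalls. The function name and parameters are illustrative.

```python
def pipeline_speedup(n_instructions, n_stages):
    """Speedup of an ideal pipeline over running each instruction to completion.

    Assumption: equal-length stages and no hazards or stalls.
    """
    # Unpipelined: every instruction passes through all stages before the next starts.
    unpipelined_cycles = n_instructions * n_stages
    # Pipelined: the first instruction needs n_stages cycles to drain,
    # and each later instruction completes one cycle after its predecessor.
    pipelined_cycles = n_stages + (n_instructions - 1)
    return unpipelined_cycles / pipelined_cycles


if __name__ == "__main__":
    for n in (1, 5, 100, 1_000_000):
        print(n, round(pipeline_speedup(n, 5), 3))
```

Under these assumptions the speedup only approaches the number of stages for long instruction streams; the hazards discussed on the following slides push it lower.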
4
Pipelining
What makes it easy?
  all instructions are the same length
  just a few instruction formats
  memory operands appear only in loads and stores
What makes it hard?
  structural hazards: suppose we had only one memory
  control hazards: need to worry about branch instructions
  data hazards: an instruction depends on a previous instruction
Individual instructions still take the same number of cycles, but we’ve improved the throughput by increasing the number of simultaneously executing instructions.
5
Structural Hazards
[Figure: four overlapping instructions, each passing through the stages Inst Fetch, Reg Read, ALU, Data Access, Reg Write.]
6
Data Hazards
Problem with starting the next instruction before the first is finished:
  dependencies that “go backward in time” are data hazards
7
Software Solution
Have the compiler guarantee no hazards.
Where do we insert the “nops”?
  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
Problem: this really slows us down!
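One way to answer "where do we insert the nops?" is a small scheduling pass. This is a minimal sketch, not the course's tool: instructions are hypothetical (text, destination, sources) tuples, and `min_gap` is an assumed number of instructions that must separate a producer from a consumer of the same register. A common textbook value is 2 (register file written in the first half of a cycle and read in the second); the summary slide later in this deck cites a 3-cycle writeback delay, so 3 may fit the miniMIPS better.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_gap=2):
    """Insert nops so no instruction reads a register written fewer than
    min_gap instructions earlier (assumes no forwarding hardware)."""
    out = []
    for text, dest, srcs in program:
        needed = 0
        # Look back at the last min_gap emitted instructions.
        for distance in range(1, min_gap + 1):
            if distance > len(out):
                break
            prev_dest = out[-distance][1]
            if prev_dest is not None and prev_dest in srcs:
                # A producer this close needs enough nops to reach min_gap.
                needed = max(needed, min_gap - distance + 1)
        out.extend([NOP] * needed)
        out.append((text, dest, srcs))
    return out


# The sequence from the slide; sw has no destination register.
slide_program = [
    ("sub $2, $1, $3", "$2", ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2", "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2", "$2")),
    ("sw $15, 100($2)", None, ("$15", "$2")),
]

if __name__ == "__main__":
    for text, _, _ in insert_nops(slide_program):
        print(text)
```

With `min_gap=2` the nops land between `sub` and `and`; the later readers of `$2` are already far enough away.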
8
Forwarding
Use temporary results; don’t wait for them to be written.
  register file forwarding to handle read/write to the same register
  ALU forwarding
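The forwarding decision can be sketched as a priority check on the destination registers of the instructions ahead in the pipeline. This follows the usual textbook formulation rather than the miniMIPS bypass logic itself; the function name, the stage labels "EX/MEM" and "MEM/WB", and the tuple layout are assumptions made for illustration.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Pick where an ALU operand should come from.

    src_reg: register number the current instruction wants to read.
    ex_mem, mem_wb: (writes_register, dest_reg) for the instructions one
    and two positions ahead in the pipeline (hypothetical encoding).
    """
    # MIPS register 0 is hardwired to zero, so it never needs forwarding.
    if src_reg == 0:
        return "REGFILE"
    ex_writes, ex_dest = ex_mem
    # The most recent producer wins, so check the nearer stage first.
    if ex_writes and ex_dest == src_reg:
        return "EX/MEM"
    wb_writes, wb_dest = mem_wb
    if wb_writes and wb_dest == src_reg:
        return "MEM/WB"
    return "REGFILE"


if __name__ == "__main__":
    # and $12, $2, $5 right after sub $2, $1, $3: operand $2 comes from EX/MEM.
    print(forward_select(2, (True, 2), (False, 0)))
```

Checking the nearer stage first matters when two in-flight instructions write the same register: the consumer must see the later value.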
9
Can't always forward
Load word can still cause a hazard: an instruction tries to read a register immediately after a load instruction that writes to the same register.
Thus, we need a hazard detection unit to “stall” the dependent instruction.
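The condition a hazard detection unit checks can be written as a single predicate. The sketch below is an assumption-laden illustration, not the miniMIPS interlock: the function name and arguments are hypothetical, and it only says whether to stall, not for how many cycles (that depends on when load data becomes available in a given design).

```python
def must_stall(prev_is_load, prev_dest, cur_sources):
    """Return True when the current instruction reads the register that the
    immediately preceding load is still fetching from memory."""
    # Register 0 is hardwired to zero in MIPS, so a "load into $0" creates no dependence.
    return bool(prev_is_load and prev_dest != 0 and prev_dest in cur_sources)


if __name__ == "__main__":
    # lw $2, 20($1) followed by and $4, $2, $5: the and must wait.
    print(must_stall(True, 2, (2, 5)))
```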
10
Stalling
We can stall the pipeline by keeping an instruction in the same stage.
11
Types of Data Hazards
1. RAW (read-after-write) hazards, corresponding to value dependences, are the most difficult to deal with, since they can never be eliminated
   The second instruction is waiting for information produced by the first instruction
2. WAR (write-after-read) and WAW (write-after-write) hazards are name dependences
   Two instructions happen to use the same register (name), although they don’t have to
   Can often be eliminated by renaming, either in software or hardware
     Implies the use of additional resources, hence additional cost
     Renaming is not always possible: implicit operands such as accumulator, PC, or condition codes cannot be renamed
   These hazards don’t cause problems for the MIPS pipeline
     Relative timing does not change even with pipelined execution, because reads occur early and writes occur late in the pipeline
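The three hazard types differ only in which instruction reads and which writes the shared register. A minimal sketch, assuming each instruction is reduced to a hypothetical (destination, sources) pair given in program order:

```python
def classify_hazards(first, second):
    """Return the set of dependence types from `first` to `second`.

    first, second: (dest, sources) pairs in program order; dest may be None
    for instructions that write no register (for example a store).
    """
    d1, s1 = first
    d2, s2 = second
    kinds = set()
    if d1 is not None and d1 in s2:
        kinds.add("RAW")  # second reads what first writes (value dependence)
    if d2 is not None and d2 in s1:
        kinds.add("WAR")  # second overwrites a register first still reads
    if d1 is not None and d1 == d2:
        kinds.add("WAW")  # both write the same register
    return kinds


if __name__ == "__main__":
    # sub $2, $1, $3 followed by and $12, $2, $5
    print(classify_hazards(("$2", ("$1", "$3")), ("$12", ("$2", "$5"))))
```

Renaming the destination of the second instruction removes a WAR or WAW entry from the result but can never remove RAW, which matches the distinction drawn above.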
12
Branch Hazards
When we decide to branch, other instructions are in the pipeline!
We are predicting “branch not taken”
  need to add hardware for flushing instructions if we are wrong
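The cost of predicting "branch not taken" can be estimated with a simple average. This is a back-of-the-envelope sketch, not a figure from the slides: it assumes a base CPI of 1, that only taken branches are mispredicted, and that each misprediction flushes a fixed number of instructions. All parameter names and the sample numbers are illustrative.

```python
def effective_cpi(branch_fraction, taken_fraction, flush_penalty, base_cpi=1.0):
    """Average cycles per instruction under predict-not-taken.

    branch_fraction: fraction of instructions that are branches.
    taken_fraction: fraction of those branches that are taken (mispredicted).
    flush_penalty: cycles lost per misprediction.
    """
    return base_cpi + branch_fraction * taken_fraction * flush_penalty


if __name__ == "__main__":
    # Made-up workload: 20% branches, half taken, one flushed instruction each.
    print(effective_cpi(0.2, 0.5, 1))
```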
13
Improving Performance
Try to avoid stalls! E.g., reorder these instructions:
  lw $t0, 0($t1)
  lw $t2, 4($t1)
  sw $t2, 0($t1)
  sw $t0, 4($t1)
Add a “branch delay slot”
  the next instruction after a branch is always executed
  rely on compiler to “fill” the slot with something useful
Superscalar: start more than one instruction in the same cycle
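For the reordering example on this slide, the benefit can be seen by counting load-use pairs before and after swapping the two stores. The sketch assumes a simplified model in which an instruction that reads a register loaded by the immediately preceding lw costs one stall; the miniMIPS in this deck may differ in the exact penalty. The tuple encoding (opcode, destination, sources) is hypothetical.

```python
def count_load_use_stalls(program):
    """Count adjacent pairs where an instruction reads the register that the
    preceding lw loads (assumed to cost one stall each)."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        _, _, srcs = cur
        if op == "lw" and dest in srcs:
            stalls += 1
    return stalls


# The slide's sequence; sw reads both its data register and its base register.
original = [
    ("lw", "$t0", ("$t1",)),        # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),        # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),   # sw $t2, 0($t1)
    ("sw", None, ("$t0", "$t1")),   # sw $t0, 4($t1)
]
# Swapping the stores is safe here because they write different addresses.
reordered = [original[0], original[1], original[3], original[2]]

if __name__ == "__main__":
    print(count_load_use_stalls(original), count_load_use_stalls(reordered))
```

In this model the original order has one load-use pair (the second lw feeding the first sw) and the reordered sequence has none.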
14
Dynamic Scheduling
The hardware performs the “scheduling”
  hardware tries to find instructions to execute
  out-of-order execution is possible
  speculative execution and dynamic branch prediction
All modern processors are very complicated
  Pentium 4: 20-stage pipeline, 6 simultaneous instructions
  PowerPC and Pentium: branch history table
  Compiler technology important
15
5-Stage miniMIPS
[Figure: the 5-stage miniMIPS datapath (Instruction Fetch, Register File, ALU, Memory, Write Back) with the additions listed below, including a NOP input to the instruction pipeline.]
We wanted a simple, clean pipeline but…
  added CLK EN to freeze the IF/RF stages so we can wait for lw to reach the WB stage
  broke the sequential semantics of the ISA by adding a branch delay slot and early branch resolution logic
  added A/B bypass muxes to get data before it’s written to the regfile
16
Bypass MUX Details
[Figure: Register File outputs RD1 and RD2, the A Bypass and B Bypass muxes with inputs from ALU/MEM/WB/PC, followed by the ASEL and BSEL muxes. Labeled outputs: To ALU, WD ALU To Mem, JT, and the = BZ test.]
The previous diagram was oversimplified. The bypass muxes really need to precede the A and B muxes so that they provide the correct values for the jump target (JT), write data, and early branch decision logic.
17
Final 5-Stage miniMIPS
[Figure: the final 5-stage miniMIPS datapath (Instruction Fetch, Register File, ALU, Memory, Write Back), with NOP inputs to the instruction pipeline.]
Added branch delay slot and early branch resolution logic to fix a CONTROL hazard
Added lots of bypass paths and detection logic to fix various DATA hazards
Added pipeline interlocks to fix the load delay DATA hazard
18
Pipeline Summary (I)
Started with unpipelined implementation
  – direct execute, 1 cycle/instruction
  – it had a long cycle time: mem + regs + alu + mem + wb
We ended up with a 5-stage pipelined implementation
  – increased throughput (3x???)
  – delayed branch decision (1 cycle)
      Chose to execute the instruction after the branch
  – delayed register writeback (3 cycles)
      Added bypass paths (6 x 2 = 12) to forward the correct value
  – memory data available only in WB stage
      Introduced NOPs at IR ALU, to stall the IF and RF stages until the load result was ready
19
Pipeline Summary (II)
Fallacy #1: Pipelining is easy
  Smart people get it wrong all of the time!
Fallacy #2: Pipelining is independent of ISA
  Many ISA decisions impact how easy/costly it is to implement pipelining (e.g., branch semantics, addressing modes).
Fallacy #3: Increasing pipeline stages improves performance
  Diminishing returns. Increasing complexity.