Lecture 18: Pipelining Today’s topics:

Slides:

Advertisements

Similar presentations

1 Lecture: Pipelining Hazards Topics: Basic pipelining implementation, hazards, bypassing HW2 posted, due Wednesday.

Advertisements

Lecture 19: Cache Basics Today’s topics: Out-of-order execution

1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.

1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.

1 Lecture: Branch Prediction Topics: branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)

1 Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software.

1 Lecture 7: Static ILP, Branch prediction Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )

1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.

1 Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)

1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )

1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )

1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will.

1 Lecture 10: ILP Innovations Today: handling memory dependences with the LSQ and innovations for each pipeline stage (Section 3.5)

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )

1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)

1 Lecture 4: Advanced Pipelines Control hazards, multi-cycle in-order pipelines, static ILP (Appendix A.4-A.10, Sections )

1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )

1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )

1 Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3)

Lecture 16: Basic Pipelining

1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.

1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.

1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.

1 Lecture 20: OOO, Memory Hierarchy Today’s topics:  Out-of-order execution  Cache basics.

Lecture: Out-of-order Processors

Instruction Level Parallelism

Lecture 16: Basic Pipelining

Lecture: Pipelining Basics

Lecture: Branch Prediction

Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards

Lecture: Pipelining Extensions

Lecture: Out-of-order Processors

Lecture: Pipelining Basics

Lecture 6: Advanced Pipelines

Lecture 16: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

Pipelining review.

Lecture 16: Basic Pipelining

Lecture 10: Out-of-order Processors

Lecture 11: Out-of-order Processors

Lecture: Out-of-order Processors

Lecture 19: Branches, OOO Today’s topics: Instruction scheduling

Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

Lecture: Pipelining Hazards

Lecture 18: Pipelining Today’s topics:

Lecture: Static ILP, Branch Prediction

Lecture: Pipelining Hazards

Lecture 5: Pipelining Basics

Pipelining in more detail

Lecture: Branch Prediction

Lecture: Out-of-order Processors

Lecture 8: Dynamic ILP Topics: out-of-order processors

Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.

Lecture 19: Branches, OOO Today’s topics: Instruction scheduling

Lecture: Pipelining Extensions

Lecture 20: OOO, Memory Hierarchy

Lecture: Pipelining Extensions

Lecture 20: OOO, Memory Hierarchy

Lecture 18: Pipelining Today’s topics:

Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.

Lecture 4: Advanced Pipelines

Lecture: Pipelining Hazards

Lecture 10: ILP Innovations

Lecture 9: ILP Innovations

Lecture 9: Dynamic ILP Topics: out-of-order processors

Lecture: Pipelining Basics

Lecture 7: Branch Prediction, Dynamic ILP

Presentation transcript:

Lecture 18: Pipelining Today’s topics: Hazards and instruction scheduling Branch prediction Out-of-order execution

Example 5 IF IF IF IF IF IF IF IF D/R D/R D/R D/R D/R D/R D/R D/R ALU Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2R3 and I2 is R3+R4R5 and I3 is R3+R8R9. Identify the input latch for each input operand. CYC-1 CYC-2 CYC-3 CYC-4 CYC-5 CYC-6 CYC-7 CYC-8 IF IF IF IF IF IF IF IF D/R D/R D/R D/R D/R D/R D/R D/R ALU ALU ALU ALU ALU ALU ALU ALU DM DM DM DM DM DM DM DM RW RW RW RW RW RW RW RW

Example 5 IF I1 IF I2 IF I3 IF I4 IF I5 IF IF IF D/R D/R I1 D/R I2 D/R Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2R3 and I2 is R3+R4R5 and I3 is R3+R8R9. Identify the input latch for each input operand. CYC-1 CYC-2 CYC-3 CYC-4 CYC-5 CYC-6 CYC-7 CYC-8 IF I1 IF I2 IF I3 IF I4 IF I5 IF IF IF D/R D/R I1 D/R I2 D/R I3 D/R I4 D/R D/R D/R L3 L3 L4 L3 L5 L3 ALU ALU ALU I1 ALU I2 ALU I3 ALU ALU ALU DM DM DM DM I1 DM I2 DM I3 DM DM RW RW RW RW RW I1 RW I2 RW I3 RW

Example 1 IF D/R ALU DM RW IF D/R ALU DM RW IF D/R ALU DM RW add $1, $2, $3 lw $4, 8($1) IF D/R ALU DM RW

Example 2 IF D/R ALU DM RW IF D/R ALU DM RW IF D/R lw $1, 8($2)

Example 3 IF D/R ALU DM RW IF D/R ALU DM RW IF D/R ALU DM RW lw $1, 8($2) sw $1, 8($3) IF D/R ALU DM RW

Control Hazards Simple techniques to handle control hazard stalls: for every branch, introduce a stall cycle (note: every 6th instruction is a branch!) assume the branch is not taken and start fetching the next instruction – if the branch is taken, need hardware to cancel the effect of the wrong-path instruction fetch the next instruction (branch delay slot) and execute it anyway – if the instruction turns out to be on the correct path, useful work was done – if the instruction turns out to be on the wrong path, hopefully program state is not lost make a smarter guess and fetch instructions from the expected target

Branch Delay Slots Source: H&P textbook

Pipeline without Branch Predictor PC IF (br) Reg Read Compare Br-target PC + 4

Pipeline with Branch Predictor PC IF (br) Reg Read Compare Br-target Branch Predictor

Bimodal Predictor Branch PC Table of 16K entries 14 bits of 2-bit saturating counters 14 bits Branch PC

2-Bit Prediction For each branch, maintain a 2-bit saturating counter: if the branch is taken: counter = min(3,counter+1) if the branch is not taken: counter = max(0,counter-1) … sound familiar? If (counter >= 2), predict taken, else predict not taken The counter attempts to capture the common case for each branch

Slowdowns from Stalls Perfect pipelining with no hazards  an instruction completes every cycle (total cycles ~ num instructions)  speedup = increase in clock speed = num pipeline stages With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes Total cycles = number of instructions + stall cycles

Multicycle Instructions Multiple parallel pipelines – each pipeline can have a different number of stages Instructions can now complete out of order – must make sure that writes to a register happen in the correct order

An Out-of-Order Processor Implementation Reorder Buffer (ROB) Branch prediction and instr fetch Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 T1 T2 T3 T4 T5 T6 Register File R1-R32 R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 R1  R3+R2 Decode & Rename T1  R1+R2 T2  T1+R3 BEQZ T2 T4  T1+T2 T5  T4+T2 ALU ALU ALU Instr Fetch Queue Results written to ROB and tags broadcast to IQ Issue Queue (IQ)

Example Code Completion times with in-order with ooo ADD R1, R2, R3 5 5 ADD R4, R1, R2 6 6 LW R5, 8(R4) 7 7 ADD R7, R6, R5 9 9 ADD R8, R7, R5 10 10 LW R9, 16(R4) 11 7 ADD R10, R6, R9 13 9 ADD R11, R10, R9 14 10

Title Bullet