1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will.

Slides:

Advertisements

Similar presentations

1 Lecture: Pipelining Hazards Topics: Basic pipelining implementation, hazards, bypassing HW2 posted, due Wednesday.

Advertisements

Lecture 19: Cache Basics Today’s topics: Out-of-order execution

1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.

Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.

1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.

1 Lecture: Branch Prediction Topics: branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)

1 Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software.

1 Lecture 7: Static ILP, Branch prediction Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )

1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.

1 Lecture 3: Pipelining Basics Biggest contributors to performance: clock speed, parallelism Today: basic pipelining implementation (Sections A.1-A.3)

1 Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)

1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )

1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )

1 Lecture 10: ILP Innovations Today: handling memory dependences with the LSQ and innovations for each pipeline stage (Section 3.5)

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )

1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)

1 Lecture 4: Advanced Pipelines Control hazards, multi-cycle in-order pipelines, static ILP (Appendix A.4-A.10, Sections )

1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )

1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

1 Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3)

Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.

Computer Architecture Pipeline. Motivation  Non-pipelined design  Single-cycle implementation  The cycle time depends on the slowest instruction 

Lecture 16: Basic Pipelining

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)

1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.

1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.

1 Lecture 3: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections C.1 - C.4) Reminders:  Sign up for the class mailing.

1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.

1 Lecture 20: OOO, Memory Hierarchy Today’s topics:  Out-of-order execution  Cache basics.

Lecture: Out-of-order Processors

Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards

Lecture: Pipelining Extensions

Lecture: Out-of-order Processors

Lecture 6: Advanced Pipelines

Lecture 16: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

Lecture 10: Out-of-order Processors

Lecture 11: Out-of-order Processors

Lecture: Out-of-order Processors

Lecture 19: Branches, OOO Today’s topics: Instruction scheduling

Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

Lecture: Pipelining Hazards

Lecture 6: Static ILP, Branch prediction

Lecture 18: Pipelining Today’s topics:

Lecture: Static ILP, Branch Prediction

Lecture: Pipelining Hazards

Lecture 5: Pipelining Basics

Lecture 18: Pipelining Today’s topics:

Lecture: Branch Prediction

Lecture: Pipelining Hazards

Lecture: Out-of-order Processors

Lecture 8: Dynamic ILP Topics: out-of-order processors

Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.

Lecture 19: Branches, OOO Today’s topics: Instruction scheduling

Lecture: Pipelining Extensions

Lecture 20: OOO, Memory Hierarchy

Lecture: Pipelining Extensions

Lecture 20: OOO, Memory Hierarchy

Lecture 18: Pipelining Today’s topics:

Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.

Lecture 4: Advanced Pipelines

Lecture: Pipelining Hazards

Lecture 10: ILP Innovations

Lecture 9: ILP Innovations

Lecture 9: Dynamic ILP Topics: out-of-order processors

Lecture 7: Branch Prediction, Dynamic ILP

Presentation transcript:

1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will be posted later today

2 Structural Hazards Example: a unified instruction and data cache  stage 4 (MEM) and stage 1 (IF) can never coincide The later instruction and all its successors are delayed until a cycle is found when the resource is free  these are pipeline bubbles Structural hazards are easy to eliminate – increase the number of resources (for example, implement a separate instruction and data cache)

3 Data Hazards

4 Bypassing Some data hazard stalls can be eliminated: bypassing

5 Example add $1, $2, $3 lw $4, 8($1)

6 Example lw $1, 8($2) lw $4, 8($1)

7 Example lw $1, 8($2) sw $1, 8($3)

8 Control Hazards Simple techniques to handle control hazard stalls:  for every branch, introduce a stall cycle (note: every 6 th instruction is a branch!)  assume the branch is not taken and start fetching the next instruction – if the branch is taken, need hardware to cancel the effect of the wrong-path instruction  fetch the next instruction (branch delay slot) and execute it anyway – if the instruction turns out to be on the correct path, useful work was done – if the instruction turns out to be on the wrong path, hopefully program state is not lost

9 Branch Delay Slots

10 Pipeline without Branch Predictor IF (br) PC Reg Read Compare Br-target PC + 4

11 Pipeline with Branch Predictor IF (br) PC Reg Read Compare Br-target Branch Predictor

12 Bimodal Predictor Branch PC 14 bits Table of 16K entries of 2-bit saturating counters

13 2-Bit Prediction For each branch, maintain a 2-bit saturating counter: if the branch is taken: counter = min(3,counter+1) if the branch is not taken: counter = max(0,counter-1) … sound familiar? If (counter >= 2), predict taken, else predict not taken The counter attempts to capture the common case for each branch

14 Slowdowns from Stalls Perfect pipelining with no hazards  an instruction completes every cycle (total cycles ~ num instructions)  speedup = increase in clock speed = num pipeline stages With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes Total cycles = number of instructions + stall cycles

15 Multicycle Instructions Multiple parallel pipelines – each pipeline can have a different number of stages Instructions can now complete out of order – must make sure that writes to a register happen in the correct order

16 An Out-of-Order Processor Implementation Branch prediction and instr fetch R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 R1  R3+R2 Instr Fetch Queue Decode & Rename Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 T1 T2 T3 T4 T5 T6 Reorder Buffer (ROB) T1  R1+R2 T2  T1+R3 BEQZ T2 T4  T1+T2 T5  T4+T2 Issue Queue (IQ) ALU Register File R1-R32 Results written to ROB and tags broadcast to IQ

17 Title Bullet