1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will.

Slides:



Advertisements
Similar presentations
1 Lecture: Pipelining Hazards Topics: Basic pipelining implementation, hazards, bypassing HW2 posted, due Wednesday.
Advertisements

Lecture 19: Cache Basics Today’s topics: Out-of-order execution
1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.
1 Lecture: Branch Prediction Topics: branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)
1 Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software.
1 Lecture 7: Static ILP, Branch prediction Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )
1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.
1 Lecture 3: Pipelining Basics Biggest contributors to performance: clock speed, parallelism Today: basic pipelining implementation (Sections A.1-A.3)
1 Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )
1 Lecture 10: ILP Innovations Today: handling memory dependences with the LSQ and innovations for each pipeline stage (Section 3.5)
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
1 Lecture 4: Advanced Pipelines Control hazards, multi-cycle in-order pipelines, static ILP (Appendix A.4-A.10, Sections )
1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )
1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
1 Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3)
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
Computer Architecture Pipeline. Motivation  Non-pipelined design  Single-cycle implementation  The cycle time depends on the slowest instruction 
Lecture 16: Basic Pipelining
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
1 Lecture 3: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections C.1 - C.4) Reminders:  Sign up for the class mailing.
1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.
1 Lecture 20: OOO, Memory Hierarchy Today’s topics:  Out-of-order execution  Cache basics.
Lecture: Out-of-order Processors
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards
Lecture: Pipelining Extensions
Lecture: Out-of-order Processors
Lecture 6: Advanced Pipelines
Lecture 16: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
Lecture 10: Out-of-order Processors
Lecture 11: Out-of-order Processors
Lecture: Out-of-order Processors
Lecture 19: Branches, OOO Today’s topics: Instruction scheduling
Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
Lecture: Pipelining Hazards
Lecture 6: Static ILP, Branch prediction
Lecture 18: Pipelining Today’s topics:
Lecture: Static ILP, Branch Prediction
Lecture: Pipelining Hazards
Lecture 5: Pipelining Basics
Lecture 18: Pipelining Today’s topics:
Lecture: Branch Prediction
Lecture: Pipelining Hazards
Lecture: Out-of-order Processors
Lecture 8: Dynamic ILP Topics: out-of-order processors
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.
Lecture 19: Branches, OOO Today’s topics: Instruction scheduling
Lecture: Pipelining Extensions
Lecture 20: OOO, Memory Hierarchy
Lecture: Pipelining Extensions
Lecture 20: OOO, Memory Hierarchy
Lecture 18: Pipelining Today’s topics:
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.
Lecture 4: Advanced Pipelines
Lecture: Pipelining Hazards
Lecture 10: ILP Innovations
Lecture 9: ILP Innovations
Lecture 9: Dynamic ILP Topics: out-of-order processors
Lecture 7: Branch Prediction, Dynamic ILP
Presentation transcript:

1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will be posted later today

2 Structural Hazards Example: a unified instruction and data cache  stage 4 (MEM) and stage 1 (IF) can never coincide The later instruction and all its successors are delayed until a cycle is found when the resource is free  these are pipeline bubbles Structural hazards are easy to eliminate – increase the number of resources (for example, implement a separate instruction and data cache)

3 Data Hazards

4 Bypassing Some data hazard stalls can be eliminated: bypassing

5 Example add $1, $2, $3 lw $4, 8($1)

6 Example lw $1, 8($2) lw $4, 8($1)

7 Example lw $1, 8($2) sw $1, 8($3)

8 Control Hazards Simple techniques to handle control hazard stalls:  for every branch, introduce a stall cycle (note: every 6 th instruction is a branch!)  assume the branch is not taken and start fetching the next instruction – if the branch is taken, need hardware to cancel the effect of the wrong-path instruction  fetch the next instruction (branch delay slot) and execute it anyway – if the instruction turns out to be on the correct path, useful work was done – if the instruction turns out to be on the wrong path, hopefully program state is not lost

9 Branch Delay Slots

10 Pipeline without Branch Predictor IF (br) PC Reg Read Compare Br-target PC + 4

11 Pipeline with Branch Predictor IF (br) PC Reg Read Compare Br-target Branch Predictor

12 Bimodal Predictor Branch PC 14 bits Table of 16K entries of 2-bit saturating counters

13 2-Bit Prediction For each branch, maintain a 2-bit saturating counter: if the branch is taken: counter = min(3,counter+1) if the branch is not taken: counter = max(0,counter-1) … sound familiar? If (counter >= 2), predict taken, else predict not taken The counter attempts to capture the common case for each branch

14 Slowdowns from Stalls Perfect pipelining with no hazards  an instruction completes every cycle (total cycles ~ num instructions)  speedup = increase in clock speed = num pipeline stages With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes Total cycles = number of instructions + stall cycles

15 Multicycle Instructions Multiple parallel pipelines – each pipeline can have a different number of stages Instructions can now complete out of order – must make sure that writes to a register happen in the correct order

16 An Out-of-Order Processor Implementation Branch prediction and instr fetch R1  R1+R2 R2  R1+R3 BEQZ R2 R3  R1+R2 R1  R3+R2 Instr Fetch Queue Decode & Rename Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6 T1 T2 T3 T4 T5 T6 Reorder Buffer (ROB) T1  R1+R2 T2  T1+R3 BEQZ T2 T4  T1+T2 T5  T4+T2 Issue Queue (IQ) ALU Register File R1-R32 Results written to ROB and tags broadcast to IQ

17 Title Bullet