Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)

Presentation transcript:


ECE232: BrPredict 2 (Adapted from Computer Organization and Design, Patterson & Hennessy, UCB, Kundu, UMass Koren)
Branch Instructions Cause Control Hazards
[Figure: pipeline diagram of instructions in program order (lw, beq, jr, Inst 3, Inst 4), each passing through the IM, Reg, ALU, DM, Reg stages]

ECE232: BrPredict 3
BEQ resolved during the MEM stage
[Figure: pipelined datapath with PC, instruction memory, register file, sign extend, shift left 2, ALU, data memory, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers; the Branch control signal drives PCSrc]

ECE232: BrPredict 4
One Way to "Fix" a Control Hazard
- Fix the branch hazard by waiting: introduce stalls
[Figure: pipeline diagram of beq, lw, and Inst 3 in program order, with stall cycles inserted after beq]

ECE232: BrPredict 5
Reducing branch penalty through HW design

ECE232: BrPredict 6
Reducing Control Hazards' Penalties
- Stalls hurt performance
- Deeper pipelines have higher penalties
- Three ways to reduce the penalty:
  1. Move the decision point as early in the pipeline as possible. This reduces the number of stalls at the cost of additional hardware.
  2. Delay the decision (requires compiler support): the "delayed branch". It is not effective for deeper pipelines, which require more than one delay slot to be filled.
  3. Predict the outcome of the branch.
- Example:
        beq $1,$2,NEXT
        add $4,$3,$5
        sub $7,$2,$8
  NEXT:
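The cost of stalls can be made concrete with the usual effective-CPI estimate: base CPI plus (fraction of branches) x (fraction of branches that pay the penalty) x (penalty in cycles). The sketch below is illustrative only. The function name and the workload numbers (20% branches, 3-cycle penalty, 90% accuracy) are assumptions chosen for the example, not figures from the slides.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction once branch stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload (illustrative values only).
always_stall = effective_cpi(1.0, 0.20, 1.0, 3)    # every branch pays the penalty
with_predictor = effective_cpi(1.0, 0.20, 0.1, 3)  # only mispredicted branches pay

print(f"stall on every branch: {always_stall:.2f}")
print(f"90% prediction accuracy: {with_predictor:.2f}")
```

Under these assumed numbers, a deeper pipeline (larger penalty_cycles) widens the gap between the two cases, which is why prediction matters more as pipelines get deeper.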

ECE232: BrPredict 7
Branch Prediction
- Easiest: static prediction
  - Always taken, always not taken
  - Opcode based
  - Displacement based (forward not taken, backward taken)
  - Compiler directed (branch likely, branch not likely)
- Dynamic prediction: prediction per branch in the program
  - 1-bit predictor: remember the last taken/not-taken outcome per branch
  - Use a branch-history table (BHT) with 1-bit entries
  - Use part of the PC (low-order bits) to index the table. Why?
  - Multiple branches may share the same bit
  - Invert the bit if the prediction is wrong
[Figure: branch PC indexing a BHT with entries Predictor 0, Predictor 1, ..., Predictor 127]
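The displacement-based static rule (forward not taken, backward taken) needs no history at all, only the sign of the branch displacement. A minimal sketch, in which the function name and the example addresses are hypothetical:

```python
def predict_backward_taken(branch_pc: int, target_pc: int) -> bool:
    """Displacement-based static prediction: backward taken, forward not taken."""
    return target_pc <= branch_pc


# Hypothetical addresses: a loop-closing branch jumps backward,
# an if-skip branch jumps forward.
loop_branch = predict_backward_taken(branch_pc=0x0040_0020, target_pc=0x0040_0000)
skip_branch = predict_backward_taken(branch_pc=0x0040_0020, target_pc=0x0040_0040)
print(loop_branch, skip_branch)
```

The rule works because backward branches usually close loops, and loop branches are taken on most iterations.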

ECE232: BrPredict 8
Branch Prediction
- 1-bit predictor
  - Backward branches for loops will be mispredicted twice
  - Example: if a loop branch is taken 9 times in a row and then not taken once, what is the prediction accuracy?
  - The predictor mispredicts at the first and last loop iteration (on re-entering the loop, the bit still records the not-taken outcome of the previous exit), giving 80% prediction accuracy although the branch is taken 90% of the time
- Modern processors issue multiple instructions per cycle, so more branch hazards occur per cycle
  - The cost of a mispredicted branch goes up
  - Pentium II: 3 instructions issued per cycle, 12+ cycle misprediction penalty
  - Huge penalty for a misfetched path following a branch
[Figure: taken (T) / not-taken (N) outcome sequences for the loop branch]
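The 80% figure can be checked by simulating a single 1-bit predictor on the loop pattern. This is a sketch under one assumption: the loop is executed repeatedly, so the bit starts at not taken (left over from the previous loop exit). The function name is hypothetical.

```python
def one_bit_accuracy(outcomes, initial_prediction=False):
    """Fraction of correct predictions made by a single 1-bit predictor."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # 1-bit scheme: remember only the last outcome
    return correct / len(outcomes)


# Loop branch: taken 9 times, then not taken once; the loop is run 100 times.
pattern = ([True] * 9 + [False]) * 100
accuracy = one_bit_accuracy(pattern)
print(f"{accuracy:.0%}")
```

Each pass through the loop costs two mispredictions: the first iteration (the bit says not taken) and the exit (the bit says taken).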

ECE232: BrPredict 9
2-bit Branch Prediction
- 4 states instead of 2, allowing for more information about tendencies
- A prediction must miss twice before it is changed
- Good for backward branches of loops
- 2-bit saturating counter
[Figure: four-state diagram with T (taken) and N (not taken) transitions between the "Predict taken" and "Predict not taken" states, plus the taken/not-taken outcome sequence of the loop branch]
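A 2-bit saturating counter can be sketched in a few lines. The state encoding (0-1 predict not taken, 2-3 predict taken) and the initial state are assumptions for this example, and the class and function names are hypothetical. The same loop pattern as before (taken 9 times, then not taken once) is used for comparison.

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 3) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Count up on taken, down on not taken, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, counter) -> float:
    """Fraction of outcomes the counter predicts correctly."""
    correct = 0
    for taken in outcomes:
        correct += counter.predict() == taken
        counter.update(taken)
    return correct / len(outcomes)


pattern = ([True] * 9 + [False]) * 100
acc = accuracy(pattern, TwoBitCounter())
print(f"{acc:.0%}")
```

The single not-taken outcome at the loop exit only moves the counter from state 3 to state 2, so the first iteration of the next pass is still predicted taken. Only the exit is mispredicted.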

ECE232: BrPredict 10
Branch History Table - BHT
- 2 bits by N entries (e.g. 4K entries)
- Uses low-order bits of the branch PC to choose the entry
- Plot misprediction instead of prediction
[Figure: branch PC indexing a BHT with entries Predictor 0, Predictor 1, ..., Predictor 4095, each holding a 2-bit value such as 01]
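Indexing by low-order PC bits can be sketched as follows. Assumptions in this example: instructions are 4 bytes and word aligned (so the two lowest PC bits are dropped), counters start at 1 (weakly not taken), and the class, function, and address values are hypothetical. It also shows why two different branches can share one entry.

```python
BHT_ENTRIES = 4096  # 4K entries, 2 bits each


def bht_index(pc: int, entries: int = BHT_ENTRIES) -> int:
    """Select an entry from the low-order bits of the word-aligned PC."""
    return (pc >> 2) % entries


class BranchHistoryTable:
    def __init__(self, entries: int = BHT_ENTRIES) -> None:
        self.entries = entries
        self.counters = [1] * entries  # assumed initial state: weakly not taken

    def predict(self, pc: int) -> bool:
        return self.counters[bht_index(pc, self.entries)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = bht_index(pc, self.entries)
        step = 1 if taken else -1
        self.counters[i] = max(0, min(3, self.counters[i] + step))


bht = BranchHistoryTable()
branch_a = 0x0040_0000
branch_b = branch_a + 4 * BHT_ENTRIES  # different branch, same low-order bits

bht.update(branch_a, taken=True)
print(bht.predict(branch_a), bht.predict(branch_b))
```

Because the table stores no tag, branch_b inherits whatever branch_a trained into the shared counter. A larger table reduces this aliasing but does not eliminate it.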

ECE232: BrPredict 11
Is Branch Predictor Enough?
- When is using branch prediction beneficial?
  - Clearly when the outcome is known later than the target
  - Otherwise: if we predict the branch is taken (and suppose it is correct), what is the target address?
  - Need a mechanism to provide the target address as well
  - Use a Branch Target Buffer (BTB) that includes the target address
- Can we eliminate the one-cycle delay for the 5-stage pipeline?
  - Need to fetch from the branch target immediately after the branch was fetched

ECE232: BrPredict 12
Branch Target Buffer (BTB)
- The BTB is a cache that contains the predicted PC value instead of whether the branch will take place or not (e.g. a loop address)
- Is the current instruction a branch?
  - The BTB provides the answer before the current instruction is decoded, and therefore enables fetching to begin after the IF stage (for a branch)
- What is the branch target?
  - The BTB provides the branch target if the prediction is a taken branch (for not-taken branches the target is simply PC+4)

ECE232: BrPredict 13
BTB
[Figure: Branch Target Buffer diagram]

ECE232: BrPredict 14
BTB operations
- BTB hit, prediction taken → 0 cycle delay
- BTB hit, misprediction: ≥ 2 cycle penalty; correct the BTB
- BTB miss, branch: ≥ 1 cycle penalty (detected at the ID stage and entered in the BTB)
[Flowchart]
IF: Send PC to memory and branch-target buffer. Entry found in branch-target buffer?
  No → ID: Is instruction a taken branch?
    No → Normal instruction execution
    Yes → EX: Enter branch instruction address and next PC into branch-target buffer
  Yes → ID: Send out predicted PC. EX: Taken branch?
    No → Mispredicted branch, kill fetched instruction; restart fetch at other target; update target buffer
    Yes → Branch correctly predicted; continue execution with no stalls
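The BTB flow above can be sketched as a lookup at fetch time plus an update once the branch resolves. This is an idealized model, not the slides' hardware: a Python dict stands in for an unbounded, fully associative buffer, removing the entry on a not-taken outcome is one assumed reading of "update target buffer", and all names and addresses are hypothetical.

```python
class BranchTargetBuffer:
    """Idealized BTB: maps a branch PC to its predicted target PC."""

    def __init__(self) -> None:
        self.table = {}  # branch PC -> predicted target PC

    def next_fetch_pc(self, pc: int) -> int:
        """IF stage: a hit supplies the predicted target, a miss falls through to PC+4."""
        return self.table.get(pc, pc + 4)

    def resolve(self, pc: int, is_branch: bool, taken: bool, target: int) -> bool:
        """Update the BTB with the real outcome; return True if fetch went the wrong way."""
        predicted = self.next_fetch_pc(pc)
        actual = target if (is_branch and taken) else pc + 4
        if is_branch and taken:
            self.table[pc] = target   # enter or refresh the taken branch
        else:
            self.table.pop(pc, None)  # assumed policy: drop entries that were not taken
        return predicted != actual


btb = BranchTargetBuffer()
loop_branch, loop_top = 0x0040_0020, 0x0040_0000

first = btb.resolve(loop_branch, is_branch=True, taken=True, target=loop_top)
second = btb.resolve(loop_branch, is_branch=True, taken=True, target=loop_top)
exit_ = btb.resolve(loop_branch, is_branch=True, taken=False, target=loop_top)
print(first, second, exit_)
```

The first encounter misses in the BTB and pays a penalty, the second hits with the correct target and costs nothing, and the loop exit is a hit with a wrong prediction.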

ECE232: BrPredict 15
Branch Prediction Summary
- The better we predict, the lower the penalty we might incur
- 2-bit predictors capture tendencies well
- Correlating predictors improve accuracy, particularly when combined with 2-bit predictors
- Accurate branch prediction does no good if we don't know there was a branch to predict
- The BTB identifies branches in the IF stage
- A BTB combined with a branch prediction table identifies branches to predict, and predicts them well