
Pipelining and Control Hazards Oct

Stall until the branch resolves (here, in the mem stage), then redirect fetch:

TIME  C1      C2      C3      C4      C5      C6      C7      C8      C9      C10
I1    branch  decode  exec    mem     wb
      bubble  bubble  bubble                          <- nothing useful fetched
I4                            fetch   decode  exec    mem     wb              <- redirected fetch
I5                                    fetch   decode  exec    mem     wb

Resolve the branch earlier (in exec) and the penalty shrinks:

TIME  C1      C2      C3      C4      C5      C6      C7      C8      C9
I1    branch  decode  exec    mem     wb
      bubble  bubble
I2                            fetch   decode  exec    mem     wb      <- redirected fetch
I3                                    fetch   decode  exec    mem     wb

Predict PC + 4 for every instruction: resolve in decode if it turns out to be a branch, resolve trivially if it is a non-branch.

TIME  C1      C2      C3      C4      C5      C6      C7      C8      C9
I1    fetch   decode  exec    mem     wb
I2            fetch   decode  exec    mem     wb
I3                    fetch   decode  exec    mem     wb
I4                            fetch   decode  exec    mem     wb
I5                                    fetch   decode  exec    mem     wb

Predict PC + 4; when the branch resolves and the next PC != PC + 4, the wrong-path instructions are squashed and fetch is redirected:

TIME  C1      C2      C3      C4      C5      C6      C7      C8      C9
I1    branch  decode  exec    mem     wb
I2            fetch   decode  bubble  bubble  bubble          <- squashed
I3                    fetch   bubble  bubble  bubble  bubble  <- squashed
I4                            fetch   decode  exec    mem     wb      <- redirected fetch
I5                                    fetch   decode  exec    mem     wb
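The cost of these squashed slots follows the usual back-of-the-envelope formula. This sketch uses assumed, illustrative numbers (20% branches, 10% misprediction rate, 3-cycle penalty), not measurements from any real machine:

```python
def average_cpi(base_cpi, branch_frac, mispredict_rate, penalty_cycles):
    """Average CPI = base CPI + misprediction stall cycles per instruction."""
    return base_cpi + branch_frac * mispredict_rate * penalty_cycles

# Illustrative (assumed) numbers: 20% branches, 10% mispredicted, 3 bubbles each.
cpi = average_cpi(1.0, 0.20, 0.10, 3)
print(cpi)  # ~1.06
```

Even a modest misprediction rate adds measurable stall cycles per instruction, which is why the rest of these slides work on predicting branches rather than stalling for them.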

do {
    if (a[i] != 0)
        some computation
    i++;
} while (i < 100);

DOWHILE: load r10, a[i]
         beq  r10, r0, SKIP       # skip the computation when a[i] == 0
         some computation
SKIP:    addi r11, r11, 1         # i++  (i in r11)
         blt  r11, r12, DOWHILE   # loop while i < 100  (r12 holds 100)

[Figure: in C1, I1 is fetched from the fetch cache; in C2, the instruction opcode is available, the Taken PC is calculated, and the next fetch selects PC + 4 or the Taken PC.]

ITERATION 1
DOWHILE: load r10, a[i]
         beq  r10, r0, SKIP       FIRST TIME SEEN → PREDICT NOT TAKEN → correct → LEARN NOT TAKEN
         some computation
SKIP:    addi r11, r11, 1
         blt  r11, r12, DOWHILE   FIRST TIME SEEN → PREDICT NOT TAKEN → MISPREDICTION → LEARN TAKEN

ITERATION 2
DOWHILE: load r10, a[i]
         beq  r10, r0, SKIP       SEEN BEFORE → PREDICT "SAME AS LAST TIME": NOT TAKEN → correct
         some computation
SKIP:    addi r11, r11, 1
         blt  r11, r12, DOWHILE   SEEN BEFORE → PREDICT "SAME AS LAST TIME": TAKEN → correct while the loop continues → LEARN TAKEN

Predictor table entry:  PC (tag) | V (valid bit) | N (last outcome: taken / not taken)

ITERATION 1
DOWHILE: load r10, a[i]
0x100:   beq  r10, r0, SKIP       not in table → predict not taken (default); record (PC=0x100, V=1, N=not taken)
         some computation
SKIP:    addi r11, r11, 1
0x200:   blt  r11, r12, DOWHILE   not in table → predict not taken (default), mispredicted; record (PC=0x200, V=1, N=taken)

ITERATION 2
0x100:   beq  r10, r0, SKIP       table hit → predict not taken (table)
0x200:   blt  r11, r12, DOWHILE   table hit → predict taken (table)

Accuracy = # correct predictions / # predictions
  100 executions of beq: all not taken, and all predicted not taken (default and table agree) → 100 correct
  100 executions of blt: the first is mispredicted (default not taken, actually taken) and the last is mispredicted (predicted taken, actually not taken) → 98 correct
Accuracy = (100 + 98) / 200 = 99%
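The arithmetic above can be checked with a tiny simulation of the 1-bit "same as last time" predictor. A sketch, assuming the outcome streams described on the slide (beq never taken; blt taken 99 times, then not taken once at loop exit):

```python
def last_time_correct(outcomes, default=False):
    """Count correct predictions of a 1-bit 'same as last time' predictor."""
    last = None
    correct = 0
    for taken in outcomes:
        prediction = default if last is None else last  # first time: default not taken
        correct += (prediction == taken)
        last = taken                                    # learn the actual outcome
    return correct

beq_outcomes = [False] * 100            # 100 executions, never taken
blt_outcomes = [True] * 99 + [False]    # taken 99 times, not taken at loop exit
correct = last_time_correct(beq_outcomes) + last_time_correct(blt_outcomes)
print(correct, correct / 200)  # 198 0.99
```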

How big does this table need to be?  PC | V | N
4G addresses (32-bit), 4 bytes per instruction, aligned → 1G possible branch PCs.
1G entries, each a 4-byte PC tag plus 2 bits (V and N) → TOO LARGE.
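The sizing claim can be spelled out, assuming a 32-bit address space and the 4-byte PC tag plus V and N bits from the slide:

```python
ADDRESS_SPACE = 2 ** 32        # 4G byte addresses (32-bit PC assumed)
INSTRUCTION_BYTES = 4          # aligned 4-byte instructions

entries = ADDRESS_SPACE // INSTRUCTION_BYTES   # 2**30 = 1G possible branch PCs
bits_per_entry = 32 + 1 + 1                    # PC tag + V bit + N bit
total_gib = entries * bits_per_entry / 8 / 2 ** 30
print(entries, total_gib)  # 1073741824 4.25
```

Over four gibibytes just to predict branches is clearly out of the question, which motivates the tagless, hashed tables on the next slides.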

But if we had 1G entries, the PC-to-entry mapping would be 1-to-1: each entry corresponds to exactly one PC, so there is no need to store the PC tag — only V and N remain per entry.

1G tagless entries are still far too many, so keep only a few entries and index them with a hash h(PC) of the branch address — e.g. low-order PC bits, after dropping the two '00' alignment bits. Several PCs now alias to each entry, so a tag could not identify the branch anyway: drop the PC and V fields and keep a single N bit per entry.
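A minimal sketch of the resulting tagless table. The table size (1024 entries) and the trivial hash are assumptions for illustration:

```python
TABLE_SIZE = 1024                  # assumed; real sizes vary
table = [0] * TABLE_SIZE           # one N bit per entry, 0 = not taken

def index(pc):
    return (pc >> 2) % TABLE_SIZE  # drop the two '00' alignment bits, then wrap

def predict(pc):
    return table[index(pc)] == 1   # no tag, no valid bit: aliasing is accepted

def update(pc, taken):
    table[index(pc)] = 1 if taken else 0

update(0x200, True)                # the blt at 0x200 was taken
print(predict(0x200), predict(0x100))  # True False (0x100 maps to another entry here)
```

Two branches whose hashes collide would share an entry and interfere; the design simply accepts that in exchange for the tiny table.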

Replace the single N bit with a 2-bit saturating counter per entry:

  00 Strongly NT  ⇄  01 Weakly NT  ⇄  10 Weakly T  ⇄  11 Strongly T

A taken outcome (T) moves the counter toward 11, a not-taken outcome (NT) toward 00, saturating at the ends. Predict taken in states 10 and 11, not taken in 00 and 01.
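The state machine above, as a sketch:

```python
def predict(counter):
    return counter >= 2              # states 10 and 11 predict taken

def update(counter, taken):
    if taken:
        return min(counter + 1, 3)   # move toward 11 (Strongly T), saturate
    return max(counter - 1, 0)       # move toward 00 (Strongly NT), saturate

c = 0                                # start in Strongly NT
for outcome in [True, True, True, False]:
    c = update(c, outcome)
print(c, predict(c))  # 2 True -- one NT after three Ts only weakens the prediction
```

The hysteresis is the point: a single anomalous outcome flips the prediction of a 1-bit scheme, but only nudges a 2-bit counter one state.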

    movi r18, 3        # max i
    movi r19, 2        # max j
    movi r8, 0         # i = 0
DOi:
    movi r9, 0         # j = 0
DOj:
    some computation
    addi r9, r9, 1
    blt  r9, r19, DOj  # J branch
    addi r8, r8, 1
    blt  r8, r18, DOi  # I branch

J branch outcomes:          T   T   NT   T   T   NT   T   T   NT
counter after each outcome: 11  11  10   11  11  10   11  11  10
The counter never leaves the taken states (10/11), so the prediction is always taken and every NT outcome is mispredicted.

[Figure: a branch-history shift register (older outcomes on the left, the youngest on the right) is combined by h() with the PC — its two '00' alignment bits dropped — to index the prediction table.]

The J branch (blt r9, r19, DOj) produces the repeating outcome pattern T T NT (1 1 0). A 2-bit history of its last outcomes selects the table entry, so the predictor learns, for each history, the outcome that followed it:

  after history 0 1 (NT, T) → next outcome 1 (T)
  after history 1 1 (T, T)  → next outcome 0 (NT)
  after history 1 0 (T, NT) → next outcome 1 (T)

Once these entries are learned, the pattern is predicted perfectly: from history 01 predict T (correct), from history 11 predict NT (correct), from history 10 predict T (correct).
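The walkthrough above can be sketched as a single-branch, 2-bit-history predictor. This is a simplification: one branch only, and last-outcome bits in the pattern table instead of saturating counters:

```python
HISTORY_BITS = 2
history = 0                # the branch's last two outcomes, youngest in bit 0
pattern_table = {}         # history value -> outcome that followed it

def predict():
    return pattern_table.get(history, False)   # unseen history: not taken

def update(taken):
    global history
    pattern_table[history] = taken             # learn what followed this history
    history = ((history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)

for taken in [True, True, False] * 4:          # the J branch's T T NT pattern
    update(taken)

# Learned: after 01 -> T, after 11 -> NT, after 10 -> T (00 occurs only at startup)
print(pattern_table[0b01], pattern_table[0b11], pattern_table[0b10])  # True False True
```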

Neither predictor wins everywhere: for one branch (PC) the bimodal predictor may be best, for another gshare. Which is best for this branch?

Tournament prediction: run the bimodal and gshare predictors side by side, and let a meta predictor — itself a table of counters indexed by PC — learn which of the two to trust for each branch.
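A sketch of the chooser logic, assuming the two component predictions are given and the meta entry is a 2-bit counter (0–1 favor bimodal, 2–3 favor gshare):

```python
def tournament_predict(meta, bimodal_pred, gshare_pred):
    return gshare_pred if meta >= 2 else bimodal_pred

def tournament_update(meta, bimodal_pred, gshare_pred, taken):
    # Train the chooser only when the components disagree, toward the right one.
    if bimodal_pred != gshare_pred:
        if gshare_pred == taken:
            meta = min(meta + 1, 3)
        else:
            meta = max(meta - 1, 0)
    return meta

meta = 1                                   # start weakly favoring bimodal
for _ in range(3):                         # gshare keeps being right, bimodal wrong
    meta = tournament_update(meta, bimodal_pred=False, gshare_pred=True, taken=True)
print(meta, tournament_predict(meta, False, True))  # 3 True
```

When the components agree, the chooser learns nothing — there is no information about which one to prefer.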

[Figure: a small, fast predictor delivers a prediction already in C1 (fetch); a larger, slower predictor completes in C2 (decode) and may overwrite the fast prediction, redirecting fetch.]

BTB (Branch Target Buffer) — each entry:  PC (tag) | TARGET ADDRESS | V

Next-PC selection: the direction predictor chooses between PC + 4 (predicted not taken) and the target address supplied by the BTB (predicted taken).
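The selection can be sketched as follows; the BTB content (a branch at 0x100 targeting 0x180) is an assumed example:

```python
btb = {0x100: 0x180}          # tag -> target; assumed example entry

def next_pc(pc, predict_taken):
    target = btb.get(pc)      # BTB lookup by PC (tag match)
    if predict_taken and target is not None:
        return target         # predicted taken and target known: fetch the target
    return pc + 4             # otherwise fall through (predicted NT, or BTB miss)

print(hex(next_pc(0x100, True)),   # 0x180: predicted-taken BTB hit
      hex(next_pc(0x100, False)),  # 0x104: predicted not taken
      hex(next_pc(0x200, True)))   # 0x204: BTB miss, can only fetch PC + 4
```

Note the asymmetry: without a BTB hit, even a taken prediction is useless, because the fetch unit has nowhere else to get the target address this early.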

Calls and returns

if (error != 0) error_handle();

if (a[i] < threshold) a++; else b++;

With branches:
    load r8, a[i]
    blt  r8, r9, THEN       # r9 holds threshold
ELSE:
    addi r10, r10, 1        # b++
    br   DONE
THEN:
    addi r11, r11, 1        # a++
DONE:

Predicated (no branch):
    load  r8, a[i]
    cmplt c0, r8, r9        # condition register c0 = (r8 < r9)
 c0: addi r11, r11, 1       # a++  (executes only if c0)
!c0: addi r10, r10, 1       # b++  (executes only if !c0)