Pipelining - Branch Prediction

Presentation transcript:

Pipelining - Branch Prediction CS/COE 1541 (term 2174) Jarrett Billingsley

Class Announcements
Quizzes/exams reweighted again (last time FOR REAL): quizzes 5% each (15% total), exams 15% each (45% total).
Short lecture, then quiz. We'll talk about branch prediction today, which will be on the quiz :)
First exam next week: Wednesday, February 1st. There will be a study guide, probably Wednesday?
No homework for this week since the exam is next week!
1/23/2017 CS/COE 1541 term 2174

But first... Improving Branch Penalties

The problem
Branches are based on comparisons, which are done by the ALU in the EX phase. So how many cycles does it take to determine whether a branch is taken? Three. And how many possibly-wrongly-fetched instructions are in the pipeline behind the branch, which may need to be flushed? Two.
But what if our pipeline were longer? What if there were 4 decode phases and 3 execute phases? Ugh.
Therefore, we want to determine the branch direction (whether or not it's taken) as early as possible. What are some ways we could improve this situation?

The solution: MORE SINKS!! (hardware)
If we carefully design our instruction set (like the MIPS designers did), we can determine the branch direction early, during decoding! In what phase do we read the registers to be compared? The ID phase. So what hardware can we put in the ID phase to let us determine the branch direction? A comparator!
But... what's the downside to adding more hardware? Starting to notice a pattern, huh.
So if we mispredict a branch in ID, how many instructions need to be flushed? Just one: the one in IF!
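The flush counts quoted so far (two when the branch is resolved in EX, one when it is resolved in ID) can be checked with a small sketch. It assumes one instruction is fetched per cycle with no stalls, and that the direction is known at the end of the named stage. The function name and the stage names of the longer pipeline are made up for illustration.

```python
# Sketch only: assumes one fetch per cycle, no stalls, and that the branch
# direction is known at the end of `resolve_stage`.

FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]

# Hypothetical longer pipeline from the question on the slide:
# 4 decode phases and 3 execute phases (stage names are invented here).
LONG_PIPELINE = ["IF", "ID1", "ID2", "ID3", "ID4",
                 "EX1", "EX2", "EX3", "MEM", "WB"]


def wrong_path_fetches(stages, resolve_stage):
    """Instructions fetched behind a branch before its direction is known.

    While the branch sits in stage number i (counting from 0), i younger
    instructions have already entered the pipeline behind it.
    """
    return stages.index(resolve_stage)


if __name__ == "__main__":
    print(wrong_path_fetches(FIVE_STAGE, "EX"))      # resolve in EX
    print(wrong_path_fetches(FIVE_STAGE, "ID"))      # resolve in ID
    print(wrong_path_fetches(LONG_PIPELINE, "EX3"))  # longer pipeline
```

Under these assumptions the longer pipeline would have seven instructions to flush if the comparison only finishes in its last execute phase, which is why resolving the branch early matters more as pipelines get deeper.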

Not all sunshine and rainbows
Uh oh. Remember data hazards?

Time             1     2     3             4     5     6     7
sub t0,t1,t2     IF    ID    EX            MEM   WB
beq t0,$0,end          IF    ID (WAIT!)    ID    EX    MEM   WB

Now we've added a forwarding path from EX to ID. And as you can imagine, we need one from MEM to ID too. Doing things in more stages means more forwarding paths. Something to keep in mind as you add more pipeline stages!

Static Branch Prediction

The compiler can help
In the following loop, what can you say about the blt instruction?

    for(s0 = 0 .. 100)
        print(s0);
    printf("done");

        li   s0, 0
    top:
        move a0, s0
        jal  print
        addi s0, s0, 1
        blt  s0, 100, top
        la   a0, done_msg
        jal  printf

It is taken on every iteration except the last. In fact, older versions of MIPS (from MIPS II on) had a special kind of branch for this: branch likely (high probability).

The old ways
When MIPS was first designed, the idea was that the compiler could do the instruction scheduling/branch prediction in advance. The regular branch instructions assumed the branch was NOT taken, so the CPU would keep fetching instructions after them. The branch likely instructions assumed the branch WAS taken, so the CPU would start fetching instructions at the branch target. And this can be pretty effective for some control structures!
Unfortunately, not effective enough... branch likely instructions are no longer part of the MIPS ISA. Many other "compiler-centric" features of MIPS have lost relevance over the years as well, such as inserting NOP instructions instead of forwarding/stalling dynamically.
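To see why a static "taken" guess suits the loop-closing blt from the earlier slide, here is a rough sketch that replays that branch's outcomes and scores the two fixed guesses. It models only the blt (not a pipeline), and the function names are invented for this example.

```python
# Sketch only: replays the outcomes of `blt s0, 100, top` from the loop on
# the earlier slide and scores the two static guesses against them.

def loop_branch_outcomes(limit=100):
    """Return the taken/not-taken outcome of each execution of the blt."""
    outcomes = []
    s0 = 0                      # li s0, 0
    while True:
        s0 += 1                 # addi s0, s0, 1
        taken = s0 < limit      # blt s0, limit, top
        outcomes.append(taken)
        if not taken:
            return outcomes


def correct_guesses(outcomes, predict_taken):
    """Count how often a fixed (static) guess matches the real outcome."""
    return sum(1 for taken in outcomes if taken == predict_taken)


if __name__ == "__main__":
    outcomes = loop_branch_outcomes()
    print(len(outcomes), "executions of the branch")
    print("always taken:    ", correct_guesses(outcomes, True), "correct")
    print("always not taken:", correct_guesses(outcomes, False), "correct")
```

A fixed guess works here because the branch is so lopsided; branches whose behavior depends on input data are where static guesses start to fall short.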

The CPU knows best
Ultimately, for most programs, the compiler cannot statically predict their behavior to an acceptable degree. Solving the halting problem yadda yadda...
The CPU can dynamically analyze program behavior at runtime, and adapt gracefully. Program behavior can change with user input after all!
Implementing this analysis in hardware means the CPU architecture can change drastically without changing compilers and old code. It can also allow unoptimized programs to run quickly.
We'll be learning about several adaptive execution schemes, starting with...

Dynamic Branch Prediction

The problem
Some branches are taken 99% of the time, some 1% of the time, some always, some never, some 50% of the time, some randomly...
What we need is hardware that can keep track of where branch instructions are in the program, the probability that each branch is taken, and the branch target address of each branch.

Branch PC     Probability   Branch Target
0x007FA004    32%           0x007FA03C
0x007FA058    94%           0x007FA040
0x007FC380    88%           0x007FC398
0x007FC60C    12%           0x007FC704

Well, let's try to turn this into hardware...

Compromises
How many branch instructions might there be in your program? How about in all programs running + the operating system? So how many entries should our prediction table have? It's one of those "try it and see" things; processor simulation is very useful in these cases. Law of diminishing returns.
If we have n entries in our table, how can we quickly look up addresses in the table? (We're using this on every instruction!) Lots of comparators... lots of hardware. Hashing... but we could get false positives.
If we predict incorrectly, what happens? The program runs a little slower, but nothing catastrophic.

The Branch Target Buffer (BTB)

Hash #   Branch PC     Pred.   Branch Target
0        0x007FA004    NT      0x007FA03C
1        0x007FC60C            0x007FC704
2        0x007FA058    T       0x007FA040
3        ...
4        0x007FC380            0x007FC398
5
6
7

PC: 0x007FA004

    entry = Hash(PC)
    if(entry.PC == PC && entry.pred == T)
        NextPC = entry.target
    else
        NextPC = PC+4

The entry.PC == PC comparison (the "==?" in the diagram) is there to avoid false positives on non-branch (or wrong branch) instructions! The "T?" check then chooses between the stored target and PC+4.
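The lookup above can be sketched in software as follows. This is an illustration, not the hardware design: the hash function (drop the two always-zero low bits, then take the remainder by the table size) is an assumption and does not reproduce the hash numbers shown in the table, and the class and method names are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTBEntry:
    pc: int              # address of the branch that owns this entry (the tag)
    predict_taken: bool  # single-bit prediction: T or NT
    target: int          # where the branch goes when taken


class BranchTargetBuffer:
    def __init__(self, num_entries: int = 8):
        self.entries: List[Optional[BTBEntry]] = [None] * num_entries

    def _index(self, pc: int) -> int:
        # Hypothetical hash: instructions are 4 bytes, so drop the low 2 bits.
        return (pc >> 2) % len(self.entries)

    def next_pc(self, pc: int) -> int:
        """Predicted next fetch address for the instruction at `pc`."""
        entry = self.entries[self._index(pc)]
        # The tag check rejects non-branches and other branches that merely
        # hash to the same slot.
        if entry is not None and entry.pc == pc and entry.predict_taken:
            return entry.target
        return pc + 4

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the real outcome of a branch (overwrites the slot)."""
        self.entries[self._index(pc)] = BTBEntry(pc, taken, target)


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    btb.update(0x007FA058, True, 0x007FA040)
    print(hex(btb.next_pc(0x007FA058)))  # known, predicted-taken branch
    print(hex(btb.next_pc(0x007FA078)))  # same slot, different PC
```

The second lookup lands in the same slot as the first but fails the tag comparison, so it falls through to PC+4 instead of borrowing another branch's target.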

When to read? When to write?
Ideally, we'd like to start fetching instructions from the "correct" place during the branch instruction's ID phase. When should we check the BTB then? During IF! Wait, but how does it even know it's a branch... Remember that the BTB checks that the instruction PC matches the BTB entry, so it MUST be a branch instruction.*
When do we write to the BTB? Well, when do we have all the information needed to fill in a BTB entry? After ID, when the branch target and direction have both been computed. (nice optimization!) This also handles adding new entries: the BTB is only written on branches.
*unless we have an incoherent instruction cache and dynamic code modification ;)

Nobody's perfect
What happens if, at the end of ID, we find our prediction is wrong? Flush and start fetching from the correct PC. But now the BTB is updated with new info as well. (It's updated even if we predicted correctly, too.)
Let's make our predictions more accurate. The scheme we showed here has only a single bit to predict taken/not taken. It's... not much information to go on. But adding more bits means more hardware. Let's strike a balance.

2-bit branch predictor
We can use 2 bits with a state machine to make better predictions.
States: Strongly Taken, Weakly Taken, Weakly Not Taken, Strongly Not Taken.
(In the diagram, green arrows = taken, red arrows = not taken.)
The hysteresis (have to make two mistakes before switching decisions) allows for intermittent changes in branch behavior.
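A minimal sketch of such a state machine, next to the single-bit scheme it replaces. The arrows of the slide's diagram did not survive in this transcript, so the transitions here assume the common saturating-counter arrangement (taken moves one step toward Strongly Taken, not taken moves one step toward Strongly Not Taken). The class names and the test pattern are illustrative.

```python
class OneBitPredictor:
    """Single-bit scheme: predict whatever the branch did last time."""

    def __init__(self, taken=True):
        self.taken = taken

    def predict(self):
        return self.taken

    def update(self, taken):
        self.taken = taken


class TwoBitPredictor:
    """2-bit saturating counter (assumed transition scheme)."""

    STATES = ["Strongly Not Taken", "Weakly Not Taken",
              "Weakly Taken", "Strongly Taken"]

    def __init__(self, state=3):
        self.state = state  # index into STATES

    def predict(self):
        return self.state >= 2  # either of the two "Taken" states

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(predictor, outcomes):
    """Run the predictor over a sequence of real outcomes."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1
        predictor.update(taken)
    return wrong


if __name__ == "__main__":
    # A 10-iteration loop branch, with the whole loop run three times.
    pattern = ([True] * 9 + [False]) * 3
    print("1-bit:", count_mispredictions(OneBitPredictor(), pattern))
    print("2-bit:", count_mispredictions(TwoBitPredictor(), pattern))
```

On this pattern the single bit is wrong twice per loop exit (once on the exit, once again on re-entry), while the 2-bit counter only drops to Weakly Taken on the exit and is wrong once.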

3 bits? 4? 10?
Is it worth adding more bits to the prediction probability? Empirically... not really. 2 bits + a large number of BTB entries gets you ~93% accuracy! More bits don't help because branch behavior can be complex.
What does help with prediction accuracy is more complex branch prediction methods: two-level adaptive predictors, tournament predictors, hybrid predictors, loop detection, return stack buffers...
Oh, and then there are indirect jumps (jr), which can be a whole different kind of pain to deal with.

2-level adaptive predictors
A common technique today: each entry in the BTB has multiple 2-bit counters, and the branch's recent history selects which one to use.

#   Branch PC     History   Branch Target
    0x007FA004    010       0x007FA03C
2-bit counters shown for this entry: 01 10 00 11

With a 3-bit history, every entry in the BTB has its own set of 8 counters! Every time the branch is taken/not taken, a 1/0 is shifted into the history on the right side. This way, the history keeps track of the last three times we encountered this branch.
This kind of predictor can reach 97% accuracy!
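One BTB entry of this kind might be modeled as below. This is a sketch under assumptions: the counters are saturating 2-bit counters that all start at Weakly Not Taken, the history starts at 000, and the names are invented. A plain 2-bit counter is included only for comparison.

```python
def bump(counter, taken):
    """Move a 2-bit saturating counter (0..3) one step toward the outcome."""
    return min(3, counter + 1) if taken else max(0, counter - 1)


class TwoLevelEntry:
    """Per-branch history register selecting one of 2**bits counters."""

    def __init__(self, history_bits=3):
        self.history_bits = history_bits
        self.history = 0                          # last outcomes, newest on the right
        self.counters = [1] * (1 << history_bits)  # assumed start: Weakly Not Taken

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        # Train the counter that was selected, then shift the outcome in.
        self.counters[self.history] = bump(self.counters[self.history], taken)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def two_level_mispredictions(outcomes):
    entry, wrong = TwoLevelEntry(), 0
    for taken in outcomes:
        wrong += entry.predict() != taken
        entry.update(taken)
    return wrong


def plain_two_bit_mispredictions(outcomes, counter=2):
    wrong = 0
    for taken in outcomes:
        wrong += (counter >= 2) != taken
        counter = bump(counter, taken)
    return wrong


if __name__ == "__main__":
    alternating = [True, False] * 10  # taken, not taken, taken, ...
    print("2-level:", two_level_mispredictions(alternating))
    print("2-bit:  ", plain_two_bit_mispredictions(alternating))
```

A strictly alternating branch defeats a single counter, which keeps bouncing between two states. With history, the pattern "010" always precedes a taken branch and "101" a not-taken one, so each gets its own counter and the predictor settles after a short warm-up.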

Return stack buffers
A function return is a special kind of indirect branch. jr $ra on MIPS or ret on x86 both get the address from somewhere else. Since functions return to where they were called virtually every time, it makes sense to cache the return address on function calls.

    4AB33C:  jal someFunc
    4AB340:  beq v0, $0, blah
             ...
    someFunc:
             jr $ra

Return stack contents shown: 40CC00 46280C 4AB108 000000
When we encounter the jal, push the return address (4AB340). When we encounter the jr $ra, pop the return address (4AB340). Easy!
Stack overflows aren't an issue. This is just a prediction, after all.
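The push/pop behavior can be sketched like this. The capacity, the policy of silently dropping the oldest entry on overflow, and the names are assumptions for illustration. The return address is taken as the call's PC + 4, matching the addresses on the slide.

```python
from collections import deque


class ReturnStackBuffer:
    """Small fixed-size stack of predicted return addresses."""

    def __init__(self, capacity=4):
        # With maxlen set, appending to a full deque discards the oldest
        # entry. That is fine here: a lost entry only costs a misprediction.
        self.stack = deque(maxlen=capacity)

    def on_call(self, call_pc):
        """A jal was seen at `call_pc`: remember where it will return to."""
        self.stack.append(call_pc + 4)

    def on_return(self):
        """A jr $ra was seen: predict the return target (None if empty)."""
        return self.stack.pop() if self.stack else None


if __name__ == "__main__":
    rsb = ReturnStackBuffer()
    rsb.on_call(0x4AB33C)         # jal someFunc
    print(hex(rsb.on_return()))   # jr $ra inside someFunc
```

If the buffer overflows or underflows, the prediction is simply wrong or missing and the pipeline recovers as it would for any other mispredicted branch.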