Published by Horace Morris Wade. Modified over 9 years ago.
Branch Prediction Perspectives Using Machine Learning
Veerle Desmet, Ghent University
2 Situation
Faster applications
Parallelism
– process level
– parallel architectures
– this talk focuses on parallelism at the instruction level
Goal: execute as many instructions as possible in parallel on a single processor
3 Pipelined architecture
Phases during instruction execution
– Fetch = read next instruction
– Decode = analyze type and read operands
– Execute
– Write Back = write result
[Figure: R1=R2+R3 in each stage: Decode identifies an addition with operands 4 and 3, Execute performs the computation, and after Write Back R1 contains 7]
4 Pipelined architecture: example
Constant flow of instructions possible
Limitations due to
– data dependencies
– control flow dependencies
[Figure: the instructions R1=R2+R3, R5=R2+1, R4=R3-1, R7=2*R1, R5=R6, R1=4 advancing one stage per cycle through Fetch, Decode, Execute, Write Back]
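The constant flow of instructions can be sketched as a small schedule generator. This is a minimal illustration that assumes an ideal four-stage pipeline with no stalls; the function name `pipeline_schedule` is hypothetical and not part of the talk.

```python
# Ideal 4-stage pipeline: every instruction advances one stage per cycle.
STAGES = ["Fetch", "Decode", "Execute", "Write Back"]


def pipeline_schedule(instructions):
    """Return one dict per cycle, mapping stage name -> instruction.

    Assumes no data or control dependencies, so no bubbles are inserted.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    schedule = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            idx = cycle - depth  # instruction that entered Fetch `depth` cycles ago
            if 0 <= idx < len(instructions):
                row[stage] = instructions[idx]
        schedule.append(row)
    return schedule


program = ["R1=R2+R3", "R5=R2+1", "R4=R3-1", "R7=2*R1", "R5=R6", "R1=4"]
for cycle, row in enumerate(pipeline_schedule(program)):
    print(cycle, row)
```

Once the pipeline is full (from the fourth cycle on), four instructions are in flight at the same time, one per stage.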
5 Pipeline stalls: data dependencies
Guarantee program correctness
Introduce a bubble (wasted cycle)
Solution
– Reorder instructions
– Value prediction
[Figure: pipeline diagram showing a bubble caused by a data dependency; instructions shown include R1=R2+R3 and R7=2*R1]
6 Control dependencies
Branches determine program flow or execution path
Introduce 2 bubbles
[Figure: pipeline diagram and control-flow graph for the branch test R1=0 with its yes and no paths; instructions shown include R1=R2+R3, R5=R2+1, R7=0, R7=2*R1, R5=R6, R2=R2-1]
7 Speculative execution
1 out of 8 instructions is a branch
Waiting for the outcome of branches seriously reduces the amount of parallelism
Increasing number of pipeline stages
– Pentium 4: up to 20 stages
Speculate on the outcome of the branch
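To see why branch outcomes matter, a back-of-the-envelope estimate of the average cycles per instruction (CPI) helps. The sketch below assumes a base CPI of 1 and a fixed penalty per misprediction; the function `effective_cpi` and the specific penalty value are illustrative assumptions, not figures from the talk.

```python
def effective_cpi(branch_fraction, accuracy, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when each misprediction wastes penalty_cycles.

    Simplified model (assumption): only branch mispredictions cause bubbles.
    """
    mispredictions_per_instruction = branch_fraction * (1.0 - accuracy)
    return base_cpi + mispredictions_per_instruction * penalty_cycles


# 1 branch per 8 instructions; a 3-cycle penalty is assumed for illustration.
print(effective_cpi(1 / 8, 0.75, 3))  # 75% accurate prediction
print(effective_cpi(1 / 8, 0.95, 3))  # 95% accurate prediction
```

In this model the cost grows in proportion to the penalty, which is why deeper pipelines make prediction accuracy more important.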
8 Speculative execution: example 1
Fetch those instructions that are likely to be executed
Correct prediction eliminates stall
[Figure: pipeline diagram and control-flow graph for the branch test R1=0; the predicted path is fetched right after the branch and turns out to be correct]
9 Speculative execution: example 2
Incorrect prediction or misprediction
Penalty for cancelling instructions (3 bubbles)
[Figure: pipeline diagram and control-flow graph for the branch test R1=0; the speculatively fetched instructions are cancelled and the other path is fetched]
10 Branch prediction
Prediction during fetch stage
Outcome known typically a few cycles later
Binary prediction: taken or not taken
Based on generalization property
Static versus dynamic
Quality of prediction scheme
– accuracy
– hardware budget
– delay
11 Static branch prediction
Examples
– Predict taken: branches tend to be taken
– BTFNT: Backward Taken, Forward Not Taken
Prediction accuracy: about 75%
Based on static heuristics
– Loops (e.g. for, while)
– Type of branches (comparison with zero)
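The BTFNT rule fits in one comparison. The sketch below assumes the predictor can see both the branch address and its target address; the function name `predict_btfnt` and the example addresses are hypothetical.

```python
def predict_btfnt(branch_pc, target_pc):
    """Backward Taken, Forward Not Taken.

    A backward branch (target below the branch address) usually closes a loop,
    so it is predicted taken; a forward branch is predicted not taken.
    """
    return target_pc < branch_pc


# Loop-closing branch at 0x120 jumping back to 0x100: predicted taken.
print(predict_btfnt(0x120, 0x100))
# Forward branch at 0x120 skipping ahead to 0x140: predicted not taken.
print(predict_btfnt(0x120, 0x140))
```

The prediction never changes at run time, which is what makes the scheme static.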
12 Dynamic branch prediction
Bimodal prediction
Two-level local prediction [Yeh&Patt]
Gshare predictor [McFarling]
...
Hybrid predictors
Perceptron predictor [Jiménez]
13 Bimodal predictor
[Figure: the PC selects a saturating counter (e.g. 2), which gives the prediction (e.g. taken); updating with the outcome (e.g. taken) changes the counter (e.g. to 3)]
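A bimodal predictor can be sketched as a table of 2-bit saturating counters indexed by the low bits of the PC. The table size, the initial counter value, and the class name `BimodalPredictor` are assumptions chosen for illustration.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters (0..3) indexed by the PC."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Assumption: start every counter at 1 (weakly not taken).
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        return pc & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "taken", 0 and 1 mean "not taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0


# Reproduce the slide's example: counter 2 predicts taken, outcome taken gives 3.
p = BimodalPredictor()
p.counters[p._index(0x40)] = 2
print(p.predict(0x40))
p.update(0x40, True)
print(p.counters[p._index(0x40)])
```

Because the counter saturates, a single atypical outcome (such as a loop exit) does not immediately flip the prediction.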
14 Accuracy versus hardware budget
[Plot: prediction accuracy (%), 75 to 100, versus predictor size (bytes), 10 to 100000; curve: Bimodal]
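15 Two-level local predictor [Yeh&Patt]
[Figure: 1st level: the PC selects a local history (e.g. 1111); 2nd level: the local history selects a counter (e.g. 3), which gives the prediction (e.g. taken); updating with the outcome (e.g. not taken) changes the counter (e.g. to 2) and the local history (e.g. to 1110)]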
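The two levels can be sketched as two tables: per-branch histories indexed by the PC, and saturating counters indexed by that history. The sketch assumes one counter table shared by all branches, 4 history bits, and 2-bit counters; these choices and the class name are illustrative, not taken from the talk.

```python
class TwoLevelLocalPredictor:
    """1st level: local history per branch. 2nd level: counters per history pattern."""

    def __init__(self, pc_bits=10, history_bits=4):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)      # 1st level, indexed by PC
        self.counters = [1] * (1 << history_bits)  # 2nd level, indexed by history

    def predict(self, pc):
        history = self.histories[pc & self.pc_mask]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        slot = pc & self.pc_mask
        history = self.histories[slot]
        counter = self.counters[history]
        # Train the counter selected by the old history, then shift in the outcome.
        self.counters[history] = min(3, counter + 1) if taken else max(0, counter - 1)
        self.histories[slot] = ((history << 1) | int(taken)) & self.hist_mask


# Reproduce the slide's example: history 1111, counter 3, outcome not taken.
p = TwoLevelLocalPredictor()
p.histories[0x40] = 0b1111
p.counters[0b1111] = 3
print(p.predict(0x40))
p.update(0x40, False)
print(p.counters[0b1111], format(p.histories[0x40], "04b"))
```

After the update the counter for pattern 1111 drops to 2 and the branch's history becomes 1110, as on the slide.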
16 Accuracy versus hardware budget
[Plot: prediction accuracy (%), 75 to 100, versus predictor size (bytes), 10 to 100000; curves: Bimodal, Two-level local]
17 Gshare predictor [McFarling]
[Figure: the PC XOR the global history (e.g. 1010) selects a counter (e.g. 0), which gives the prediction (e.g. not taken); updating with the outcome (e.g. not taken) leaves the counter at 0 and shifts the global history (e.g. to 0100)]
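Gshare differs from the bimodal scheme only in how the counter is selected: the PC is XORed with a global history register. The sketch below assumes 4 index bits and 2-bit counters to match the slide's example values; the class name and the example PC are hypothetical.

```python
class GsharePredictor:
    """Saturating counters indexed by PC XOR global branch history."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.history = 0  # outcomes of the most recent branches, newest in bit 0
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)  # must use the history that made the prediction
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Reproduce the slide's example: history 1010, counter 0, outcome not taken.
g = GsharePredictor(index_bits=4)
g.history = 0b1010
pc = 0b0011  # hypothetical branch address
g.counters[g._index(pc)] = 0
print(g.predict(pc))
g.update(pc, False)
print(format(g.history, "04b"))
```

The counter stays saturated at 0 and the history register shifts from 1010 to 0100, as on the slide.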
18 Accuracy versus hardware budget
[Plot: prediction accuracy (%), 75 to 100, versus predictor size (bytes), 10 to 100000; curves: Bimodal, Two-level local, Gshare]
19 Realistic branch predictors
Alpha 21264
– Hybrid local/global
– Accuracy 90% with 3.6K
AMD K6
– Gshare
– Accuracy 95% with 8K
Athlon
– Variant on two-level scheme
– 95% with 4K
20 Disadvantages of classical schemes
Prediction tables grow exponentially with the number of information bits
Limited table size introduces aliasing (mostly destructive)
All use saturating counters
21 Machine learning
Branch prediction can be seen as
– Classification
– Function approximation
Typical problems for machine learning
– Speech recognition
– Image recognition
– Weather forecasting
– ...
Neural networks
22 Perceptron predictor [Jiménez]
Proposed in 2001
Linear separation
[Figure: global history (e.g. -1 -1 -1 … 1 1) and weights (e.g. 1 -1 2 … 1 3) give a sum (e.g. 2) and a prediction (e.g. taken); updating with the outcome (e.g. not taken) changes the weights (e.g. to 2 0 3 … 0 2) and the global history (e.g. to -1 -1 -1 … 1 -1)]
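A single perceptron in the spirit of this scheme can be sketched as follows. This is a simplified illustration: it models one perceptron rather than a table of them selected by the PC, it omits weight saturation, and the history length of 5, the threshold value, and the class name are assumptions. The example uses only the five history and weight values visible on the slide and leaves out the elided middle part.

```python
class PerceptronPredictor:
    """One perceptron over a global history of +1 (taken) / -1 (not taken)."""

    def __init__(self, history_length=5, threshold=10):
        self.weights = [0] * history_length
        self.bias = 0
        self.history = [-1] * history_length  # oldest outcome first
        self.threshold = threshold            # assumed training threshold

    def output(self):
        return self.bias + sum(w * x for w, x in zip(self.weights, self.history))

    def predict(self):
        return self.output() >= 0  # non-negative sum means "taken"

    def update(self, taken):
        t = 1 if taken else -1
        y = self.output()
        # Train on a misprediction or when the sum is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            self.bias += t
            self.weights = [w + t * x for w, x in zip(self.weights, self.history)]
        self.history = self.history[1:] + [t]


p = PerceptronPredictor(history_length=5)
p.weights = [1, -1, 2, 1, 3]
p.history = [-1, -1, -1, 1, 1]
print(p.output(), p.predict())
p.update(False)
print(p.weights)
```

Each weight moves by one toward agreement between its history bit and the outcome, which turns the weights 1 -1 2 1 3 into 2 0 3 0 2 when the branch is not taken, the same change the slide shows.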
23 Accuracy versus hardware budget
[Plot: prediction accuracy versus predictor size; curves: Bimodal, Two-level local, Gshare, Perceptron]
24 Neural network features
Hardware scales linearly with input information bits
Design
– Inputs
– Topology network
– Training algorithm
Strategy difficult to understand (black box)
Implementation
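The difference between linear and exponential scaling is easy to quantify. The sketch below compares the storage of a counter table indexed by n information bits with that of one perceptron with n inputs; the 2-bit counters, the 8-bit weights, and both function names are assumptions for illustration.

```python
def counter_table_bits(info_bits, counter_bits=2):
    """Storage for a table with one saturating counter per input pattern."""
    return (1 << info_bits) * counter_bits


def perceptron_bits(info_bits, weight_bits=8):
    """Storage for one perceptron: one weight per input plus a bias weight."""
    return (info_bits + 1) * weight_bits


for n in (10, 20, 30):
    print(n, counter_table_bits(n), perceptron_bits(n))
```

Doubling the number of information bits squares the number of table entries but only about doubles the perceptron's storage. A practical perceptron predictor keeps many perceptrons, so its total budget is this per-perceptron cost times the number of entries.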
25 Selection of relevant inputs
Can ignore irrelevant inputs by assigning small weights
– 101=1; 001=1; 000=0; 100=0
– 111=1; 011=1; 010=1; 110=1
Large weight = strong correlation between particular input and prediction outcome
[Figure label: zero weight]
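In the eight patterns above, the output is 1 whenever the second or third bit is 1, so the first bit carries no information. The sketch below trains a perceptron on exactly these patterns to see what weight the first input receives. The +1/-1 encoding, the train-on-misprediction rule, and the function names are assumptions made for this illustration.

```python
# The slide's truth table, in the slide's order (3 input bits -> outcome).
PATTERNS = {"101": 1, "001": 1, "000": 0, "100": 0,
            "111": 1, "011": 1, "010": 1, "110": 1}


def encode(bits):
    """Map '1' to +1 and '0' to -1."""
    return [1 if b == "1" else -1 for b in bits]


def classify(weights, bias, bits):
    y = bias + sum(w * x for w, x in zip(weights, encode(bits)))
    return 1 if y >= 0 else 0


def train(patterns, max_epochs=20):
    """Perceptron rule, updating only on a misprediction."""
    weights, bias = [0, 0, 0], 0
    for _ in range(max_epochs):
        mistakes = 0
        for bits, label in patterns.items():
            if classify(weights, bias, bits) != label:
                t = 1 if label else -1
                mistakes += 1
                bias += t
                weights = [w + t * x for w, x in zip(weights, encode(bits))]
        if mistakes == 0:
            break
    return weights, bias


weights, bias = train(PATTERNS)
print(weights, bias)
```

With this pattern order the training settles on a zero weight for the first input and equal positive weights for the other two, so the irrelevant input is ignored.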
26 Topology and training
Multi-layer networks
Interconnection
Training algorithms
– Only on misprediction
– Upon threshold
– Keep on training
[Figure: network with inputs, hidden layers, and a prediction output]
27 From black box to grey box
Classical schemes developed empirically
– Easy to understand & good results
Add this extra information
Use domain specific information
Higher prediction accuracies?
28 Implementation
Delay
– 1 cycle to make prediction
– Access time to large prediction tables
– ...
– On-line neural networks fast enough?
Strategy analysis
– Partly interpret network behaviour
– Try to implement in 1 cycle
29 Summary
Highly accurate branch prediction is paramount in modern microprocessors
Classical schemes reach up to 95% accuracy
Machine learning gives new perspectives
– Many design parameters
– Alternative prediction
– Excellent results with perceptrons
30 Questions?