Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium


3 Conditional Branches

    if (i > 0)
        /* something */
    else
        /* something else */

    for (i=0; i<50; i++) {
        /* a loop... */
    }
    /* next statements */

How frequently do conditional branches occur? 1/8

4 Program Execution
Fetch = take next instruction
Decode = analyze type and read operands
Execute
Write Back = write result
[Figure: R1=R2+R3 passing through Fetch, Decode, Execute, Write Back; labels: addition, operands 4 and 3, computation, R1 contains 7]

5 Pipelined architectures
Parallel versus sequential:
  Constant flow of instructions possible
  Faster applications
  Limitation due to conditional branches
[Figure: pipeline diagram (Fetch, Decode, Execute, Write Back) in which R1=R2+R3, R5=R2+1, R4=R3-1, R7=2*R1 and R5=R6 advance one stage per cycle, followed by the branch condition R1>0]

6 Problem: Branches
Branches introduce bubbles
Affects pipeline throughput
[Figure: code fragment with R1=R2+R3, R5=R2+1, "if R1>0" with then/else paths (R7=2*R1, R2=R2-1, R7=0) and R5=R6; in the pipeline diagram the slots behind "if R1>0" are marked "?" until the branch is resolved]

7 Solution: Prediction
Fetch those instructions that are likely to be executed
correct prediction = gain
misprediction = penalty
[Figure: same code fragment as on the previous slide; the pipeline fetches R7=2*R1 and R2=R2-1 directly behind "if R1>0" instead of leaving bubbles]

8 Today's Architecture
[Figure: processor block diagram with instruction cache, fetch, decode, register rename, dispatch, instruction window, reorder logic, four functional units, register file, IPC, and the branch predictor]


10 Bimodal Branch Predictor
Predict outcome of condition (e.g. if or else) based on unique branch address
Update prediction table
[Figure: k bits of the branch address index the prediction table]
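
A minimal sketch of a bimodal predictor, for illustration only. The slide specifies just a prediction table indexed by k bits of the branch address; the 2-bit saturating counters, the weakly-taken initial state, and the class and method names are assumptions.

```python
class BimodalPredictor:
    """Prediction table indexed by the low k bits of the branch address."""

    def __init__(self, k: int) -> None:
        self.mask = (1 << k) - 1
        # Assumption: 2-bit saturating counters, initialised to 2 (weakly taken).
        self.table = [2] * (1 << k)

    def index(self, address: int) -> int:
        # Only k address bits are used, so distinct branches can collide.
        return address & self.mask

    def predict(self, address: int) -> bool:
        # Counter values 2 and 3 predict taken, 0 and 1 predict not taken.
        return self.table[self.index(address)] >= 2

    def update(self, address: int, taken: bool) -> None:
        # Move the counter towards the actual outcome, saturating at 0 and 3.
        i = self.index(address)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

With `k = 3` the table has 8 entries, matching the aliasing example later in the deck.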

11 Global History Branch Predictor
Predict outcome of condition (e.g. for loop) based on global history
Update prediction table and global history
[Figure: k bits of global history index the prediction table]
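
A hedged sketch of a global history predictor: the last k branch outcomes, kept in a shift register, select the table entry. As before, the 2-bit counters and all names are assumptions, not taken from the slides.

```python
class GlobalHistoryPredictor:
    """Prediction table indexed by the outcomes of the last k branches."""

    def __init__(self, k: int) -> None:
        self.mask = (1 << k) - 1
        self.history = 0                 # k-bit shift register of outcomes
        self.table = [2] * (1 << k)      # assumed 2-bit saturating counters

    def predict(self) -> bool:
        return self.table[self.history] >= 2

    def update(self, taken: bool) -> None:
        # First train the entry that produced the prediction...
        i = self.history
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # ...then shift the new outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

A loop branch with a short repeating pattern (for example taken three times, then not taken) becomes predictable once every history value that occurs has been trained.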

12 Gshare Branch Predictor [McFarling]
[Figure: k bits of global history are XORed with the branch address to form the original index into the prediction table]
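
The gshare index computation can be sketched as a single XOR. The function name and the choice of the low k bits are assumptions made for illustration.

```python
def gshare_index(address: int, history: int, k: int) -> int:
    """XOR the branch address with the global history and keep k bits."""
    mask = (1 << k) - 1
    return (address ^ history) & mask


# Example: address 1011 and history 0110 give index 1101 in a 16-entry table.
example_index = gshare_index(0b1011, 0b0110, 4)
```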

13 Misprediction rate: gshare
[Figure: misprediction rate versus predictor size (bytes), SPEC INT 2000; lower is better]

14 Aliasing
Resource limitations: 8 entries, index = 3 bits
Two different branches using the same prediction information
[Figure: branches A and B both map to index 101 of the prediction table]
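
To make the effect concrete, the sketch below counts mispredictions for two hypothetical branches that share index 101 in an 8-entry table. The addresses, the trace, and the 2-bit counter scheme are illustrative assumptions, not data from the talk.

```python
def count_mispredictions(trace, k):
    """Run an address-indexed table of assumed 2-bit counters over a trace."""
    mask = (1 << k) - 1
    table = [2] * (1 << k)
    misses = 0
    for address, taken in trace:
        i = address & mask
        if (table[i] >= 2) != taken:
            misses += 1
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
    return misses


# Hypothetical branches: same low 3 bits (101), opposite behaviour.
BRANCH_A = 0b0101   # always taken
BRANCH_B = 0b1101   # never taken
trace = [(BRANCH_A, True), (BRANCH_B, False)] * 10

shared = count_mispredictions(trace, 3)     # A and B share entry 101
separate = count_mispredictions(trace, 4)   # A and B get their own entries
```

Here the shared entry keeps being pulled in opposite directions (destructive aliasing), while one extra index bit separates the two branches.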

15 Aliasing
[Figure: aliasing on SPEC INT 2000]


17 Basic Observations
Branches with similar behavior can share prediction information
Branches can use same table entry, e.g. [Figure: time axis]

18 Time Varying Behavior
[Figure: per-phase percentages for branches A, B, C, D (NE = not executed); values as extracted: 100% 0% 100% 50% 100% 0% 100% 60% 100% 25% 0% NE NE NE 100% 33%]

19 Branch Clustering
Each branch represents a point in N-dim space
Clusters formed by k-means algorithm
[Figure: per-phase percentages for branches A, B, C, D; values as extracted: 100% 0% 100% 50% 100% 0% 100% 60% 100% 25% 0% NE NE NE 100% 33%]

20 k-Means Cluster Algorithm
1. initial centers
2. calculate nearest center
3. redefine centers
4. Restart with new centers
[Figure: points and centers (X) at each step]

21 k-Means Cluster Algorithm
1. initial centers
2. calculate nearest centers
3. redefine centers
Stable solution
[Figure: points and centers (X) at each step, ending in a stable solution]
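
The steps above can be sketched in plain Python. This is a generic k-means loop, not the author's implementation; the branch vectors are hypothetical, and how "not executed" phases are encoded numerically is left open here.

```python
def k_means(points, centers, max_iterations=100):
    """Return (centers, groups) once the centers no longer move."""
    centers = [tuple(c) for c in centers]          # step 1: initial centers
    groups = [[] for _ in centers]
    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        # Step 3: redefine each center as the mean of its group.
        new_centers = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else center
            for group, center in zip(groups, centers)
        ]
        if new_centers == centers:                 # stable solution
            break
        centers = new_centers                      # step 4: restart
    return centers, groups


# Hypothetical per-phase taken rates for four branches (4 phases each).
A = (1.0, 0.0, 1.0, 0.5)
B = (1.0, 0.0, 1.0, 0.6)
C = (0.0, 1.0, 0.0, 0.2)
D = (0.0, 1.0, 0.1, 0.3)
final_centers, clusters = k_means([A, B, C, D], centers=[A, C])
```

With these inputs the branches with similar profiles end up together: A with B, and C with D.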

22 Determining k of k-Means
k is chosen by BIC-score (Bayesian Information Criterion)
Tradeoff between k and goodness of a clustering
[Figure: stable solution with k=2 versus stable solution with k=3; which is best?]

23 Branch Clustering
SPEC INT 2000: from 8 to 33 clusters
  mcf: 8
  gcc, parser: 33
Each branch belongs to exactly one cluster
[Figure: per-phase percentages for branches A, B, C, D with their cluster assignment; values as extracted: 100% 0% 100% 50% 100% 0% 100% 60% 100% 25% 0% NE NE NE 100% 33%]


25 Subtables Example
8 entries, index = 3 bits
4 clusters, 2 bits
[Figure: new index = 1 bit of the original index + cluster bits, selecting an entry in the prediction table]


27 Subtables Example
8 entries, index = 3 bits
4 clusters, 2 bits
3 to 6 bits for cluster [SPECint2000]
can be used in every predictor scheme
[Figure: new index = 1 bit of the original index + cluster bits, selecting an entry in the prediction table]
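
A small sketch of the subtable index from the example above. The slide only shows that the cluster bits replace part of the original index; placing the cluster in the high bits, as well as the function and parameter names, are assumptions.

```python
def subtable_index(original_index: int, cluster: int,
                   index_bits: int, cluster_bits: int) -> int:
    """Select a subtable with the cluster id, then an entry inside it."""
    remaining_bits = index_bits - cluster_bits
    # Keep only as many original index bits as still fit in the table.
    within_subtable = original_index & ((1 << remaining_bits) - 1)
    # Assumption: the cluster id occupies the high bits of the final index.
    return (cluster << remaining_bits) | within_subtable


# Slide example: 8 entries (3 index bits), 4 clusters (2 bits),
# so 1 bit of the original index remains.
entry = subtable_index(0b101, cluster=0b10, index_bits=3, cluster_bits=2)
```

Each cluster thus owns a subtable of 2 entries, and only branches from the same cluster can alias with each other.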

28 Subtables for Bimodal
[Figure: cluster and branch address together index the prediction table]

29 Subtables for Gshare
[Figure: cluster, branch address and global history together index the prediction table]
19% better for SMALL predictors

30 Why Clustered Indexing Works
Subtabling
  Uses smaller predictors
  More aliasing expected… but
  More constructive aliasing

31 Hashing: Alternative to Subtables
Keeps original global history length
[Figure: global history and branch address form the gshare index, which is hashed with the cluster to give the index into the prediction table]
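
One possible reading of the hashed variant, sketched under stated assumptions: the slide does not name the hash function, so XORing the cluster id into the high bits of the full-width gshare index is a guess made for illustration, and all names are hypothetical.

```python
def hashed_gshare_index(address: int, history: int, cluster: int,
                        k: int, cluster_bits: int) -> int:
    """Combine a full k-bit gshare index with the cluster id."""
    mask = (1 << k) - 1
    gshare = (address ^ history) & mask        # all k history bits are kept
    # Assumption: the hash is an XOR of the cluster id into the top bits.
    return gshare ^ (cluster << (k - cluster_bits))


index = hashed_gshare_index(0b1011, 0b0110, cluster=0b10, k=4, cluster_bits=2)
```

Unlike the subtable scheme, no index bits are given up to the cluster, which is consistent with the slide's note that the original global history length is kept.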

32 Hashing for Gshare
[Figure: two plots of misprediction rate versus predictor size (bytes) for gshare original, gshare clustered: subtables, and gshare clustered: hashed]
5% better for LARGE predictors

33 Self Profile-Based Clustering
Limit study
Identified clusters optimal for given execution
[Figure: per-phase percentages for branches A, B, C, D with their cluster assignment; values as extracted: 100% 0% 100% 50% 100% 0% 100% 60% 100% 25% 0% NE NE NE 100% 33%]

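34 Cross Profile-Based Clustering
[Figure, SELF: per-phase percentages for branches A, B, C, D with their cluster assignment; values as extracted: 100% 0% 100% 50% 100% 0% 100% 60% 100% 25% 0% NE NE NE 100% 33%]
[Figure, SPEC-train inputs: per-phase percentages for branches A, B, C, D, E with their cluster assignment; values as extracted: 90% 10% 100% 60% NE NE 100% 25% NE NE NE NE 100% 33% 0% 0% 10% 20%; annotation: OK]
additional cluster for unseen branches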
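
The fallback for unseen branches can be sketched as a lookup with a default. The addresses, cluster numbers, and names below are hypothetical; the slide states only that an additional cluster is reserved for branches absent from the training profile.

```python
def cluster_for_branch(address: int, trained_clusters: dict,
                       unseen_cluster: int) -> int:
    """Return the profiled cluster, or the extra cluster for unseen branches."""
    return trained_clusters.get(address, unseen_cluster)


# Hypothetical clustering obtained from a training input.
trained_clusters = {0x400A10: 0, 0x400A38: 0, 0x400B04: 1}
UNSEEN_CLUSTER = 2   # one cluster beyond those found during training

seen = cluster_for_branch(0x400A38, trained_clusters, UNSEEN_CLUSTER)
unseen = cluster_for_branch(0x400C7C, trained_clusters, UNSEEN_CLUSTER)
```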

35 Cross Profile-Based Clustering
[Figure: misprediction rate versus predictor size (bytes) for bimodal original, bimodal self clustered, and bimodal cross clustered]
[Figure: misprediction rate versus predictor size (bytes) for gshare original, gshare self clustered, and gshare cross clustered]
cross clustered still good
small budgets: subtables, 12.3% fewer mispredictions (19% self clustered)
large budgets: hashing, 3% better (5% self clustered)

36 Conclusion
Small branch predictors suffer from aliasing, which is frequently destructive
Exploit constructive aliasing by clustering branches
Implementation
  subtables (can be used in all branch prediction schemes)
  hashing (specific for gshare)
Gshare misprediction
  1KiB: reduced by 19% (self), 12.3% (cross)
  256KiB: reduced by 5% (self), 3% (cross)

Questions?

The End