Combining Branch Predictors

CS 7960-4 Lecture 7: Combining Branch Predictors
Paper: Scott McFarling, WRL Tech. Report TN-36, 1993

Bimodal Branch Prediction
Identifies the most popular prediction in the recent past. Updates happen during commit. Diagram: a 10-bit index taken from the PC selects one of 1024 entries, each a 2-bit saturating counter.
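The bimodal scheme can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture or the paper: the class name, the initial counter value, and the use of the lowest PC bits as the index are all assumptions made for the example.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Counter values 0..3: 0-1 predict not taken, 2-3 predict taken.
        # Starting every counter at 1 (weakly not taken) is an arbitrary choice.
        self.counters = [1] * (1 << index_bits)

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        # Called once the branch outcome is known (at commit in the slide).
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter saturates, a single not-taken outcome after a run of taken outcomes does not flip the prediction; it takes two in a row. Two branches whose PCs share the same low 10 bits use the same counter, which is the contention that a larger table reduces.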

Results
SPEC'89 programs were simulated for 10M instructions (modern studies use hard-to-predict programs). A larger predictor reduces contention for counters. Prediction rates saturate at 93.5% at 2K bytes (Fig. 3).

Local Predictors
Two-level predictor: the first level holds per-branch history and the second level holds saturating counters. History gets updated immediately. Diagram: a 10-bit index from the PC selects one of 1024 entries in a table of 4-bit histories, and the selected history selects one of 16 2-bit saturating counters.
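A rough Python sketch of the two-level local scheme follows. The sizes (1024 four-bit histories feeding 16 counters) are one reading of the slide's diagram, and the class name and initial values are assumptions for illustration only.

```python
class LocalPredictor:
    """Two-level predictor: per-branch history selects a shared counter."""

    def __init__(self, pc_bits=10, history_bits=4):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)       # first level
        self.counters = [1] * (1 << history_bits)   # second level, 2-bit

    def predict(self, pc):
        history = self.histories[pc & self.pc_mask]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        i = pc & self.pc_mask
        history = self.histories[i]
        c = self.counters[history]
        self.counters[history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into this branch's history.
        self.histories[i] = ((history << 1) | int(taken)) & self.hist_mask


# A strictly alternating branch defeats a plain 2-bit counter, but the
# local history makes it predictable after a short warm-up.
p = LocalPredictor()
correct = 0
for n in range(28):
    taken = (n % 2 == 0)
    if n >= 8 and p.predict(0x80) == taken:
        correct += 1
    p.update(0x80, taken)
print(correct)  # 20: every prediction after the warm-up is right
```

The warm-up visible here is the cost the next slide mentions: the history table and the counters both have to be trained before predictions become accurate.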

Results
For small predictors, there can be contention at both levels, resulting in inaccurate predictions. A local predictor also takes longer to warm up, which matters after every context switch. It does very well at large sizes, saturating at 97.1%.

Global Predictors
A single history register is used, because neighboring branches have correlated results. However, the PC is not used. Diagram: a 10-bit global history selects one of 1024 entries, each a 2-bit saturating counter.
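The global scheme can be sketched as below. This is an illustrative Python model under assumed names and initial values; note that neither `predict` nor `update` takes a PC, matching the slide's point that the PC is not used.

```python
class GlobalPredictor:
    """One shared history register indexes the counter table; no PC."""

    def __init__(self, history_bits=10):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << history_bits)  # 2-bit counters, 0..3

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Every branch outcome is shifted into the same register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Outcome stream with a repeating taken, taken, not-taken pattern.
p = GlobalPredictor()
pattern = [True, True, False] * 20
correct = 0
for n, taken in enumerate(pattern):
    if n >= 30 and p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct)  # 30: the pattern is fully learned after warm-up
```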

Do We Need PC?
Note that the global history reveals which branch is being examined. Hence, the global predictor outdoes bimodal predictors when the transistor budget is large (Fig. 7). The local predictor does better still: it is more important to identify the PC and local history than the behavior of neighboring branches.

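Gselect
Uses a combination of PC and global history to index the counters. Bimodal and global prediction are special cases (Fig. 9). Diagram: n PC bits and m global history bits are concatenated into an (n+m)-bit index into 1024 entries of 2-bit saturating counters; the example uses a 5-bit global history.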
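The gselect index computation can be sketched as a single function. The function name and the choice to place the PC bits above the history bits are assumptions for illustration; any fixed concatenation order gives an equivalent predictor.

```python
def gselect_index(pc, history, pc_bits, history_bits):
    """Concatenate low PC bits (high part) with low history bits (low part)."""
    pc_part = pc & ((1 << pc_bits) - 1)
    hist_part = history & ((1 << history_bits) - 1)
    return (pc_part << history_bits) | hist_part


# 5 PC bits + 5 history bits -> a 10-bit index into 1024 counters.
idx = gselect_index(0b10110, 0b00011, 5, 5)
print(format(idx, "010b"))  # 1011000011

# Special cases: all bits from the PC is the bimodal index,
# all bits from the history is the global index.
print(gselect_index(0x2A5, 0x3FF, 10, 0) == 0x2A5)  # True
print(gselect_index(0x2A5, 0x155, 0, 10) == 0x155)  # True
```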

Gshare
XOR-ing 10 history bits with 10 PC bits carries more information than concatenating 5 bits of each, and more than either component alone.

Branch Address   Global History   Gselect 4/4   Gshare 8/8
00000000         00000001         00000001      00000001
11111111         10000000         11110000      01111111
01111110         00000001         11100001      01111111
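A small Python sketch of gshare follows: the index is the XOR of PC bits and global history bits, and everything else works like the other counter-table predictors. The names and initial values are assumptions for illustration, not taken from the paper.

```python
def gshare_index(pc, history, index_bits):
    """XOR the PC with the global history and keep the low index_bits."""
    return (pc ^ history) & ((1 << index_bits) - 1)


class GsharePredictor:
    def __init__(self, index_bits=10):
        self.index_bits = index_bits
        self.mask = (1 << index_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << index_bits)  # 2-bit counters, 0..3

    def predict(self, pc):
        return self.counters[gshare_index(pc, self.history, self.index_bits)] >= 2

    def update(self, pc, taken):
        i = gshare_index(pc, self.history, self.index_bits)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# 8-bit examples: address XOR history.
print(format(gshare_index(0b11111111, 0b10000000, 8), "08b"))  # 01111111
print(format(gshare_index(0b01111110, 0b00000001, 8), "08b"))  # 01111111
```

The two examples show that different (address, history) pairs can still collide under XOR; the argument for gshare is that, for a fixed table size, it folds more address and history bits into the index than concatenation can.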

Terminology
GAg: global history indexes into a global array of saturating counters.
PAg: per-address history indexes into a global array.
GAp: global history indexes into each PC's private array of counters (gselect).
PAp: per-address history indexes into each PC's private array of counters.

Trade-Offs
Some predictors warm up faster than others. Some programs benefit from global history, some from local history. Some programs have branches that interfere with each other. Note that a 64KB local predictor has fewer saturating counters than a 64KB bimodal predictor, so the former won't be better for every program.
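The storage argument can be made concrete with a little arithmetic. The sketch below does not use the 64KB configurations from the paper; it assumes a toy local predictor with 1024 four-bit histories and 16 two-bit counters, and the function names are invented for the example.

```python
def local_storage_bits(pc_index_bits, history_bits, counter_bits=2):
    """Bits used by a local predictor: history table plus counter table."""
    history_table = (1 << pc_index_bits) * history_bits
    counter_table = (1 << history_bits) * counter_bits
    return history_table + counter_table


def bimodal_counter_count(budget_bits, counter_bits=2):
    """A bimodal predictor spends its entire budget on counters."""
    return budget_bits // counter_bits


budget = local_storage_bits(10, 4)       # 1024 * 4 + 16 * 2 bits
local_counters = 1 << 4                  # counters in the local predictor
bimodal_counters = bimodal_counter_count(budget)
print(budget, local_counters, bimodal_counters)  # 4128 16 2064
```

In this toy case most of the local predictor's bits go to history storage, so a bimodal predictor of equal size has far more counters and suffers less interference on programs that do not need the history.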

Combining Predictors
Use an array of saturating counters to pick the best available predictor for each PC. Diagram: the PC indexes a 1024-entry table of 2-bit saturating counters, and the selected counter chooses between the outputs of Predictor A and Predictor B.
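The selection logic can be sketched as follows. The class name, the convention that high counter values favor Predictor B, and the initial value are assumptions for illustration; the component predictors are represented only by their boolean predictions so the sketch stays self-contained.

```python
class Chooser:
    """Per-PC 2-bit counters that pick between two component predictions."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # 0-1 select predictor A, 2-3 select predictor B (assumed convention).
        self.counters = [1] * (1 << index_bits)

    def choose(self, pc, pred_a, pred_b):
        return pred_b if self.counters[pc & self.mask] >= 2 else pred_a

    def update(self, pc, pred_a, pred_b, taken):
        a_ok = (pred_a == taken)
        b_ok = (pred_b == taken)
        if a_ok == b_ok:
            return  # both right or both wrong: no evidence either way
        i = pc & self.mask
        if b_ok:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


c = Chooser()
print(c.choose(0x10, True, False))   # True: A is selected initially
c.update(0x10, True, False, False)   # B was right, A was wrong
print(c.choose(0x10, True, False))   # False: B is now selected
```

The counter only moves when the two components disagree about correctness, so a branch that both predict well leaves the selection untouched.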

Results
The combination of local and gshare increases the prediction accuracy to 98.1% (Fig. 16). For smaller transistor budgets, the combination of bimodal and gshare is better (gshare is made twice the size so that the total is a power of two). A 1KB combined predictor does as well as a 16KB gselect predictor.

Future Work
Detect conflicts, correlations, and common predictions through profiling or compiler analysis. Find functions that compress the information in the history or PC. Pipeline predictions: predict two branches ahead. Hierarchical predictors: get a quick prediction in one cycle and a more accurate one two cycles later.

Next Week's Paper
"Design Trade-Offs for the Alpha EV8 Conditional Branch Predictor", Seznec et al., ISCA'02
