Predicting Conditional Branches With Fusion-Based Hybrid Predictors Gabriel H. Loh Yale University Dept. of Computer Science Dana S. Henry Yale University.

Presentation transcript:

Predicting Conditional Branches With Fusion-Based Hybrid Predictors
Gabriel H. Loh, Yale University, Dept. of Computer Science
Dana S. Henry, Yale University, Depts. of Elec. Eng. & Comp. Sci.
This research was funded by NSF Grant MIP

The Branch Prediction Problem
1 out of 5 instructions is a branch
May require many cycles to resolve
– P4 has 20 cycle branch resolution pipeline
– Future pipeline depths likely to increase [Sprangle02]
Predict branches to keep pipeline full
[Diagram: pipeline from PC compute to branch resolution]

Bigger Predictors = More Accurate (but bigger predictors = slower)
Larger predictors tend to yield more accurate predictions
Faster cycle times force smaller branch predictors
Overriding predictor couples small, fast predictor with a large, multi-cycle predictor [Jiménez2000]
– performs close to ideal large-fast predictor

Hybrid Predictors
Wide variety of branch prediction algorithms available
Hybrid combines more than one “stand-alone” or component predictor [McFarling93]:
[Diagram: P1, P2 and Meta-Predictor → Final Prediction]
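The selection scheme on this slide can be sketched in a few lines. This is a minimal illustration, not the slide's exact hardware: it assumes the meta-predictor is a table of 2-bit saturating counters indexed by low branch-address bits, and the names `SelectionHybrid`, `index_bits`, `p1` and `p2` are hypothetical.

```python
class SelectionHybrid:
    """Two-component hybrid: a meta-predictor chooses P1 or P2 per branch.

    Assumption: the meta-predictor is a table of 2-bit saturating counters
    indexed by the low bits of the branch address.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Counter values 0-1 favour P1, values 2-3 favour P2.
        self.meta = [2] * (1 << index_bits)

    def predict(self, pc, p1, p2):
        """Return the prediction of whichever component the counter favours."""
        return p2 if self.meta[pc & self.mask] >= 2 else p1

    def update(self, pc, p1, p2, taken):
        """Train the counter only when the two components disagree."""
        if p1 == p2:
            return
        i = pc & self.mask
        if p2 == taken:
            self.meta[i] = min(3, self.meta[i] + 1)
        else:
            self.meta[i] = max(0, self.meta[i] - 1)
```

Because the output is always a copy of one component's prediction, this kind of hybrid cannot be right when every component is wrong.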

Multi-Hybrids
[Diagram, “Multi-Hybrid” [Evers96]: P1, P2, …, Pn feed a priority encoder (Pr. Encoder) → Final Prediction]
[Diagram, “Quad-Hybrid” [Evers00]: P1, P2, P3, P4 with meta-predictors M1, M2, M3]

Our Idea: Prediction Fusion
[Diagram, Prediction Selection: P1, P2, P3, …, Pn, with predictions crossed out (X X X)]
[Diagram, Prediction Fusion: P1, P2, P3, …, Pn]

Early Attempt from ML
Weighted Majority algorithm [LW94]
– Better predictors get assigned larger weights
– Make final prediction with larger sum
Predictor with largest weight not always correct
[Diagram: P2, P6 and P7 say “not-taken”; P1, P3, P4, P5 and P8 say “taken”]
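A minimal sketch of Weighted Majority voting may help here. The weights below are hypothetical (the slide's diagram does not give numeric values), as are the penalty factor `beta` and the rule that ties go to "taken".

```python
def wm_predict(weights, votes):
    """Predict taken (True) if the taken votes carry at least as much weight."""
    taken = sum(w for w, v in zip(weights, votes) if v)
    not_taken = sum(w for w, v in zip(weights, votes) if not v)
    return taken >= not_taken  # assumption: ties go to "taken"


def wm_update(weights, votes, outcome, beta=0.5):
    """Multiply the weight of every predictor that was wrong by beta."""
    return [w * beta if v != outcome else w for w, v in zip(weights, votes)]


# Votes from the slide: P2, P6 and P7 say not-taken, the other five say taken.
votes = [True, False, True, True, True, False, False, True]  # P1..P8
# Hypothetical weights in which the three not-taken voters are the heaviest.
weights = [1, 4, 1, 1, 1, 3, 3, 1]

prediction = wm_predict(weights, votes)  # not-taken: weight 10 against 5
```

With these illustrative weights the three heavily weighted predictors outvote the other five, and if the branch turns out to be taken, `wm_update` reduces their weights.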

Outline
COLT Predictor
Choosing parameters and components
Performance
Prediction distributions, component choice

COLT Organization
[Diagram: Branch Address and Branch History select a Mapping Table in the VMT; the predictions of P1, P2, P3, …, Pn (1010…) index that table → Final Prediction]

Pathological Example
[Diagram: P1, P2, P3 and their predictions]
Actual outcome = 1 (taken)

Example (cont’d)
Selection: [Diagram: P1, P2, P3] Outcome is always wrong
COLT: [Diagram: P1, P2, P3 → VMT → 1] Can recognize and remember this pattern
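The lookup described on the COLT slides can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class name `Colt`, the concatenation of address and history bits to pick a mapping table, the counter width, and the weakly-not-taken initial value are all hypothetical. The demonstration assumes every component predicts not-taken while the branch is actually taken, which is one reading of the pathological example.

```python
class Colt:
    """Fusion sketch: component predictions index a table of counters."""

    def __init__(self, n_components, addr_bits=4, hist_bits=4, counter_bits=4):
        self.n = n_components
        self.addr_mask = (1 << addr_bits) - 1
        self.hist_bits = hist_bits
        self.hist_mask = (1 << hist_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        n_tables = 1 << (addr_bits + hist_bits)
        # One mapping table per (address bits, history bits) combination.
        # Each table holds 2**n counters, initialised weakly not-taken.
        self.vmt = [[self.threshold - 1] * (1 << n_components)
                    for _ in range(n_tables)]

    def _entry(self, pc, history, preds):
        if len(preds) != self.n:
            raise ValueError("expected one prediction per component")
        # Assumption: address and history bits are concatenated.
        table = ((pc & self.addr_mask) << self.hist_bits) | (history & self.hist_mask)
        index = 0
        for p in preds:  # first component becomes the most significant bit
            index = (index << 1) | int(p)
        return self.vmt[table], index

    def predict(self, pc, history, preds):
        table, index = self._entry(pc, history, preds)
        return table[index] >= self.threshold

    def update(self, pc, history, preds, taken):
        table, index = self._entry(pc, history, preds)
        if taken:
            table[index] = min(self.max_count, table[index] + 1)
        else:
            table[index] = max(0, table[index] - 1)


colt = Colt(n_components=3)
all_not_taken = (False, False, False)
before = colt.predict(0x40, 0b1010, all_not_taken)
colt.update(0x40, 0b1010, all_not_taken, taken=True)
after = colt.predict(0x40, 0b1010, all_not_taken)
```

After a single update the entry for the pattern 000 predicts taken, an answer that a selector choosing among the three components could never produce.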

COLT Lookup Delay
[Timing diagram with labels: P1, P2, …, Pn; Prediction time; MT Select; critical delay]

Design Choices
# of branch address bits
# of branch history bits
(together these determine the number of mapping tables)
# of components
(determines the size of the individual MTs)
Choice of components
– gshare, PAs, gskewed, …
– History length, PHT size, …

Predictor Components
Global History
– gshare [McFarling93]
– Bi-Mode [Lee97]
– Enhanced gskewed [Michaud97]
– YAGS [Eden98]
Local History
– PAs [Yeh94]
– pskewed [Evers96]
Other
– 2bC (bimodal) [Smith81]
– Loop [Chang95]
– alloyed Perceptron [Jiménez02]
History lengths optimized on test data sets
Total of 59 configurations
Sizes vary up to 64KB

Huge Search Space
2^59 ways to choose components
… ways to choose COLT parameters
We use a genetic search
Gene format: one bit per candidate component, followed by VMT size, history length, …
– bit-k = 0 means don’t include P_k
– bit-k = 1 means do include P_k
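As a rough illustration of the gene format, the sketch below decodes a bit string into a set of components plus two parameter fields. The field widths (`vmt_bits`, `hist_bits`), the field order after the component bits, and the names `ColtConfig` and `decode_gene` are hypothetical; only the one-bit-per-component convention and the count of 59 candidates come from the slides.

```python
from dataclasses import dataclass

N_CANDIDATES = 59  # from the slides: 59 component configurations


@dataclass
class ColtConfig:
    components: list       # indices k of the included predictors P_k
    vmt_size_field: int    # raw value of the VMT size field
    history_field: int     # raw value of the history length field


def _to_int(bits):
    """Interpret a list of 0/1 values as an unsigned integer, MSB first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def decode_gene(bits, vmt_bits=4, hist_bits=4):
    """Split a gene into component-inclusion bits and parameter fields."""
    expected = N_CANDIDATES + vmt_bits + hist_bits
    if len(bits) != expected:
        raise ValueError(f"gene must have {expected} bits")
    components = [k for k in range(N_CANDIDATES) if bits[k] == 1]
    vmt_end = N_CANDIDATES + vmt_bits
    return ColtConfig(
        components=components,
        vmt_size_field=_to_int(bits[N_CANDIDATES:vmt_end]),
        history_field=_to_int(bits[vmt_end:]),
    )


gene = [0] * 67
gene[0] = 1                    # include P_0
gene[58] = 1                   # include P_58
gene[59:63] = [1, 0, 1, 0]     # VMT size field
gene[63:67] = [0, 0, 1, 1]     # history length field
config = decode_gene(gene)
```

With one inclusion bit per candidate there are 2**59 possible component subsets, which is why exhaustive search is impractical.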

Methodology
SPEC2000 integer benchmarks
– For tuning/optimization: 10M branches from test
– For evaluation: 500M branches from train
– Skipped first 100M branches
– Compiled with cc -arch ev6 -O4 -fast -non_shared
SimpleScalar simulator
– sim-safe for trace collection
– MASE for ILP simulations

Genetic Search COLT Results
Columns on the slide: Name, Size (KB), Components, VMT, Counter width, History length (only Size and Components are recoverable)
Size (KB)   Components
16          alpct(34/10), gskewed(12), gshare(8)
32          alpct(34/10), gshare(15), gshare(9), PAs(7)
64          alpct(40/14), gshare(16), YAGS(11), pskewed(6)
128         alpct(40/14), alpct(38/14), gshare(16), gskewed(13), YAGS(12), PAs(8)
256         alpct(50/18), alpct(34/10), gshare(18), Bi-Mode(16), gskewed(15), PAs(8)

Overall Predictor Performance

Per-Benchmark Performance

ILP Performance
Simulated CPU:
– 6-issue
– 20 cycle pipeline
– Same functional units, latencies, caches as Intel P4/NetBurst microarchitecture
Predictor configurations: 1-cycle 2bC + 4-cycle OR alpct; 1-cycle 2bC + 4-cycle OR COLT; Ideal 1-cycle COLT

ILP Impact

COLT Parameter Sensitivity
Mapping table counter widths
Number of mapping tables
Number of history bits for VMT index

Counter Width

VMT Size

History Length

Explaining Choice of Components
Parameter sensitivity results show GA performed well for the COLT parameters
Why did it choose the component predictors that it did?

Classifying COLT Predictions
We examined the 32KB COLT config.
For each mapping table lookup, we examine the neighboring entries:
[Diagram: P1, P2, P3, P4 index a mapping table; entry 0001 = NT, entry 1001 = T, entry 1101 = T]

Classifying Predictions (cont’d)
32KB COLT: gshare(9), gshare(14), PAs(7), alpct(34/10)
Classes:
– easy: all neighboring entries agree
– short: only gshare(9) distinguishes
– long: only gshare(14) distinguishes
– local: only PAs(7) distinguishes
– perceptron: only alpct(34/10) distinguishes
– multi-length: mix of gshare(9), (14) or alpct
– mixed: both global and local components
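The classification above can be sketched as follows. This is an interpretation, not the authors' procedure: it assumes a "neighboring entry" is one whose index differs from the looked-up index in exactly one component's bit, that a component "distinguishes" when flipping its bit changes the table's prediction, and that the first component is the most significant index bit. The function names are hypothetical.

```python
COMPONENTS = ["gshare(9)", "gshare(14)", "PAs(7)", "alpct(34/10)"]
SINGLE_CLASS = {
    "gshare(9)": "short",
    "gshare(14)": "long",
    "PAs(7)": "local",
    "alpct(34/10)": "perceptron",
}


def distinguishing(table, index):
    """Return the components whose flipped bit changes the prediction.

    table: list of 2**4 booleans (True = taken), one per mapping table entry.
    index: the entry selected by the four component predictions.
    """
    n = len(COMPONENTS)
    base = table[index]
    found = []
    for k, name in enumerate(COMPONENTS):
        neighbour = index ^ (1 << (n - 1 - k))  # flip component k's bit
        if table[neighbour] != base:
            found.append(name)
    return found


def classify(table, index):
    """Map the set of distinguishing components to a class name."""
    found = distinguishing(table, index)
    if not found:
        return "easy"
    if len(found) == 1:
        return SINGLE_CLASS[found[0]]
    if "PAs(7)" in found:
        return "mixed"        # a local component plus at least one global one
    return "multi-length"     # several global-history components


# Hypothetical mapping table: every entry predicts taken except entry 0001.
table = [True] * 16
table[0b0001] = False
label = classify(table, 0b1001)
```

In this made-up table, flipping only the first component's bit (1001 to 0001) changes the prediction, so the lookup is labelled "short".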

Prediction Classifications

Related Work/Issues
Alloyed history [Skadron00]
Variable path history length [Stark98]
Dynamic history length fitting [Juan98]
Interference reduction [lots…]
COLT handles all of these cases*
* Doesn’t support partial update policies

Open Research
Better individual components
Augment with SBI [Manne99], agree [Sprangle97]
Better fusion algorithms
Hybrid fusion/selection algorithms
Other domains (branch confidence prediction, value prediction, memory dependence prediction, instruction criticality prediction, …)

Summary
Fusion is more powerful than selection
– Combines multiple sources of information
Branch behavior is very varied
– Need long, short, global and local histories, multiple simultaneous lengths and types of history
COLT is one possible fusion-based predictor
– Combines multiple types of information
– Current “best” purely dynamic predictor*

Questions?