Bimode Cascading: Adaptive Rehashing for ITTAGE Indirect Branch Predictor Y.Ishii, K.Kuroyanagi, T.Sawada, M.Inaba, and K.Hiraki.

Presentation transcript:

Bimode Cascading: Adaptive Rehashing for ITTAGE Indirect Branch Predictor Y.Ishii, K.Kuroyanagi, T.Sawada, M.Inaba, and K.Hiraki

Introduction
- Indirect branches are categorized into two types:
  - Monomorphic branch: takes only one target; easy to predict
  - Polymorphic branch: takes multiple targets; hard to predict
- We analyzed the balance between monomorphic and polymorphic branches

Workload analysis in CBP3
- Classifies workloads into 3 types: monomorphic-dominant, polymorphic-dominant, and other workloads
[Chart: CBP3 workloads grouped as monomorphic-dominant, polymorphic-dominant, and other]

For Your Interest
- In the paper, the same chart uses a different scale
[Chart: same workload groups (monomorphic-dominant, polymorphic-dominant, other)]

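How to predict indirect branches
- Monomorphic branch: branch target buffer (BTB), looked up with the PC
- Polymorphic branch: tagged target cache (TTC), looked up with the PC combined with the BHR
[Diagram: BTB indexed by PC; TTC indexed by PC + BHR; a tag comparison (=) validates the target in each]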
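To make the difference between the two lookups concrete, here is a minimal sketch. It is a toy model and not the authors' hardware: the table is an unbounded dictionary, and the history is simply the previous target, which stands in for the BHR. The function name `run` and the traces are illustrative assumptions.

```python
def run(trace, use_history):
    """Count mispredictions of a toy target table over (pc, target) pairs.

    use_history=False models a BTB-like lookup (PC only).
    use_history=True models a TTC-like lookup (PC plus history).
    Assumption: the history is just the previous target, a stand-in for the BHR.
    """
    table = {}    # lookup key -> last target seen for that key
    history = 0   # toy history register
    misses = 0
    for pc, target in trace:
        key = (pc, history) if use_history else (pc, 0)
        if table.get(key) != target:
            misses += 1
        table[key] = target
        history = target
    return misses


# A monomorphic branch (one target) and a polymorphic branch (two alternating targets).
mono = [(0x40, 0x100)] * 8
poly = [(0x80, 0x200), (0x80, 0x300)] * 4

print("mono, PC only     :", run(mono, use_history=False))
print("mono, PC + history:", run(mono, use_history=True))
print("poly, PC only     :", run(poly, use_history=False))
print("poly, PC + history:", run(poly, use_history=True))
```

In this toy setting the PC-only lookup never learns the alternating branch, while the history-based lookup needs a little extra warm-up on the monomorphic branch. That is the trade-off the rest of the talk tries to balance.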

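Adaptive Rehashing
- Rehashable BTB [Li+ 2002]
  - One table is used to predict all branches
  - Monomorphic branch: uses only the PC
  - Polymorphic branch: uses both the PC and the BHR
[Diagram: PC, optionally combined (+) with BHR, indexes a single BTB / TTC table; a tag comparison (=) validates the target]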

Cascaded Predictor [Driesen+ 1998]
- Multiple separate tables are used for predictions
- Monomorphic branch: BTB-like base predictor
- Polymorphic branch: TTC-like tagged predictor
[Diagram: BASE and T0 components, each providing a target]

ITTAGE Branch Predictor [Seznec+ 2006]
- The balance between BTB and TTC is fixed at design time
- This wastes resources on monomorphic- or polymorphic-dominant workloads
[Diagram: BASE and tagged components T0 to T4, each providing a target]
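The cascaded and ITTAGE organizations can be sketched as a priority selection: the base component always offers a target, and a tagged component overrides it on a tag hit, with the longest-history hit taking priority. The function below is a simplified illustration only. The name `select_target` and the list layout are assumptions, and the confidence and alternate-prediction logic of a full ITTAGE is omitted.

```python
from typing import List, Optional


def select_target(base_target: int, tagged_hits: List[Optional[int]]) -> int:
    """Pick the final target from a cascade of components.

    tagged_hits holds one entry per tagged component, ordered from the
    shortest history (T0) to the longest; None means a tag miss.
    """
    for target in reversed(tagged_hits):  # longest history first
        if target is not None:
            return target
    return base_target                    # no tagged hit: fall back to BASE


# T1 and T3 hit; the longer-history component T3 provides the prediction.
print(hex(select_target(0x10, [None, 0x20, None, 0x30, None])))
```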

Our Proposal: BCTAGE (Bimode Cascading ITTAGE)

BCTAGE
- Key idea: dynamic reconfiguration adapts the predictor to the workload characteristics
- Solution: combine adaptive rehashing with the cascaded predictor to improve performance

Advantage of BCTAGE
- Effective use of limited hardware resources
  - For strongly monomorphic-dominant workloads
  - For strongly polymorphic-dominant workloads

Adaptive Rehashing for ITTAGE
- Bimode component
  - BIM mode: monomorphic-friendly (uses only the PC)
  - TAG mode: polymorphic-friendly (uses both the PC and the BHR)
- Workload detector
  - Decides the workload characteristics dynamically
  - Tracks the capacity shortage of BTB-like resources
- Resource split by workload type
  - Monomorphic-dominant: TTC 50%, BTB 50%
  - Normal workloads: TTC 75%, BTB 25%
  - Polymorphic-dominant: TTC 90%, BTB 10%

Bimode Cascading ITTAGE (BCTAGE)
- Bimode components
  - Several normal tagged components are replaced
- Two cascading MUXes
  - BIM cascading MUX: for monomorphic branches
  - TAG cascading MUX: for polymorphic branches
- Switches the data path and the bimode components
[Diagram: BASE and T0 to T4, with T1 and T3 replaced by bimode components that feed the BIM and TAG cascading multiplexers]

For monomorphic-dominant workloads
- All bimode components switch to BIM mode
- They provide predictions to the BIM cascading MUX
- Blue boxes use only the PC to predict the branch
- Resources for BTB-like predictions are increased
[Diagram: T1 and T3 bimode components in BIM mode; legend: BTB-like prediction, TTC-like prediction]

For polymorphic-dominant workloads
- All bimode components switch to TAG mode
- They provide predictions to the TAG cascading MUX
- Yellow boxes use branch history to predict the branch
- Resources for TTC-like predictions are increased
[Diagram: T1 and T3 bimode components in TAG mode; legend: BTB-like prediction, TTC-like prediction]

For other workloads
- Combines BIM mode and TAG mode
  - T1 (BIM mode): makes BTB-like predictions
  - T3 (TAG mode): makes TTC-like predictions
[Diagram: T1 bimode component in BIM mode, T3 bimode component in TAG mode; legend: BTB-like prediction, TTC-like prediction]

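The details of the bimode components
- Designed for adaptive rehashing
- Two input vectors
  - Only the PC
  - Both the PC and the BHR
- Selects the input vector appropriate for the current workload
[Diagram: bimode component; PC, optionally combined (+) with BHR, indexes the table; a tag comparison (=) validates the target]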

The details of the bimode components (BIM mode)
- Designed for adaptive rehashing
- Two hash functions
  - Only the PC
  - Both the PC and the BHR
- Selects the input vector appropriate for the current workload
- BIM mode: uses only the PC and makes BTB-like predictions
[Diagram: bimode component in BIM mode; only the PC path is active]

The details of the bimode components (TAG mode)
- Designed for adaptive rehashing
- Two hash functions
  - Only the PC
  - Both the PC and the BHR
- Selects the input vector appropriate for the current workload
- TAG mode: uses both the PC and the BHR and makes TTC-like predictions
[Diagram: bimode component in TAG mode; the PC and BHR are combined (+) before the lookup]
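A bimode component can be sketched as a single tagged table whose hash input depends on a mode bit. The code below is an illustrative model under stated assumptions, not the authors' design: the class name `BimodeComponent`, the table geometry, and the plain XOR of PC and BHR are hypothetical choices, and the slides do not say what happens to existing entries when the mode changes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

BIM, TAG = "BIM", "TAG"


@dataclass
class BimodeComponent:
    index_bits: int = 6   # assumed geometry, not taken from the slides
    tag_bits: int = 8
    mode: str = TAG
    entries: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def _hash(self, pc: int, bhr: int) -> Tuple[int, int]:
        # BIM mode hashes only the PC; TAG mode folds in the history.
        # Assumption: a plain XOR stands in for the real hash function.
        v = pc if self.mode == BIM else pc ^ bhr
        index = v & ((1 << self.index_bits) - 1)
        tag = (v >> self.index_bits) & ((1 << self.tag_bits) - 1)
        return index, tag

    def predict(self, pc: int, bhr: int) -> Optional[int]:
        index, tag = self._hash(pc, bhr)
        entry = self.entries.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1]   # tag hit: return the stored target
        return None           # tag miss

    def update(self, pc: int, bhr: int, target: int) -> None:
        index, tag = self._hash(pc, bhr)
        self.entries[index] = (tag, target)


bim = BimodeComponent(mode=BIM)
bim.update(pc=0x1234, bhr=5, target=0xAAAA)
print(bim.predict(pc=0x1234, bhr=99))  # history is ignored in BIM mode

tag = BimodeComponent(mode=TAG)
tag.update(pc=0x1234, bhr=5, target=0xAAAA)
print(tag.predict(pc=0x1234, bhr=6))   # a different history selects a different entry
```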

Workload Detector
- Responsible for deciding the running mode
- Checks the capacity shortage of BTB-like resources
  - BTB-like resources = BASE and BIM-mode components
  - Tag misses in BTB-like resources are counted
- Decides the running mode
  - If BTB-like resources are insufficient, the workload is assumed to be monomorphic-dominant
  - If BTB-like resources are sufficient, the workload is assumed to be polymorphic-dominant
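The detector's decision rule can be sketched as an interval counter. The slides only say that tag misses in BTB-like resources are counted and used to pick the mode. The interval length, the two thresholds, and the intermediate "normal" band below are assumptions added for illustration, as is the class name `WorkloadDetector`.

```python
MONO = "monomorphic-dominant"
NORMAL = "normal"
POLY = "polymorphic-dominant"


class WorkloadDetector:
    """Toy detector: counts BTB-like tag misses over a fixed interval."""

    def __init__(self, interval: int = 1024, high: int = 64, low: int = 8):
        # interval, high and low are hypothetical tuning values.
        self.interval = interval
        self.high = high
        self.low = low
        self.mode = NORMAL
        self.branches = 0
        self.btb_tag_misses = 0

    def observe(self, btb_like_tag_miss: bool) -> str:
        """Record one indirect branch and return the current running mode."""
        self.branches += 1
        self.btb_tag_misses += int(btb_like_tag_miss)
        if self.branches == self.interval:
            if self.btb_tag_misses > self.high:
                self.mode = MONO     # BTB-like capacity looks too small
            elif self.btb_tag_misses < self.low:
                self.mode = POLY     # BTB-like capacity is more than enough
            else:
                self.mode = NORMAL
            self.branches = 0
            self.btb_tag_misses = 0
        return self.mode


detector = WorkloadDetector(interval=10, high=5, low=1)
for _ in range(10):
    mode = detector.observe(btb_like_tag_miss=True)
print(mode)
```

A real design would presumably map the decision onto the 2-bit mode registers and 16-bit performance counters listed in the implementation parameters, but the slides do not give that mapping.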

Three modes supported by BCTAGE
- Monomorphic-dominant (T1: BIM, T3: BIM)
  - 1 base component, 10 TAG components, 9 BIM components
- Normal workloads (T1: BIM, T3: TAG)
  - 1 base component, 10 normal tagged components, 4 TAG bimode components, 5 BIM bimode components
- Polymorphic-dominant (T1: TAG, T3: TAG)
  - 1 base component, 10 normal tagged components, 9 TAG bimode components
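The three configurations can be written down as a small lookup table. This is only a sketch of the counts listed on the slide: the names `MODES`, `bimode_settings`, and `btb_like_components` are hypothetical, and the choice of which bimode components go to BIM first in the normal mode is an assumption.

```python
NUM_BIMODE = 9  # bimode components, per the slide

# Number of bimode components in each mode, per workload type.
MODES = {
    "monomorphic-dominant": {"BIM": 9, "TAG": 0},
    "normal":               {"BIM": 5, "TAG": 4},
    "polymorphic-dominant": {"BIM": 0, "TAG": 9},
}


def bimode_settings(workload: str) -> list:
    """Return one mode string per bimode component.

    Assumption: the lowest-numbered bimode components are set to BIM first.
    """
    counts = MODES[workload]
    return ["BIM"] * counts["BIM"] + ["TAG"] * counts["TAG"]


def btb_like_components(workload: str) -> int:
    """BTB-like resources are the base predictor plus all BIM-mode components."""
    return 1 + MODES[workload]["BIM"]


for name in MODES:
    print(name, bimode_settings(name), btb_like_components(name))
```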

Implementation parameters
- Prediction components
  - Base predictor: partially tagged BTB (8-bit tag, 5-way, 1280-entry)
  - Tagged components (13- to 23-bit tags)
    - 10 normal tagged components
    - 9 bimode components
- Other storage
  - Global branch/path/target history register (length = 281)
  - Mode registers (2-bit)
  - Performance counters (16-bit x 2)

Performance impact of the rehashing
- BCTAGE reduces MPKI/MPPKI in several workloads, especially monomorphic workloads
- MPPKI (miss penalties per 1000 instructions), BCTAGE vs. ITTAGE
  - CLIENT05: ΔMPPKI 7.1%
  - SERVER01: ΔMPPKI 0.6%
- MPKI (mispredictions per 1000 instructions), BCTAGE vs. ITTAGE
  - CLIENT05: ΔMPKI 9.2%
  - SERVER01: ΔMPKI 1.5%
- Overall, BCTAGE outperforms ITTAGE
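For readers unfamiliar with the metric, the helpers below show how a per-1000-instructions rate and a relative improvement could be computed. The slides do not define the Δ values, so reading them as a relative reduction against ITTAGE is an assumption, and the numbers in the usage lines are hypothetical inputs, not results from the talk.

```python
def per_kilo_instructions(events: float, instructions: int) -> float:
    """Events (mispredictions or penalty cycles) per 1000 instructions."""
    return 1000.0 * events / instructions


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`.

    Assumption: this is how a delta such as the slide's ΔMPKI would be derived.
    """
    return (baseline - improved) / baseline


# Hypothetical counts, for illustration only.
ittage_mpki = per_kilo_instructions(events=50, instructions=100_000)
bctage_mpki = per_kilo_instructions(events=40, instructions=100_000)
print(f"{relative_reduction(ittage_mpki, bctage_mpki):.1%}")
```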

Related Work
- Base algorithm
  - Branch Target Buffer [Lee+ 1983]
  - Target Cache [Chang+ 1997]
- Hybrid strategy
  - Cascaded Predictor [Driesen+ 1998]
  - Rehashable BTB [Li+ 2002]
- Novel implementation
  - ITTAGE [Seznec+ 2006]
  - VPC prediction [Kim+ 2007]

Summary
- Bimode Cascading ITTAGE (BCTAGE)
  - Adaptive rehashing for the cascaded predictor
    - Uses one component for both polymorphic and monomorphic branches
    - Integrates multiple bimode components to improve accuracy
    - Detects the workload characteristics dynamically and modifies the predictor configuration
  - Improves the state-of-the-art branch predictor
    - Effective use of limited hardware resources

Q & A