Exploring Correlation for Indirect Branch Prediction. Nikunj Bhansali, Chintan Panirwala, Huiyang Zhou, Department of Electrical and Computer Engineering.

Presentation transcript:

Exploring Correlation for Indirect Branch Prediction
Nikunj Bhansali, Chintan Panirwala, Huiyang Zhou
Department of Electrical and Computer Engineering, North Carolina State University

Baseline: ITTAGE Indirect Branch Predictor [A. Seznec and P. Michaud, JILP 2006]
A PPM-based predictor contains multiple Markov predictors, each capturing a different history length; the one with the longest match is used to make the prediction.
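The longest-match rule can be illustrated with a small sketch. This is an idealized model, not the ITTAGE implementation: the history lengths, the use of exact (PC, history) keys instead of hashed indices and partial tags, and the allocate-everywhere update are all assumptions made for brevity.

```python
# Idealized longest-match (PPM-style) target predictor.
# Assumptions (not from the slides): history lengths, exact-key lookup
# instead of hashed index + partial tag, and updating every table.
HIST_LENGTHS = [2, 4, 8, 16]  # hypothetical; T1 uses the shortest history


class LongestMatchPredictor:
    def __init__(self, hist_lengths=HIST_LENGTHS):
        self.hist_lengths = list(hist_lengths)
        # One dict per table: (pc, recent history tuple) -> target
        self.tables = [dict() for _ in self.hist_lengths]

    def _key(self, pc, history, length):
        return (pc, tuple(history[-length:]))

    def matches(self, pc, history):
        """Return [(table_number, target)] for each table that matches,
        ordered from the shortest history to the longest."""
        hits = []
        for n, length in enumerate(self.hist_lengths, start=1):
            target = self.tables[n - 1].get(self._key(pc, history, length))
            if target is not None:
                hits.append((n, target))
        return hits

    def predict(self, pc, history):
        """Use the matching table with the longest history, if any."""
        hits = self.matches(pc, history)
        return hits[-1][1] if hits else None

    def update(self, pc, history, actual_target):
        """Simplification: record the resolved target in every table."""
        for n, length in enumerate(self.hist_lengths, start=1):
            self.tables[n - 1][self._key(pc, history, length)] = actual_target
```

A history that agrees with a trained one only in its two most recent targets would match T1 alone here, so the prediction falls back to the shortest history.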

Our Main Idea:
• Longest history length vs. adaptive history lengths.
• Address-target correlation.

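Predictor Structure: Main Predictor
[Figure: tagged tables T1, T2, T3, …, Tn. Each entry holds Tag, u, and Target fields, plus an Alt bit in every table except T1. The per-table match signals (T1_Match, T2_Match, …, Tn_Match) and their combinations (T1,2_Match, …, T1,n-1_Match), together with the HBT hit and hlen outputs, select the target prediction.]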

Main Predictor at Fetch Stage
ITTAGE as the baseline predictor (no T0).
Two ways to adaptively select the proper table (or history length):
1. An alt bit in each entry (except in T1).
2. A separate table for hard-to-predict branches.
Entry layout: tag | u | alt | target

Using Alt Bits to Select a Table
• Alt = 0: the target from the current entry is preferred for the prediction.
• Alt = 1: a table with a shorter history is used to make the final prediction.
• Table T1 has no alt bit.
• The alt field is initially set to zero.
Update mechanism:
• If the table with the longest match fails to make a correct prediction while another table does, the alt field is set for those entries with longer history lengths.
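The alt-bit rules can be sketched as follows. The `Hit` record and both function names are hypothetical. Two details are assumptions where the slide is silent: "longer history lengths" is read as longer than the longest table that predicted correctly, and the shortest matching entry is used as a fallback when every longer match has its alt bit set.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One tag-matching entry (hypothetical record for this sketch)."""
    table: int         # table number; larger means longer history
    target: int
    alt: bool = False  # T1 entries never have this set


def select_with_alt(hits):
    """Pick the entry used for prediction.

    `hits` is sorted by increasing history length. Walk down from the
    longest match and skip entries whose alt bit is set; fall back to the
    shortest match if all longer ones are skipped (assumption).
    """
    for entry in reversed(hits):
        if not entry.alt or entry is hits[0]:
            return entry
    return None


def update_alt(hits, actual_target):
    """Set alt bits when the longest match was wrong but a shorter one was right."""
    if not hits or hits[-1].target == actual_target:
        return
    correct = [e for e in hits if e.target == actual_target]
    if not correct:
        return
    best = correct[-1]  # longest table that predicted correctly (assumption)
    for e in hits:
        if e.table > best.table and e.table != 1:
            e.alt = True
```

With matches in T1, T3, and T5, where T5 is wrong and T3 is right, a call to `update_alt` marks the T5 entry, and the next `select_with_alt` returns the T3 entry.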

Hard-to-predict Branch Table (HBT)
• A cache-like set-associative structure; each entry contains a tag, a misprediction counter (mc), and a history length (hlen).
• The HBT is updated based on the prediction provided by the longest history.
• The mc field is used for replacement so that hard-to-predict branches are captured by the HBT.
• hlen is used to select the hlen-th longest history.
Entry layout: tag | mc | hlen

Hard-to-predict Branch Table (HBT), continued
• For example, if hlen = 2, tables T2, T4, and T5 have tag matches, and their corresponding alt fields are false, then T2 is selected for the prediction.
• The main predictor provides its prediction at the fetch stage.
• The main predictor is updated at the retire stage of an indirect branch.
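The hlen example can be reproduced with a short sketch. The slides do not say exactly how hlen is counted. The code below assumes it is a zero-based rank among the matching tables, counted from the longest history, which gives T2 for the example (T5 is rank 0, T4 rank 1, T2 rank 2). Reading hlen as a table number would give the same answer for this example. The function name and the clamping behaviour are hypothetical.

```python
def select_by_hlen(matching_tables, hlen):
    """Return the table chosen on an HBT hit.

    Assumption: hlen is a zero-based rank from the longest matching
    history. If fewer tables match than hlen requires, this sketch
    clamps to the shortest match (also an assumption).
    """
    ordered = sorted(matching_tables, reverse=True)  # longest history first
    if not ordered:
        return None
    return ordered[min(hlen, len(ordered) - 1)]


# The example from the slide: T2, T4, and T5 match, hlen = 2.
chosen = select_by_hlen([2, 4, 5], 2)
```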

Auxiliary Predictor at AGEN Stage
Correlation between the producer load address and the consumer branch target, e.g.:
Load R19 = Mem[R3]   //Address: 0x…, 0x60846ec8
Br R19               //Target: 0x60751a64, 0x607691c9
The producer load accesses two addresses, each providing a different branch target. As long as the data structures at these addresses do not change frequently, the addresses are sufficient to predict the branch target of the consumer indirect branch.

Auxiliary Predictor Design
[Figure: ATT entry fields: tag (Br pc), hashed load address.]

Auxiliary Predictor Design
• Address-Target Correlation (ATC) is captured using the Address Target Table (ATT).
• Accessed at the AGEN stage of the load instruction.
• The PC of the indirect branch is used for the tag match.
• The hashed load address is used to find the matching address-target pair.
• Updated at the EXE stage of an indirect branch.
• LRU replacement policy.
• Reduces the misprediction penalty when the prediction differs from the one provided at the fetch stage.
Entry fields: tag (Br pc), hashed load address
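A minimal sketch of an address-target table along these lines is given below. The class name, the hash function, and the unbounded number of address-target pairs per entry are assumptions. The slides specify only the branch-PC tag, the hashed load address, and LRU replacement.

```python
from collections import OrderedDict


class AddressTargetTable:
    """Sketch of an ATT: branch PC -> {hashed load address: target}.

    Assumptions: 26 entries by default (the count on the storage slide),
    LRU over whole entries, a hypothetical 16-bit XOR-fold hash, and no
    limit on pairs per entry.
    """

    def __init__(self, num_entries=26):
        self.num_entries = num_entries
        self.entries = OrderedDict()  # least recently used entry first

    @staticmethod
    def hash_addr(addr):
        # Hypothetical hash: fold the upper bits into the low 16 bits.
        return (addr ^ (addr >> 16)) & 0xFFFF

    def lookup(self, br_pc, load_addr):
        """Called when the producer load computes its address (AGEN)."""
        pairs = self.entries.get(br_pc)
        if pairs is None:
            return None
        self.entries.move_to_end(br_pc)  # mark as most recently used
        return pairs.get(self.hash_addr(load_addr))

    def update(self, br_pc, load_addr, target):
        """Called when the indirect branch resolves (EXE)."""
        if br_pc not in self.entries and len(self.entries) >= self.num_entries:
            self.entries.popitem(last=False)  # evict the LRU entry
        pairs = self.entries.setdefault(br_pc, {})
        pairs[self.hash_addr(load_addr)] = target
        self.entries.move_to_end(br_pc)
```

When `lookup` returns a target that differs from the fetch-stage prediction, the front end can be redirected earlier than at branch resolution, which is where the reduced misprediction penalty comes from.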

Storage Cost (1/2)
Tagged table entry
• u ctr: 2 bits
• Target: 32 bits
• Alt: 1 bit (except T1)
• Tag: partial tag
HBT (1,216 bits)
• 32 entries
• Tag: 32 bits
• mc: 2 bits
• hlen: 4 bits
ATT (11,882 bits)
• 26 entries
• Tag: 32 bits
• Lru: 5 bits
• …: … bits
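The HBT and ATT totals can be cross-checked with a little arithmetic. The sketch below uses only the figures quoted on this slide. The variable names are ours, and the per-entry remainder for the ATT is an inference that holds only if the tag and LRU fields are the sole other fields in an entry.

```python
# Cross-check of the storage figures quoted on the slide.
HBT_ENTRIES = 32
HBT_ENTRY_BITS = 32 + 2 + 4              # tag + mc + hlen
hbt_bits = HBT_ENTRIES * HBT_ENTRY_BITS  # matches the quoted 1,216 bits

ATT_TOTAL_BITS = 11_882
ATT_ENTRIES = 26
att_entry_bits, att_remainder = divmod(ATT_TOTAL_BITS, ATT_ENTRIES)

# Bits per ATT entry left over after the tag (32) and LRU (5) fields.
# Inference only: assumes those are the only other fields in an entry.
att_other_bits = att_entry_bits - (32 + 5)
```

The ATT total divides evenly into 26 entries of 457 bits each, which leaves 420 bits per entry for the field whose label is missing above.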

Storage Cost (2/2)
• Global history: 640 * 2 bits
• Path history: 16 bits
• Other counters: 39 bits
• Total: … KB

Experimental Results
• Overall performance improvement (ATT of 11,882 bits): 15.6%
• Performance improvement with a small ATT (1,624 bits): 14.8%

Discussion: Why we may not win
1. Other contestants are doing superbly!
2. Our baseline ITTAGE is not well tuned. The code and the predictor structure are modified based on L-TAGE.

Discussion: Why we can win
Our main ideas, adaptive history length and address-target correlation, can further improve well-tuned predictors.

Conclusions
• Although control-flow history carries correlation to targets, the strength of that correlation may either increase or decrease for different indirect branches as the history length increases.
• There is a strong correlation between producer load addresses and consumer branch targets.

Thank You