FA-TAGE Frequency Aware TAgged GEometric History Length Branch Predictor Boyu Zhang, Christopher Bodden, Dillon Skeehan ECE/CS 752 Advanced Computer Architecture.


FA-TAGE Frequency Aware TAgged GEometric History Length Branch Predictor Boyu Zhang, Christopher Bodden, Dillon Skeehan ECE/CS 752 Advanced Computer Architecture I University of Wisconsin–Madison {cbodden, skeehan, bzhang93}@wisc.edu

Overview and Motivation
Achieving high ILP is difficult for real programs because of conditional execution (Flynn’s bottleneck; Tjaden and Flynn, 1970). High ILP can be achieved if conditional execution can be removed or predicted accurately (Riseman and Foster, 1972). Although current state-of-the-art predictors are extremely accurate (~99%), room for improvement remains. In this project we ask whether frequency analysis can improve state-of-the-art prediction performance.

Previous Work
TAGE (Seznec and Michaud, 2006). Idea: use tagged tables indexed with increasingly long global history lengths (indices and tags are hashes of the PC with global and path history) to predict branch outcomes, with a base predictor as a fallback. Problem: the tables with the longest histories are used infrequently and have high overhead. This overhead could potentially be used more efficiently!
FAB (Kampe et al., 2002). Idea: use frequency analysis to predict the branches that a main branch predictor has trouble with (typically branches with a 50-50 outcome). Problem: FAB uses a program profiler to analyze an entire trace before execution. This is impractical!
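For context, in the TAGE paper the tagged tables use history lengths that form a geometric series, L(i) = (int)(α^(i-1) · L(1) + 0.5). A minimal sketch of that computation, assuming the minimum and maximum lengths are free parameters; the function name and the example values are illustrative, not this project's configuration.

```python
def geometric_history_lengths(num_tagged: int, min_len: int, max_len: int) -> list:
    """History lengths L(1)..L(M) for M tagged tables, following
    L(i) = (int)(alpha**(i - 1) * L(1) + 0.5), with alpha chosen so that L(M) = max_len."""
    if num_tagged == 1:
        return [min_len]
    # Common ratio of the geometric series between the shortest and longest history.
    alpha = (max_len / min_len) ** (1.0 / (num_tagged - 1))
    return [int(alpha ** (i - 1) * min_len + 0.5) for i in range(1, num_tagged + 1)]


if __name__ == "__main__":
    # Illustrative parameters only (hypothetical, not taken from the slides).
    print(geometric_history_lengths(num_tagged=9, min_len=4, max_len=640))
```

The geometric spacing gives many short histories, which most branches need, while still reaching very long histories with only a few tables.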

Plan A: Try to predict TAGE misses

Predict TAGE Misses
Why? The overhead of frequency-domain analysis is high, and TAGE is very accurate, so we only want to apply the analysis where TAGE misses often. TAGE can therefore act as a filter for the information we want.
How? Store TAGE misses in a TAGE miss history buffer. Analyze the miss history using Fourier analysis. Predict the opposite of TAGE (~TAGE) when a misprediction is expected. Implement this approach either globally or per branch.

Global TAGE Miss History Design
Maintain a large global TAGE miss history buffer of M bits. When the buffer fills, perform a discrete Fourier transform (DFT) and keep the K frequencies with the largest amplitudes in a sine table. If TAGE predicted incorrectly and the override outcome is correct, increment a usefulness counter, u. If TAGE predicted incorrectly and the override outcome is incorrect, decrement u. (Only the valuable cases are counted.) When u exceeds a threshold and a TAGE miss is predicted, negate the TAGE prediction. When u drops below -threshold, perform a new DFT and update the sine table.
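The DFT and sine-table step can be sketched as follows, in the spirit of FAB. This is a minimal illustration, not the project's implementation: the function names, the mapping of history bits to -1/+1, and the direct O(M²) DFT are assumptions. Because only integer DFT bins are kept, evaluating the stored sinusoids one step past the buffer amounts to extending the filtered history periodically with period M.

```python
import cmath
import math


def build_sine_table(history, k):
    """Return the k strongest (bin, amplitude, phase) terms of the DFT of `history`.

    `history` is a list of 0/1 bits, oldest first. Bits are mapped to -1/+1 so that a
    50/50 branch has no DC offset. Only bins 0..M/2 are examined: for a real signal the
    remaining bins are complex conjugates and carry no extra information.
    """
    m = len(history)
    x = [1.0 if bit else -1.0 for bit in history]
    terms = []
    for f in range(m // 2 + 1):
        coeff = sum(x[n] * cmath.exp(-2j * math.pi * f * n / m) for n in range(m))
        terms.append((f, abs(coeff), cmath.phase(coeff)))
    terms.sort(key=lambda t: t[1], reverse=True)  # largest amplitude first
    return terms[:k]


def predict_bit(table, m, n):
    """Evaluate the kept sinusoids at time step n; return 1 if the sum is positive.

    Bins strictly between 0 and M/2 stand in for a conjugate pair, so they get weight 2.
    The 1/M normalisation is dropped because only the sign matters.
    """
    total = 0.0
    for f, amp, phase in table:
        weight = 1.0 if f == 0 or 2 * f == m else 2.0
        total += weight * amp * math.cos(2 * math.pi * f * n / m + phase)
    return 1 if total > 0 else 0


if __name__ == "__main__":
    hist = [1, 1, 0, 0] * 16  # period-4 pattern, M = 64
    table = build_sine_table(hist, k=4)
    # Strongest bin and the predicted next bit (time step n = M).
    print(table[0][0], predict_bit(table, len(hist), len(hist)))
```

For a TAGE miss history, a predicted 1 would mean "a TAGE miss is expected"; for the period-4 example the strongest bin is 64 / 4 = 16.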

Global TAGE Miss History Results
FA-TAGE using 10 TAGE tables and a global TAGE miss history performs almost identically to a 10-table TAGE predictor. There are almost no overrides, and override accuracy is under 50%. Global TAGE misses do not seem to have any frequency component: we are predicting noise. Maybe per-branch TAGE misses carry more information?

Branch TAGE Miss History Design
Use the same basic design as the global TAGE miss history, but track TAGE misses and sine tables per branch. Index the TAGE miss history table using the branch PC. Use shorter miss histories. An age counter can be used for replacement.
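A minimal sketch of the per-branch override table, combining the usefulness rule from the global design with PC indexing. Assumptions not taken from the slides: the PC hash, the parameter values, resetting u to 0 after a refit, and the `fit` callback, a hypothetical stand-in for the DFT/sine-table step that returns a function predicting whether a TAGE miss is expected at a given step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical interface: fit(history) returns predictor(step) -> 1 if a TAGE miss is
# expected `step` outcomes after the end of the fitted history (step 0 = next outcome).
Predictor = Callable[[int], int]
Fit = Callable[[List[int]], Predictor]


@dataclass
class Entry:
    history: List[int] = field(default_factory=list)  # 1 = TAGE mispredicted, oldest first
    usefulness: int = 0                               # the counter u from the slides
    model: Optional[Predictor] = None                 # stands in for the sine table
    step: int = 0                                     # outcomes seen since the last fit


class PerBranchMissOverride:
    def __init__(self, fit: Fit, entries: int = 1024, hist_len: int = 32,
                 threshold: int = 4, u_max: int = 15):
        self.fit = fit
        self.entries = entries
        self.hist_len = hist_len
        self.threshold = threshold
        self.u_max = u_max
        self.table = [Entry() for _ in range(entries)]

    def _entry(self, pc: int) -> Entry:
        # Hypothetical index function: drop the low PC bits, then take the modulus.
        return self.table[(pc >> 2) % self.entries]

    def _expects_miss(self, e: Entry) -> int:
        return e.model(e.step) if e.model is not None else 0

    def predict(self, pc: int, tage_pred: int) -> int:
        e = self._entry(pc)
        if e.usefulness > self.threshold and self._expects_miss(e):
            return 1 - tage_pred  # a TAGE miss is expected: negate TAGE
        return tage_pred

    def update(self, pc: int, tage_pred: int, outcome: int) -> None:
        e = self._entry(pc)
        missed = int(tage_pred != outcome)
        if missed:  # only count the valuable cases (TAGE was wrong)
            # The override outcome is correct exactly when a miss was expected.
            delta = 1 if self._expects_miss(e) else -1
            e.usefulness = max(-self.u_max, min(self.u_max, e.usefulness + delta))
        e.history = (e.history + [missed])[-self.hist_len:]
        e.step += 1
        buffer_full = len(e.history) == self.hist_len
        if buffer_full and (e.model is None or e.usefulness < -self.threshold):
            e.model = self.fit(e.history)  # (re)compute the "sine table"
            e.step = 0
            e.usefulness = 0  # assumption: restart the counter after a refit
```

As a usage example, a toy `fit` that simply repeats the fitted history (`lambda h: (lambda s, h=list(h): h[s % len(h)])`) lets the table learn to correct a branch that TAGE mispredicts on every fourth execution.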

Branch TAGE Miss History Results
FA-TAGE using 10 TAGE tables and per-branch TAGE miss histories performs worse than or almost identically to a 10-table TAGE predictor. There are more overrides at shorter history lengths, but still not many. Override accuracy is still under 50% and is inversely related to the number of overrides. Per-branch TAGE misses do not seem to have any frequency component either. Once again we are predicting noise.

TAGE Miss History Conclusions
TAGE miss history, whether global or per branch, is not meaningful in the frequency domain. TAGE misses are non-deterministic, and the miss history appears as noise. Very few TAGE overrides were actually used, and they were not accurate when used. … on to Plan B.

Plan B: Try to predict branch histories and switch

Predict Branch Outcomes
Why? TAGE miss history is not effective for frequency-domain analysis, but branch outcome histories may contain frequency information.
How? Store branch outcomes in a history buffer. Analyze the branch history using Fourier analysis. Choose between the TAGE and FAB predictions using a switching mechanism. Implement this approach on a per-branch basis.

Branch History Design (simple switching)
Use the same design as the per-branch TAGE miss history override predictor, but replace the TAGE miss history with the branch outcome history. Once the usefulness counter reaches a high threshold (i.e., the FAB predictor is predicting the branch accurately), use the FAB prediction for this branch. Continue to monitor usefulness, and update the branch’s sine table if it loses usefulness.

Branch History (simple switching) Results
Using branch history with a simple threshold switch finally achieves meaningful levels of TAGE overriding. Override accuracy is over 90%, so the branch histories do have meaningful frequency components. However, 90% accuracy is still worse than TAGE, and the many overrides hurt MPKI: it is more than 2x the MPKI of a 10-table TAGE predictor, and worse than a 32 KB Gshare predictor. Performance is poor, but this establishes that branch history carries frequency information; the remaining issue is the switching mechanism.

Branch History Design (“dueling” switching)
Replace the single usefulness counter with two values: TAGE usefulness and FAB usefulness. These are saturating counters ranging from 0 to N. Add two circular buffers of length N, one for TAGE and one for FAB. After each branch resolves, update each usefulness counter by subtracting the oldest entry of its buffer and removing that entry from the buffer; then append the most recent TAGE/FAB outcome (correct or incorrect) to the buffer and add it to the counter. Use the predictor with the larger usefulness for this branch. If FAB usefulness falls below a threshold, update the sine table.
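The dueling update keeps each usefulness counter equal to the number of correct predictions in the last N resolutions of the branch, which is why it can never leave the range 0 to N. A minimal sketch, assuming Python deques as the circular buffers, zero-initialized windows, and ties going to TAGE (the slides do not specify tie-breaking); the class and method names are hypothetical.

```python
from collections import deque


class DuelingChooser:
    """Per-branch TAGE/FAB chooser using windowed hit counts (illustrative sketch)."""

    def __init__(self, n: int = 256, fab_refit_threshold: int = 128):
        # Circular buffers of the last n results (1 = that predictor was correct).
        self.tage_window = deque([0] * n)
        self.fab_window = deque([0] * n)
        # Usefulness counters; each always equals the sum of its window, so it stays in [0, n].
        self.tage_useful = 0
        self.fab_useful = 0
        self.fab_refit_threshold = fab_refit_threshold

    def choose(self, tage_pred: int, fab_pred: int) -> int:
        # Use the predictor with the larger usefulness; ties go to TAGE (assumption).
        return fab_pred if self.fab_useful > self.tage_useful else tage_pred

    def update(self, tage_pred: int, fab_pred: int, outcome: int) -> bool:
        """Slide both windows by one resolved branch; return True if the sine table should be refit."""
        tage_hit = int(tage_pred == outcome)
        fab_hit = int(fab_pred == outcome)
        # Subtract and drop the oldest entry, then add and append the newest one.
        self.tage_useful += tage_hit - self.tage_window.popleft()
        self.fab_useful += fab_hit - self.fab_window.popleft()
        self.tage_window.append(tage_hit)
        self.fab_window.append(fab_hit)
        return self.fab_useful < self.fab_refit_threshold
```

Because the windows start at zero, the refit signal is also raised during warm-up; a real design might suppress it until the window has filled.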

Branch History (“dueling” switching) Results
Using branch history with dueling switching overrides TAGE often enough to yield a measurable improvement. Override accuracy is over 99%: we have finally achieved an accuracy high enough to almost always override correctly. This override rate finally reduces the average MPKI, but the reduction relative to the 10-table TAGE predictor is insignificant (0.3%). Interestingly, frequency information is captured better with shorter branch history lengths. We expect that the frequency information changes often enough that it is better captured by a shorter history.

More Branch History (“dueling” switching) Results
Analysis of space used: ~5000 entries × (64 bits [branch history] + 256 bits [TAGE hit history] + 256 bits [FAB hit history]) = ~5000 × 576 bits = 2,880,000 bits = 360 kB. This is clearly impractical; a smart implementation would use a replacement policy to drop underused FAB entries. In particular, when the age of an entry exceeds a threshold value, we could drop it from the table. Whenever an entry is used, we reset its age.
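The storage estimate and the age-based replacement idea can be sketched together. Here an entry's age is the number of table accesses since it was last used, tracked with one global access counter instead of incrementing a counter in every entry; this is an assumed, equivalent formulation, and `AgedTable`, `max_age`, and `make_state` are hypothetical names.

```python
def storage_bytes(entries: int, bits_per_entry: int) -> float:
    """Total table storage in bytes (1 kB = 1000 bytes, as on the slide)."""
    return entries * bits_per_entry / 8


class AgedTable:
    """Per-branch FAB state keyed by PC, with age-based eviction (illustrative sketch)."""

    def __init__(self, max_age: int):
        self.max_age = max_age
        self.now = 0       # global access counter
        self.entries = {}  # pc -> [last_used, state]

    def lookup(self, pc, make_state):
        """Return the state for pc, allocating it if absent; using an entry resets its age."""
        self.now += 1
        slot = self.entries.get(pc)
        if slot is None:
            slot = [self.now, make_state()]
            self.entries[pc] = slot
        slot[0] = self.now
        return slot[1]

    def age(self, pc):
        return self.now - self.entries[pc][0]

    def evict_stale(self):
        """Drop every entry whose age exceeds max_age; return how many were dropped."""
        stale = [pc for pc in self.entries if self.age(pc) > self.max_age]
        for pc in stale:
            del self.entries[pc]
        return len(stale)


if __name__ == "__main__":
    bits = 64 + 256 + 256  # branch history + TAGE hit history + FAB hit history
    print(storage_bytes(5000, bits))
```

Bounding the number of live entries this way bounds the storage directly, since each entry costs a fixed 576 bits in the configuration above.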

Branch History Conclusions
Simple switching is too naïve. The FAB predictor is much less accurate than TAGE, even when it is useful, so simple switching is much worse than pure TAGE. “Dueling” switching works: prediction accuracy improves compared to TAGE with the same number of tables, and the accuracy of the FAB predictions that are taken is very high. However, the improvements are marginal: FAB + 10-table TAGE performs worse than a 12-table TAGE.

Contributions
We have shown that the FAB concept can be used dynamically, without any preprocessing. We have shown that frequency analysis can improve state-of-the-art predictor performance. Achieving meaningful improvements requires more development and may not be possible.

References
M. Kampe, P. Stenstrom, and M. Dubois, “The FAB predictor: using Fourier analysis to predict the outcome of conditional branches,” Proceedings of the Eighth International Symposium on High-Performance Computer Architecture, 2002, pp. 223-232.
A. Seznec and P. Michaud, “A case for (partially) TAgged GEometric history length branch prediction,” Journal of Instruction Level Parallelism, 2006.
G. S. Tjaden and M. J. Flynn, “Detection and parallel execution of independent instructions,” IEEE Transactions on Computers, vol. C-19, no. 10, pp. 889-895, Oct. 1970.
E. M. Riseman and C. C. Foster, “The inhibition of potential parallelism by conditional jumps,” IEEE Transactions on Computers, vol. 21, no. 12, pp. 1405-1411, Dec. 1972.

Questions? Boyu Zhang, Christopher Bodden, Dillon Skeehan ECE/CS 752 Advanced Computer Architecture I University of Wisconsin–Madison {cbodden, skeehan, bzhang93}@wisc.edu