Published byGro Christensen Modified over 6 years ago
1
FA-TAGE: Frequency Aware TAgged GEometric History Length Branch Predictor
Boyu Zhang, Christopher Bodden, Dillon Skeehan
ECE/CS 752 Advanced Computer Architecture I
University of Wisconsin–Madison
{cbodden, skeehan,
2
Overview and Motivation
Achieving high ILP is difficult for real programs because of conditional execution (Flynn's bottleneck; Tjaden and Flynn, 1970). High ILP can be achieved if conditional execution can be removed or predicted accurately (Riseman and Foster, 1972). Current state-of-the-art predictors are extremely accurate (~99%), but room for improvement remains. In this project we consider whether frequency analysis can improve state-of-the-art prediction performance.
3
Previous Work
TAGE (Seznec and Michaud, 2006). Idea: use tagged tables whose tags hash the PC with increasingly longer (geometric) history lengths (PC xor path history) to predict branch outcomes, with a base predictor as a fallback. Problem: the tables with the longest histories are used infrequently and have high overhead. This storage could potentially be used more efficiently!
FAB (Kampe et al., 2002). Idea: use frequency analysis to predict the branches that a main branch predictor has trouble with (typically outcome branches). Problem: FAB uses a program profiler to completely analyze an entire trace before execution. This is impractical!
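TAGE's geometric history lengths and its PC/history tags can be illustrated with a minimal Python sketch. The parameters (10 tables, lengths from 4 to 640 bits) and the function names are hypothetical, and the tag hash is deliberately simplified: it XOR-folds only the global direction history into the tag width, whereas real TAGE hashes differ in detail (they use incrementally updated folded-history registers and also mix path history into the index).

```python
def geometric_history_lengths(num_tables, l_min, l_max):
    """History lengths L(i) = round(l_min * alpha**i), with alpha = (l_max/l_min)**(1/(n-1))."""
    alpha = (l_max / l_min) ** (1.0 / (num_tables - 1))
    return [int(l_min * alpha ** i + 0.5) for i in range(num_tables)]


def fold_history(history, length, width):
    """XOR-fold the newest `length` history bits (history[0] = newest) into `width` bits."""
    folded = 0
    for i, bit in enumerate(history[:length]):
        folded ^= (bit & 1) << (i % width)
    return folded


def tage_tag(pc, history, length, width):
    """Simplified tag: PC bits XOR folded history, masked to `width` bits."""
    mask = (1 << width) - 1
    return (pc ^ (pc >> width) ^ fold_history(history, length, width)) & mask


lengths = geometric_history_lengths(10, 4, 640)
```

Each table i would compute its tag with `tage_tag(pc, history, lengths[i], width)`, so longer-history tables distinguish branches by more of the past.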
4
Plan A: Try to predict TAGE misses
5
Predict TAGE Misses
Why? The overhead of frequency-domain analysis is high, and TAGE is very accurate, so we only want to act on branches where TAGE misses often. TAGE can therefore serve as a filter for the information we want.
How? Store TAGE misses in a TAGE miss history buffer. Analyze the miss history using Fourier analysis. Predict ~TAGE (the inverse of TAGE's prediction) when a misprediction is expected. Implement this approach either globally or per branch.
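Using TAGE as a filter reduces each resolved branch to a single miss bit. A minimal Python sketch with hypothetical names (`miss_streams`, `final_prediction`): it derives the global miss stream and the per-branch miss streams from a trace of `(pc, tage_pred, outcome)` tuples, and shows that "predict ~TAGE when a miss is expected" is simply an XOR.

```python
from collections import defaultdict


def miss_streams(trace):
    """Split a trace of (pc, tage_pred, outcome) into TAGE miss streams.

    Returns the global miss stream and a dict of per-branch streams,
    where 1 means TAGE mispredicted and 0 means TAGE was correct.
    """
    global_stream = []
    per_branch = defaultdict(list)
    for pc, tage_pred, outcome in trace:
        miss = int(tage_pred != outcome)
        global_stream.append(miss)
        per_branch[pc].append(miss)
    return global_stream, dict(per_branch)


def final_prediction(tage_pred, miss_expected):
    """Use ~TAGE only when a miss is expected (XOR of the two booleans)."""
    return tage_pred != miss_expected


trace = [(0x10, True, True), (0x20, True, False), (0x10, False, True)]
global_misses, branch_misses = miss_streams(trace)
```

The global design analyzes `global_misses`; the per-branch design analyzes each list in `branch_misses` separately.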
6
Global TAGE Miss History Design
Maintain a large global TAGE miss history buffer of M bits. When the buffer fills, perform a discrete Fourier transform and keep the K frequencies with the largest amplitudes in a sine table. If TAGE predicted incorrectly and an override outcome is correct, increment a usefulness counter, u. If TAGE predicted incorrectly and an override outcome is incorrect, decrement the counter. (Only the valuable cases are counted.) When the usefulness exceeds a threshold and a TAGE miss is predicted, negate the TAGE prediction. When the usefulness drops below -threshold, perform a new discrete Fourier transform and update the sine table.
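A minimal Python sketch of the global design, under stated assumptions: the names `build_sine_table`, `predict_bit`, and `GlobalMissOverride` are hypothetical; miss bits are mapped to ±1 before a plain O(M²) DFT; the sine table stores (bin, amplitude, phase) for the K strongest bins and is evaluated at the next sample index, so the extrapolation is periodic with period M; the table is rebuilt only on the first fill and when usefulness drops below -threshold; and usefulness is scored only on branches where a miss was predicted (+1 if TAGE actually missed, -1 otherwise), which is one reading of the update rule above.

```python
import cmath
import math
from collections import deque


def build_sine_table(bits, k_top):
    """DFT of a 0/1 history (1 -> +1, 0 -> -1); keep the k_top strongest components.

    Each component is (bin k, amplitude, phase) for a period of len(bits) samples.
    """
    m = len(bits)
    x = [1.0 if b else -1.0 for b in bits]
    components = []
    for k in range(m // 2 + 1):  # non-negative frequency bins of a real signal
        xk = sum(x[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))
        # Bins other than DC and Nyquist also account for their conjugate bin.
        weight = 1.0 if k == 0 or 2 * k == m else 2.0
        components.append((k, weight * abs(xk) / m, cmath.phase(xk)))
    components.sort(key=lambda c: c[1], reverse=True)
    return components[:k_top]


def predict_bit(table, m, n):
    """Sum the kept sinusoids at sample n; a positive sum predicts a 1."""
    total = sum(a * math.cos(2 * math.pi * k * n / m + p) for k, a, p in table)
    return total > 0.0


class GlobalMissOverride:
    """Hypothetical global TAGE-miss override; names and scoring policy are assumptions."""

    def __init__(self, m, k_top, threshold):
        self.m, self.k_top, self.threshold = m, k_top, threshold
        self.misses = deque(maxlen=m)  # last m miss bits, 1 = TAGE mispredicted
        self.table = []                # sine table: top-K (k, amplitude, phase)
        self.u = 0                     # usefulness counter
        self.t = 0                     # next branch's position within the table's period
        self.miss_expected = False

    def predict(self, tage_pred):
        self.miss_expected = bool(self.table) and predict_bit(self.table, self.m, self.t)
        if self.miss_expected and self.u > self.threshold:
            return not tage_pred       # override: negate the TAGE prediction
        return tage_pred

    def update(self, tage_pred, outcome):
        tage_missed = tage_pred != outcome
        if self.miss_expected:         # assumption: score only branches where a miss was predicted
            self.u += 1 if tage_missed else -1
        self.misses.append(int(tage_missed))
        self.t = (self.t + 1) % self.m
        if len(self.misses) == self.m and (not self.table or self.u < -self.threshold):
            # First fill, or usefulness fell below -threshold: redo the DFT.
            self.table = build_sine_table(list(self.misses), self.k_top)
            self.u = 0
            self.t = 0                 # window covers samples 0..m-1; next branch is m, i.e. 0 mod m
```

If TAGE missed exactly every fourth branch, the DFT of a 32-bit miss history has energy only in bins 0, 8, and 16, so three components reconstruct the pattern and the override starts firing once the usefulness counter passes the threshold. A hardware version would use fixed-point sine tables and an FFT; the plain DFT just keeps the sketch short.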
7
Global TAGE Miss History Results
FA-TAGE using 10 TAGE tables and a global TAGE miss history performs almost identically to a 10-table TAGE predictor. There are almost no overrides, and override accuracy is under 50%. Global TAGE misses don't seem to have any frequency component: we are predicting noise. Maybe per-branch TAGE misses will carry more information?
8
Branch TAGE Miss History Design
Use the same basic design as the global TAGE miss history, but track TAGE misses and sine tables for each branch. Index the TAGE miss history table using the branch PC. Use shorter miss histories. An age counter can be used for replacement.
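A minimal sketch of the per-branch table, assuming a small fully associative table keyed by the full branch PC (a hardware design would more likely be set-associative with partial tags). `BranchEntry` and `BranchTable` are hypothetical names; in this sketch every lookup ages all entries, using an entry resets its age, and the oldest entry is evicted when the table is full.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BranchEntry:
    history: deque                                   # short per-branch miss history
    sine_table: list = field(default_factory=list)   # top-K (k, amplitude, phase)
    usefulness: int = 0
    age: int = 0


class BranchTable:
    """Hypothetical fully associative, PC-indexed table with age-based replacement."""

    def __init__(self, capacity, history_len):
        self.capacity = capacity
        self.history_len = history_len
        self.entries = {}  # branch PC -> BranchEntry

    def lookup(self, pc):
        for entry in self.entries.values():
            entry.age += 1                       # every access ages all entries
        if pc not in self.entries:
            if len(self.entries) >= self.capacity:
                victim = max(self.entries, key=lambda p: self.entries[p].age)
                del self.entries[victim]         # evict the oldest entry
            self.entries[pc] = BranchEntry(deque(maxlen=self.history_len))
        entry = self.entries[pc]
        entry.age = 0                            # using an entry resets its age
        return entry
```

With capacity 2, looking up branches A, B, A, C evicts B, because A's age was reset by its second use.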
9
Branch TAGE Miss History Results
FA-TAGE using 10 TAGE tables and a per-branch TAGE miss history performs slightly worse than or almost identically to a 10-table TAGE predictor. There are more overrides at shorter history lengths, but still not many. Override accuracy is still under 50% and is inversely related to the number of overrides. Per-branch TAGE misses don't seem to have any frequency component either. Once again we are predicting noise.
10
TAGE Miss History Conclusions
TAGE miss history, whether global or per branch, is not meaningful in the frequency domain. TAGE misses behave non-deterministically, and the miss history appears as noise. Very few TAGE overrides were actually used, and they were not accurate when they were. … on to Plan B.
11
Plan B: Try to predict branch histories and switch
12
Predict Branch Outcomes
Why? TAGE miss history is not effective for frequency-domain analysis, but branch outcome histories may contain frequency information. How? Store branch outcomes in a history buffer. Analyze the branch history using Fourier analysis. Choose between the TAGE and FAB predictions with a switching mechanism. Implement this approach per branch.
13
Branch History Design (simple switching)
Use the same design as the per-branch TAGE miss history override predictor, but replace the TAGE miss history with the branch outcome history. Once the usefulness counter reaches a high threshold (meaning the FAB predictor is predicting the branch accurately), use the FAB prediction for this branch. Continue to monitor the usefulness and update the branch's sine table if it loses usefulness.
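One way the simple threshold switch could look for a single branch. The scoring rules are assumptions, not taken from the slides: usefulness is +1 when FAB's prediction is correct and -1 when it is wrong, it saturates at the high threshold, and once FAB is selected it stays selected until usefulness falls to a low threshold, at which point the caller rebuilds the sine table. `SimpleSwitch` is a hypothetical name.

```python
class SimpleSwitch:
    """Hypothetical per-branch simple switch between TAGE and FAB."""

    def __init__(self, high, low):
        self.high, self.low = high, low  # assumed thresholds, low < 0 < high
        self.usefulness = 0
        self.use_fab = False

    def choose(self, tage_pred, fab_pred):
        return fab_pred if self.use_fab else tage_pred

    def update(self, fab_pred, outcome):
        """Score FAB on a resolved branch; return True when the sine table should be rebuilt."""
        if fab_pred == outcome:
            self.usefulness = min(self.usefulness + 1, self.high)  # saturate at high
        else:
            self.usefulness -= 1
        if self.usefulness >= self.high:
            self.use_fab = True          # FAB is predicting this branch accurately
        if self.usefulness <= self.low:
            self.use_fab = False         # FAB lost usefulness: fall back to TAGE
            self.usefulness = 0
            return True
        return False
```

The weakness the results below point out is visible here: once switched on, FAB keeps overriding TAGE through a run of mispredictions until the low threshold is reached.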
14
Branch History (simple switching) Results
Using branch history with a simple threshold switch finally overrides TAGE at meaningful levels. Override accuracy is over 90%, so the branch histories do have meaningful frequency components. However, 90% accuracy is still worse than TAGE, and the many overrides hurt MPKI: it is over 2x the MPKI of a 10-table TAGE predictor and worse than a 32 KB gshare predictor. Performance is poor, but this establishes that branch history carries frequency information and that the remaining issue is the switching mechanism.
15
Branch History Design (“dueling” switching)
Replace the single usefulness counter with two: TAGE usefulness and FAB usefulness. These are saturating counters from 0 to N. Add two length-N circular buffers, one for TAGE and one for FAB. After each branch resolves, update each usefulness counter by subtracting the oldest entry of its buffer and removing that entry, then add the most recent TAGE/FAB outcome (correct or not) to the buffer and to the counter. Use the predictor with the larger usefulness for this branch. If FAB usefulness falls below a threshold, update the sine table.
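The dueling counters amount to a sliding-window count of correct predictions over the last N branches. A minimal Python sketch for one branch; `DuelingSwitch` is a hypothetical name, and ties going to TAGE and zero-filled initial buffers are assumptions.

```python
from collections import deque


class DuelingSwitch:
    """Hypothetical per-branch dueling switch between TAGE and FAB."""

    def __init__(self, n, fab_threshold):
        self.tage_hits = deque([0] * n, maxlen=n)  # circular buffer: 1 = TAGE was correct
        self.fab_hits = deque([0] * n, maxlen=n)   # circular buffer: 1 = FAB was correct
        self.tage_u = 0                            # equals sum(tage_hits), always in [0, n]
        self.fab_u = 0                             # equals sum(fab_hits), always in [0, n]
        self.fab_threshold = fab_threshold

    def choose(self, tage_pred, fab_pred):
        # The predictor with the larger usefulness wins; ties go to TAGE (an assumption).
        return fab_pred if self.fab_u > self.tage_u else tage_pred

    def update(self, tage_pred, fab_pred, outcome):
        """Slide both windows by one branch; return True if the sine table should be rebuilt."""
        tage_ok = int(tage_pred == outcome)
        fab_ok = int(fab_pred == outcome)
        # Subtract the oldest entry and add the newest; append() then drops the oldest.
        self.tage_u += tage_ok - self.tage_hits[0]
        self.fab_u += fab_ok - self.fab_hits[0]
        self.tage_hits.append(tage_ok)
        self.fab_hits.append(fab_ok)
        return self.fab_u < self.fab_threshold
```

Because each counter always equals the sum of its N-entry buffer, it stays within 0..N without separate saturation logic.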
16
Branch History (“dueling” switching) Results
Using branch history with dueling switching overrides TAGE often enough to produce a measurable improvement. Override accuracy is over 99%: the overrides are finally accurate enough to almost always be correct. With this override rate, average MPKI finally drops below that of the 10-table TAGE predictor, but the reduction (0.3%) is insignificant. Interestingly, frequency information is captured better by shorter branch history lengths. We suspect the frequency content changes often enough that a shorter history tracks it better.
17
More Branch History (“dueling” switching) Results
Analysis of space used: ~5000 entries × (64 bits [branch history] + 256 bits [TAGE hit history] + 256 bits [FAB hit history]) = 2,880,000 bits = 360 kB. This is clearly impractical; a smart implementation would have a replacement policy that drops underused FAB entries. In particular, when the age of an entry exceeds a threshold, it could be dropped from the table, and whenever an entry is used, its age is reset.
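The storage estimate is simple arithmetic. The sketch below assumes the FAB hit history has the same length as the TAGE hit history (256 bits), since both are length-N circular buffers, and uses 1 kB = 1000 bytes; under those assumptions it reproduces the 360 kB figure. `storage_kb` is a hypothetical helper.

```python
def storage_kb(entries, branch_history_bits, tage_hit_bits, fab_hit_bits):
    """Per-branch table storage in kB, using 1 kB = 1000 bytes."""
    bits_per_entry = branch_history_bits + tage_hit_bits + fab_hit_bits
    return entries * bits_per_entry / 8 / 1000


# Assumption: the FAB hit history is 256 bits, the same length as the TAGE hit history.
total = storage_kb(5000, 64, 256, 256)
print(total)  # 360.0
```

With 1024-byte kilobytes the same total is 351.5625 KiB.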
18
Branch History Conclusions
Simple switching is too naïve: the FAB predictor is much less accurate than TAGE, even when it is useful, so simple switching is much worse than pure TAGE. “Dueling” switching works: predictive accuracy improves over TAGE with the same number of tables, and the FAB predictions that are used are very accurate. However, the improvements are marginal: FAB + 10-table TAGE performs worse than a 12-table TAGE.
19
Contributions
We have shown that the FAB concept can be used dynamically, without any preprocessing. We have shown that frequency analysis can improve state-of-the-art predictor performance. Achieving meaningful improvements requires more development and may not be possible.
20
References
M. Kampe, P. Stenstrom and M. Dubois, "The FAB predictor: using Fourier analysis to predict the outcome of conditional branches," Proceedings of the Eighth International Symposium on High-Performance Computer Architecture, 2002.
A. Seznec and P. Michaud, "A case for (partially) TAgged GEometric history length branch prediction," Journal of Instruction Level Parallelism, 2006.
G. S. Tjaden and M. J. Flynn, "Detection and Parallel Execution of Independent Instructions," IEEE Transactions on Computers, vol. C-19, no. 10, Oct. 1970.
E. M. Riseman and C. C. Foster, "The Inhibition of Potential Parallelism by Conditional Jumps," IEEE Transactions on Computers, vol. 21, no. 12, Dec. 1972.
21
Questions?