Published by Lesley Webb. Modified over 9 years ago.
1 A 64 Kbytes ITTAGE indirect branch predictor
André Seznec, INRIA/IRISA
2 Build on ITTAGE
ITTAGE: introduced at the same time as TAGE (JILP 2006).
Derived directly from the TAGE predictor: target prediction instead of direction prediction.
3 ITTAGE: multiple tables, global history predictor
The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}.
What is important: L(i) - L(i-1) increases drastically with i, so most of the storage goes to short histories !!
The longest tables still capture correlation on very long histories.
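The geometric series on slide 3 can be reproduced with a few lines of Python. This is only a sketch: the function name, its parameters and the rounding rule are illustrative assumptions, chosen so that the output matches the example set on the slide.

```python
def geometric_history_lengths(num_tagged, first, ratio):
    """Return [0, first, first*ratio, ...]; names and rounding are assumptions."""
    lengths = [0]  # the tagless base predictor uses no history
    for k in range(num_tagged):
        lengths.append(int(first * ratio ** k + 0.5))
    return lengths


lengths = geometric_history_lengths(num_tagged=7, first=2, ratio=2.0)
# Gap between consecutive history lengths, L(i) - L(i-1)
gaps = [b - a for a, b in zip(lengths, lengths[1:])]

print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]
print(gaps)     # [2, 2, 4, 8, 16, 32, 64]
```

The growing gaps are the point of the slide: half of the tagged tables cover histories of at most 8 branches, while the last table alone reaches 128.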
4 The ITTAGE predictor
[Figure: a tagless base predictor indexed with pc, and tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; tag comparisons (=?) select the 32-bit target prediction.]
5 Prediction computation
General case: the longest matching component provides the prediction.
Special case: many mispredictions come from newly allocated entries (weak Ctr), and Altpred is sometimes (slightly) more accurate than Pred.
This property is dynamically monitored through a single 4-bit counter.
-2 % MPPKI
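The selection rule on slide 5 can be sketched as follows. The slide only states that a single 4-bit counter monitors whether Altpred beats a weak Pred; the signed range [-8, 7], the "use Altpred when the counter is non-negative" test, the training rule and all names (`Hit`, `Chooser`, `use_alt`) are assumptions modelled on the usual TAGE description, not the author's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    target: int
    ctr: int  # 2-bit hysteresis counter; 0 is taken here as the weak state


class Chooser:
    def __init__(self) -> None:
        self.use_alt = 0  # assumed signed 4-bit counter, kept in [-8, 7]

    def predict(self, base_target: int,
                hits: List[Optional[Hit]]) -> Tuple[int, Optional[Hit], int]:
        """hits[i] is the tag-matching entry of tagged table i (shortest
        history first) or None. Returns (final target, Pred entry, Altpred)."""
        matching = [h for h in hits if h is not None]
        if not matching:
            return base_target, None, base_target
        pred = matching[-1]  # longest matching component
        alt = matching[-2].target if len(matching) > 1 else base_target
        use_alt = pred.ctr == 0 and self.use_alt >= 0
        return (alt if use_alt else pred.target), pred, alt

    def train(self, pred: Optional[Hit], alt: int, actual: int) -> None:
        """Update the monitor only when a weak Pred disagrees with Altpred."""
        if pred is None or pred.ctr != 0 or pred.target == alt:
            return
        if alt == actual:
            self.use_alt = min(7, self.use_alt + 1)
        elif pred.target == actual:
            self.use_alt = max(-8, self.use_alt - 1)


chooser = Chooser()
hits = [Hit(0x1000, ctr=2), None, Hit(0x2000, ctr=0)]
first, pred, alt = chooser.predict(0x3000, hits)   # weak Pred: Altpred is used
chooser.train(pred, alt, actual=0x2000)            # Pred was right after all
second, _, _ = chooser.predict(0x3000, hits)       # now Pred is trusted
```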
6 A tagged table entry
Fields: Target | Tag | Ctr | U
Ctr: 2-bit hysteresis counter
U: 1-bit useful counter (was the entry recently useful ?)
Tag: partial tag
Target: the target, 32 bits, or some way to reconstruct it
7 Allocate entries on mispredictions
Allocate entries in tables with longer history lengths, in entries with U unset.
Set Ctr to Weak and U to 0.
HUGE STORAGE BUDGET: up to 3 entries allocated in different tables, for fast warming.
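A minimal sketch of the allocation step on slide 7, using the entry fields of slide 6. The slide does not say how the candidate tables are chosen, so this version simply scans upward from the provider and takes the first free entries; that policy, the `Entry` class and the `allocate` signature are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    target: int = 0
    tag: int = 0
    ctr: int = 0  # 2-bit hysteresis counter; 0 is taken here as Weak
    u: int = 0    # 1-bit useful counter


def allocate(tables, indices, tags, provider, target, max_alloc=3):
    """On a misprediction, allocate up to max_alloc entries in tables with a
    longer history than the provider (provider = -1 for the base predictor).
    tables[i][indices[i]] is the entry the branch maps to in table i.
    Returns the list of tables where an entry was allocated."""
    allocated = []
    for i in range(provider + 1, len(tables)):
        entry = tables[i][indices[i]]
        if entry.u == 0:  # only entries with U unset may be replaced
            entry.target, entry.tag = target, tags[i]
            entry.ctr, entry.u = 0, 0  # Ctr to Weak, U to 0
            allocated.append(i)
            if len(allocated) == max_alloc:
                break
    return allocated


tables = [[Entry() for _ in range(4)] for _ in range(5)]
tables[2][1].u = 1  # this entry is protected by its useful bit
got = allocate(tables, indices=[1] * 5, tags=[7] * 5, provider=0, target=0x4000)
print(got)  # [1, 3, 4]
```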
8 Managing the (U)seful bit
Set when the entry avoids a misprediction: (Pred = target) & (Alt ≠ target).
Global reset when there are « difficulties » to allocate: dynamically monitor whether allocations fail more often than they succeed.
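Both rules on slide 8 fit in a short sketch. The set condition is taken directly from the slide; the monitoring counter (failures minus successes, floored at zero) and its threshold are assumptions, since the slide gives neither a width nor a trigger value.

```python
def updated_u(u, pred, alt, actual):
    """Set U when the entry avoided a misprediction: Pred right, Alt wrong."""
    return 1 if (pred == actual and alt != actual) else u


class UsefulManager:
    def __init__(self, threshold=8):
        self.threshold = threshold  # assumed value, not stated on the slide
        self.balance = 0            # allocation failures minus successes

    def on_allocation(self, success, u_bits):
        """Record one allocation attempt; reset every U bit when failures
        dominate. Returns True if a global reset happened."""
        if success:
            self.balance = max(0, self.balance - 1)
        else:
            self.balance += 1
        if self.balance >= self.threshold:
            for i in range(len(u_bits)):
                u_bits[i] = 0
            self.balance = 0
            return True
        return False


mgr = UsefulManager(threshold=3)
u_bits = [1, 1, 0]
outcomes = (False, False, True, False, False)
resets = [mgr.on_allocation(ok, u_bits) for ok in outcomes]
print(resets)  # [False, False, False, False, True]
print(u_bits)  # [0, 0, 0]
```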
9 Most of the storage space for targets
32 bits per entry !!
More than 12K (PC, target) pairs on CLIENT05, but only a maximum of 4038 different targets.
Use 12-bit pointers + a 4K table.
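The arithmetic behind the 12-bit pointer idea on slide 9 is worked out below. The variable names are illustrative, and the table cost assumes each of the 4K table entries holds a full 32-bit target, which the slide implies but does not spell out.

```python
distinct_targets = 4038  # maximum observed over the traces (slide 9)
target_bits = 32

# Smallest pointer width that can name every distinct target
pointer_bits = (distinct_targets - 1).bit_length()
table_entries = 1 << pointer_bits

# Cost of the shared target table, assuming one full target per entry
table_bytes = table_entries * target_bits // 8
saved_bits_per_entry = target_bits - pointer_bits

print(pointer_bits, table_entries)        # 12 4096
print(table_bytes, saved_bits_per_entry)  # 16384 20
```

Under these assumptions the shared table costs 16 Kbytes and each predictor entry shrinks by 20 bits.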
10 Let us be realistic: leverage target locality
All targets lie in at most 90 256KB regions.
Use a 128-entry region table: fully associative, 240 bytes.
Saves 7 bits per ITTAGE entry.
Would have saved 39 bits on a 64-bit architecture !!
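The numbers on slide 10 follow from splitting a target into an 18-bit offset (256 KB = 2^18 bytes) and a 7-bit pointer into the 128-entry region table, i.e. 25 stored bits instead of 32 or 64. The 240 bytes are consistent with 128 entries of 15 bits, that is a 14-bit region number plus one extra bit whose role the slide does not state. The sketch below is an assumption-laden illustration: the `RegionTable` class is hypothetical and no replacement policy is modelled.

```python
class RegionTable:
    def __init__(self, entries=128, region_bits=18):
        self.entries = entries
        self.region_bits = region_bits  # 256 KB regions = 2**18 bytes
        self.regions = []               # list index = region pointer

    def encode(self, target):
        """Split a target into (region pointer, region offset)."""
        region = target >> self.region_bits
        offset = target & ((1 << self.region_bits) - 1)
        if region not in self.regions:  # fully associative lookup
            if len(self.regions) == self.entries:
                raise RuntimeError("region table full; replacement not sketched")
            self.regions.append(region)
        return self.regions.index(region), offset

    def decode(self, pointer, offset):
        """Reconstruct the full target from the stored fields."""
        return (self.regions[pointer] << self.region_bits) | offset


rt = RegionTable()
pointer, offset = rt.encode(0x08048ABC)
restored = rt.decode(pointer, offset)

pointer_bits = (rt.entries - 1).bit_length()  # 7 bits for 128 entries
stored_bits = pointer_bits + rt.region_bits   # 25 bits per ITTAGE entry
print(32 - stored_bits, 64 - stored_bits)     # 7 39
```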
11 [Figure: the entry fields Target | Tag | Ctr | U, with Target stored as a region offset and a region pointer.]
12 The global history
-16 % MPPKI
13 The global history (2)
Including all branches ?
Only indirect and calls: -2.5 % MPPKI
But no firm conclusion: excluding 2 branches on INT05 and INT06, the result is just the other way around.
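A sketch of a history update restricted to indirect branches and calls. It reads the slide-18 reminder "10 bits per indirect, 5 per call" as the number of history bits inserted per branch, and takes the (PC+target) content from the summary slide; both readings are interpretations. The XOR-folding hash and all names are hypothetical.

```python
HIST_BITS = 3881  # longest history length listed on slide 18
MASK = (1 << HIST_BITS) - 1


def fold(value, bits):
    """XOR-fold a non-negative integer down to `bits` bits (hypothetical hash)."""
    out = 0
    while value:
        out ^= value & ((1 << bits) - 1)
        value >>= bits
    return out


def update_history(history, kind, pc, target):
    """Shift (PC, target) information into the global history.
    Only indirect branches and calls contribute; other branches are ignored."""
    if kind == "indirect":
        bits = 10
    elif kind == "call":
        bits = 5
    else:
        return history
    path = fold(pc ^ (target << 1), bits)
    return ((history << bits) | path) & MASK


h1 = update_history(0, "indirect", pc=0x400123, target=0x400800)
h2 = update_history(h1, "call", pc=0x400200, target=0x401000)
h3 = update_history(h2, "conditional", pc=0x400300, target=0x400310)
```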
14 + the other tricks (for TAGE)
Immediate Update Mimicker
Storage space interleaving
Picking the best set of history lengths
-1 % MPPKI
15 The Immediate Update Mimicker
Issue: some mispredictions are due to late updates at retirement.
Immediate Update Mimicker: try to catch these cases.
16 The Immediate Update Mimicker
[Figure: a queue of in-flight branches between Fetch and Misprediction, with entries labelled PTA (P(rediction), T(able), A(ddress in the table)) or ETA; the caption "Same table, same entry" marks matching entries.]
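One way to read the Immediate Update Mimicker slides is sketched below: each in-flight branch records its prediction, table and entry address; once a branch has executed, its record holds the actual target, and a later fetch that hits the same table and the same entry reuses that target instead of the stale table content. Reading the "E" of ETA as the executed target is an assumption, as are the class and method names. Squashing of younger branches after a misprediction is not modelled.

```python
class ImmediateUpdateMimicker:
    def __init__(self):
        self.inflight = {}  # seq -> [table, address, target, executed]
        self.next_seq = 0

    def on_fetch(self, table, address, predicted):
        """Record a fetched branch; return (seq, prediction to use)."""
        # Search from youngest to oldest for an executed branch that was
        # predicted by the same table and the same entry.
        for rec in reversed(list(self.inflight.values())):
            if rec[0] == table and rec[1] == address and rec[3]:
                predicted = rec[2]
                break
        seq = self.next_seq
        self.next_seq += 1
        self.inflight[seq] = [table, address, predicted, False]
        return seq, predicted

    def on_execute(self, seq, actual):
        """The branch has resolved: replace P by the actual target."""
        rec = self.inflight[seq]
        rec[2], rec[3] = actual, True

    def on_retire(self, seq):
        """The predictor tables are updated at retirement; drop the record."""
        self.inflight.pop(seq)


ium = ImmediateUpdateMimicker()
s0, p0 = ium.on_fetch(table=3, address=17, predicted=0x100)
ium.on_execute(s0, actual=0x200)  # mispredicted, not yet retired
s1, p1 = ium.on_fetch(table=3, address=17, predicted=0x100)  # table still stale
print(hex(p0), hex(p1))  # 0x100 0x200
```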
17 For the competition: interleaving
[Figure: interleaved table storage with history h[0:L1], a crossbar (Xbar), tag comparison (=?) and the prediction output.]
18 For the competition
Guided selection of the best set of history lengths:
4K entries: 0
4K entries: 0, 10
4K entries: 16, 27, 44, 60, 96, 109, 219, 449
2K entries: 487, 714, 1313, 2146, 3881
Remember: 10 bits per indirect, 5 per call
19 Where is the limit ?
Less than 3 % MPPKI
Why did you not use the « 12-bit pointer » trick ?
Just winning 0.5 % MPPKI
20 Summary
ITTAGE is directly derived from TAGE.
History should include (PC+target) for indirect branches and calls.
Locality on targets can be leveraged.
Marginal tricks are not really worth it.