Download presentation
Presentation is loading. Please wait.
Published byNathaniel Patrick Modified over 8 years ago
1
André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC
2
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 2 Branch prediction is still an issue Long pipelines Several instructions per cycle Any gain in branch prediction accuracy results in: Performance gain Power consumption gain
3
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 3 Conditional branch predictors 25 years of background work Two main sources of information: Local history the past behavior of this particular branch Global history the (recent) past behavior of all branches
4
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 4 Speculative history must be managed !? Global history: Append a bit on a single history register Use of a circular buffer and just a pointer to speculatively manage the history Local history: table of histories (unspeculatively updated) must maintain a speculative history per inflight branch: Associative search, etc ?!?
5
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 5 From my experience with EV8 team Designers hate maintaining speculative local history
6
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 6 “Classical” stuff with the OGEHL predictor Global history based: Yeh and Patt 91, Pan and So 91 Multiple tables using different history lengths McFarling 93, Evers et al. 96, EV8 predictor
7
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 7 Selecting between multiple predictions Classic solution: Use of a meta predictor “wasting” storage !?! chosing among 5 or 10 predictions ?? Neural inspired predictors: Use an adder tree instead of a meta-predictor Vintan and Iridon 99, Jiménez and Lin 01 Let’s use the adder tree
8
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 8 L(0) ∑ L(4) L(3) L(2) L(1) TO T1 T2 T3 T4 Multiple history length predictor Final computation through a sum Prediction=Sign
9
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 9 GEometric History Length predictor The set of history lengths forms a geometric series {0, 2, 4, 8, 16, 32, 64, 128} What is important: L(i)-L(i-1) is drastically increasing Spends most of the storage for short history !!
10
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 10 Dynamic update threshold fitting Reasonable fixed threshold= Number of tables On an O-GEHL predictor, best threshold depends on the application the predictor size the counter width By chance, on most applications, for the best fixed threshold, updates on mispredictions ≈ updates on correct predictions Monitor the difference and adapt the update threshold
11
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 11 Adaptative history length fitting (inspired by Juan et al 98) (½ applications: L(7) < 50) ≠ (½ applications: L(7) > 150 ) Let us adapt some history lengths to the behavior of each application 8 tables: T2: L(2) and L(8) T4: L(4) and L(9) T6: L(6) and L(10)
12
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 12 Adaptative history length fitting (2) Intuition: if high degree of conflicts on T7, stick with short history Implementation: monitoring of aliasing on updates on T7 through a tag bit and a counter Simple is sufficient: Flipping from short to long histories and vice-versa
13
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 13 ∑ L(4) L(3) or L(6) L(2) L(1) or L(5) L(0) TO T1 T2 T3 T4 Tag bits
14
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 14 Hashing 200+ bits for indexing !! Need to compute 11 bits indexes : Full hashing is unrealistic 1.Just regularly pick at most 33 bits in: address+branch history +path history 2.A single 3-entry exclusive-OR stage
15
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 15 Evaluation framework 1st Championship Branch Prediction traces: 20 traces including system activity Floating point apps : loop dominated Integer apps: usual SPECINT Multimedia apps Server workload apps: very large footprint
16
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 16 Reference configuration presented for Championship Branch Prediction 8 tables: 2 Kentries except T1, 1Kentries 5 bit counters for T0 and T1, 4 bit counters otherwise 1 Kbits of one bit tags associated with T7 10K + 5K + 6x8K + 1K = 64K L(1) =3 and L(10)= 200 {0,3,5,8,12,19,31,49,75,125,200}
17
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 17 A case for the OGEHL predictor 2nd at CBP: 2.82 misp/KI Best practice award: The predictor the closest to a possible hardware implementation Does not use exotic features: Various prime numbers, etc Strange initial state Chaining simulations: 2.84 misp/KI Very fast warming
18
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 18 A case for the OGEHL predictor (2) High accuracy 32Kbits (3,150): 3.41 misp/KI better than any implementable 128 Kbits predictor before CBP 128 Kbits 2bcgkew (6,6,24,48): 3.55 misp/KI 176 Kbits PBNP (43) : 3.67 misp/KI 1Mbits (5,300): 2.27 misp/KI 1Mbit 2bcgskew (9,9,36,72): 3.19 misp/KI 1888 Kbits PBNP (58): 3.23 misp/KI
19
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 19 A case for the OGEHL predictor (3) Robustness to variations of history lengths choices: L(1) in [2,6], L(10) in [125,300] misp. rate < 2.96 misp/KI Geometric series: not a bad formula !! best geometric L(1)=3, L(10)=223, 2.80 misp/KI best overall {0, 2, 4, 9, 12, 18, 31, 54, 114, 145, 266} 2.78 misp/KI
20
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 20 Impact of the number of components 4 components — 8 components 64 Kbits: 3.02 -- 2.84 misp/KI 256Kbits: 2.59 -- 2.44 misp/KI 1Mbit: 2.40 -- 2.27 misp/KI 6 components — 12 components 48 Kbits: 3.02 – 3.03 misp/KI 768Kbits: 2.35 – 2.25 misp/KI 4 to 12 components bring high accuracy
21
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 21 Impact of counter width Robustness to counter width variations: 3-bit counter, 49 Kbits: 3.09 misp/KI Dynamic update threshold fitting helps a lot 5-bit counter 79 Kbits: 2.79 misp/KI 4-bit is the best tradeoff
22
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 22 Prediction computation time 3 successive steps: Index computation: a 3-entry XOR gate Table read Adder tree May not fit on a single cycle: But can be ahead pipelined !
23
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 23 Ahead pipelining a global history branch predictor (principle) Initiate branch prediction X+1 cycles in advance to provide the prediction in time Use information available: X-block ahead instruction address X-block ahead history To ensure accuracy: Use intermediate path information
24
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 24 Practice Ahead OGEHL: 8 // prediction computations bcd Ha A ABCD
25
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 25 Ahead Pipelined 64 Kbits OGEHL 3-block: 2.94 misp/KI 4-block: 2.99 misp/KI 5-block: 3.04 misp/KI Not such a huge accuracy loss
26
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 26 A final case for the O-GEHL predictor delivers state-of-the-art accuracy uses only global information: Very long history: 200+ bits !! can be ahead pipelined many effective design points Nb of tables, counter width, history lengths prediction computation logic complexity is low (compared with concurrent predictors )
27
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 27 Still open questions Does it exist better prediction combination functions ? Indirect jump targets ?
28
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 28 The End
29
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 29 BACK UP
30
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 30 Improved the global history Address + conditional branch history: path confusion on short histories Address + path: Direct hashing leads to path confusion 1.Represent all branches in branch history 2.Use also path history ( 1 bit per branch, limited to 16 bits)
31
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 31 Piecewise linear -- OGEHL (1) accuracy My own best implementation of piecewise linear: Shares counters for the same history rank but uses only power of two numbers of entries in tables 64 Kbits: H=31, 256 counters per rank, x 32 sums 3.72 misp/KI ------------ 2.84 misp/KI 1 MBit : H=63, 2048 counters per rank, x 32 sums 2.64 misp/KI ------------ 2.27 misp/KI 48 Mbit: H=95, 64K counters per rank, x 32 sums 2.30 misp/KI
32
Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 32 Piecewise linear -- OGEHL (2) hardware complexity Computation logic complexity: OGEHL: only a few adders PL: hundreds of adders Number of storage tables OGEHL: 4 to 12 PL: H tables ( update time) Prediction computation time: + a 3-entry XOR gate on OGEHL Information to checkpoint: Kilobits for PL
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.