Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlated Branches from a Large Global History (Renju Thomas, Manoj Franklin, et al.)

1 Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlated Branches from a Large Global History
Renju Thomas, Manoj Franklin, Chris Wilkerson, and Jared Stark
Presenter: Xiaoxiao Wang

2 Agenda
Motivation and Related Work
Identifying Affector Branches at Run-time
Building Predictors Using Affector Information
Experimental Results
Conclusion

3 Motivation and Related Work
Processor pipelines have been growing deeper, so the branch misprediction penalty is becoming very high [18]. Increasing predictor size greatly improves the accuracy of small predictors, but not of large ones [1][5][12][13][16][19]. Larger predictors also increase prediction latency [2][8][16]. Still, future transistor budgets allow more area to be devoted to branch predictors [4][16].

4 How to Improve Prediction Rate?
Not all branches in a long history are necessarily correlated with the branch being predicted [11][20][21], so the history should be used more selectively. There are two primary reasons why branches are correlated [6]:
1) a preceding branch's outcome affects the computation that determines the outcome of the succeeding branch (affector);
2) the computations that determine both branches' outcomes are (fully or partially) based on the same or related information (forerunner).
Goal: identify the correlated branches within a large global history.

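5 Identifying Affector Branches at Run-time
[Figure: example control flow over basic blocks BB0, BB2, BB3, BB5, BB6, BB7, and BB8 with branches B0, B2, B3, B5, B7, and B8. Instructions shown: R1=R2, R1=R2+4, R2=R1+R2, R3=R4+4; the branch under prediction is B8: if R2==R3.]
B8 is the branch to be predicted. The latest 5 branches are {B0, B2, B3, B5, B7}, with outcomes TNTTN. The affector blocks for B8 are {BB2, BB3, BB7}, so the affector branches for B8 are {B0, B2, B5}, and B8's Affector Branch Bitmap is 11010 (one bit per recent branch, oldest first). The R1=R2 write does not count: R1 is overwritten by R1=R2+4 before its value can reach B8's operands.
Idea: track the run-time dataflow and determine the affector branches of the last update to each architectural register.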
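The definition of an affector branch can be checked offline with a backward dataflow slice over an executed trace. The sketch below is an illustrative reference model, not the paper's hardware mechanism. The trace format, the name `affector_bitmap`, and the rule "credit the last branch executed before each slice instruction" are assumptions, chosen so that the example above yields 11010.

```python
from typing import List, Sequence, Tuple

# Hypothetical trace format (not from the paper):
#   ("op", dest, (srcs...))   register-writing instruction
#   ("br", name, (srcs...))   conditional branch
Event = Tuple[str, str, Tuple[str, ...]]


def affector_bitmap(trace: Sequence[Event], history_len: int) -> str:
    """Affector bitmap of the last event (a branch), oldest branch first."""
    *body, (kind, _, target_srcs) = trace
    if kind != "br":
        raise ValueError("the last event must be the branch under prediction")

    # Walk backward, following the producers of the registers the branch reads.
    needed = set(target_srcs)
    slice_positions: List[int] = []
    for pos in range(len(body) - 1, -1, -1):
        ev_kind, dest, srcs = body[pos]
        if ev_kind == "op" and dest in needed:
            slice_positions.append(pos)
            needed.discard(dest)  # older writes to dest are dead on this path
            needed.update(srcs)   # but this instruction's inputs are live

    # Assumed rule: credit the last branch executed before each slice instruction.
    branch_positions = [p for p, ev in enumerate(body) if ev[0] == "br"]
    window = branch_positions[-history_len:]
    bits = ["0"] * len(window)
    for pos in slice_positions:
        earlier = [p for p in window if p < pos]
        if earlier:  # otherwise the controlling branch is outside the history
            bits[window.index(earlier[-1])] = "1"
    return "".join(bits)


# The example path; block membership is inferred from the I/BB numbering, and
# operands of branches other than B8 are omitted because they do not matter here.
EXAMPLE_TRACE: List[Event] = [
    ("op", "R1", ("R2",)),        # I0: R1 = R2       (BB0)
    ("br", "B0", ()),
    ("op", "R1", ("R2",)),        # I2: R1 = R2 + 4   (BB2)
    ("br", "B2", ()),
    ("op", "R2", ("R1", "R2")),   # I3: R2 = R1 + R2  (BB3)
    ("br", "B3", ()),
    ("br", "B5", ()),
    ("op", "R3", ("R4",)),        # I7: R3 = R4 + 4   (BB7)
    ("br", "B7", ()),
    ("br", "B8", ("R2", "R3")),   # B8: if R2 == R3
]

print(affector_bitmap(EXAMPLE_TRACE, 5))  # 11010
```

The backward slice reaches I7, I3, and I2 but stops before I0, because I2 overwrites R1; the branches immediately before those three instructions are B5, B2, and B0.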

6 Affector Register File (ARF) Structure
The ARF keeps a separate record of affector information for each architectural register: one ARF entry per register.
[Figure: ARF entries for registers 0 to 31; each entry holds an affector bitmap, e.g., 10000011010.]

7 Affector Branch Bitmap (ABB) Generation Algorithm
Principle 1: When the processor encounters a conditional branch, all ARF entries are shifted left by 1 bit, and the vacated bit is filled with 0.
Principle 2: When the processor encounters a register-writing instruction, the ARF entries of its source registers are read, ORed together, and written to the ARF entry of its destination register with the LSB set to 1 (the LSB stands for the branch that most recently preceded the instruction).
Principle 3: When the processor encounters a conditional branch, the ARF entries of its source registers are read and ORed together to generate that branch's ABB.
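The ARF structure and Principles 1 to 3 can be modeled in a few lines. This is a minimal, non-speculative sketch under stated assumptions: the ARF is all zeros at reset, the 16-bit default width is hypothetical, and a branch reads its ABB (Principle 3) before applying its own shift (Principle 1), which is the order the worked example on the next slide implies. Class and method names are illustrative.

```python
from typing import Iterable, List


class AffectorRegisterFile:
    """One affector bitmap per architectural register (bit 0 = most recent branch)."""

    def __init__(self, num_regs: int = 32, width: int = 16) -> None:
        self.mask = (1 << width) - 1              # bits older than `width` branches fall off
        self.entries: List[int] = [0] * num_regs  # assumed all-zero at reset

    def read_or(self, srcs: Iterable[int]) -> int:
        acc = 0
        for reg in srcs:
            acc |= self.entries[reg]
        return acc

    def register_write(self, dest: int, srcs: Iterable[int]) -> None:
        # Principle 2: OR the sources' entries and set the LSB for the
        # branch that most recently preceded this instruction.
        self.entries[dest] = (self.read_or(srcs) | 1) & self.mask

    def conditional_branch(self, srcs: Iterable[int]) -> int:
        # Principle 3: this branch's ABB is the OR of its sources' entries.
        abb = self.read_or(srcs)
        # Principle 1: then shift every entry left by one, filling with 0.
        self.entries = [(e << 1) & self.mask for e in self.entries]
        return abb


def run_example() -> AffectorRegisterFile:
    """Replay the slide-5 path with a 5-bit ARF (registers R0..R4 -> 0..4)."""
    arf = AffectorRegisterFile(num_regs=5, width=5)
    arf.register_write(1, [2])      # I0: R1 = R2
    arf.conditional_branch([])      # B0 (operands omitted)
    arf.register_write(1, [2])      # I2: R1 = R2 + 4
    arf.conditional_branch([])      # B2
    arf.register_write(2, [1, 2])   # I3: R2 = R1 + R2
    arf.conditional_branch([])      # B3
    arf.conditional_branch([])      # B5
    arf.register_write(3, [4])      # I7: R3 = R4 + 4
    arf.conditional_branch([])      # B7
    return arf


arf = run_example()
print([format(e, "05b") for e in arf.entries[:4]])  # ['00000', '10000', '11000', '00010']
print(format(arf.conditional_branch([2, 3]), "05b"))  # B8 reads R2 and R3: 11010
```

Replaying the example path reproduces the ARF state after B7 and the ABB of 11010 for B8 given in slides 5 and 8, without ever walking the trace backward.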

8 Affector Branch Bitmap (ABB) Generation Algorithm: Example
Path from slide 5: I0: R1=R2; B0; I2: R1=R2+4; B2; I3: R2=R1+R2; B3; B5; I7: R3=R4+4; B7; then B8: if R2==R3.
ARF entries, 5 bits each (X = bit not yet known):
After I2 (Principle 2): R0 = XXXX0, R1 = XXXX1, R2 = XXXX0
After B2 (Principle 1): R0 = XXX00, R1 = XXX10, R2 = XXX00
After I3 (Principle 2): R0 = XXX00, R1 = XXX10, R2 = XXX11, R3 = XXX00
After B7: R0 = 00000, R1 = 10000, R2 = 11000, R3 = 00010
At B8 (Principle 3): ABB = ARF[R2] OR ARF[R3] = 11000 OR 00010 = 11010

9 Misprediction Recovery
Principle 4: When a branch misprediction is detected, the speculative updates made to the ARF after the mispredicted branch are shifted out.
[Figure: an ARF entry 10000011010 is shifted right by 4 bits to XXXX1000001 to undo the updates after the mispredicted branch.]
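Principle 4 can be sketched as a right shift. This assumes one reading of the slide: the shift amount is the number of conditional branches fetched after the mispredicted branch, and the vacated high bits (drawn as X) are refilled with 0. Function names are hypothetical. The sketch only undoes shifts; it does not restore entries that wrong-path instructions overwrote.

```python
from typing import List


def shift_on_branch(entries: List[int], width: int) -> List[int]:
    """Principle 1: each conditional branch, including wrong-path ones, shifts left."""
    mask = (1 << width) - 1
    return [(e << 1) & mask for e in entries]


def recover(entries: List[int], younger_branches: int) -> List[int]:
    """Principle 4 (assumed reading): shift right by the number of branches
    fetched after the mispredicted one. Vacated high bits come back as 0."""
    return [e >> younger_branches for e in entries]


# The slide's example: 10000011010 shifted right by 4 bits.
print(format(recover([0b10000011010], 4)[0], "011b"))  # 00001000001

# Entries that only saw wrong-path shifts (no wrong-path writes) round-trip,
# provided no set bit was pushed past the bitmap width.
state = [0b11010, 0b00001]
for _ in range(3):  # three wrong-path branches
    state = shift_on_branch(state, width=11)
print(recover(state, 3) == [0b11010, 0b00001])  # True
```

Bits that reached the top of the bitmap during wrong-path execution are lost, which is why the slide marks the refilled positions as unknown.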

10 Building Predictors Using Affector Information: Zeroing Scheme
[Figure: global history 10110111011 … 101 AND affector bitmap 00001101100 … 001 gives 00000101000 … 001, which is folded and XORed into the predictor lookup index.]
Turning off non-affector bits and hashing (Zeroing Scheme): all non-affector bits in the long global history are masked to zero by ANDing the branch's ABB with the long global history. The result is hashed down to the required number of index bits using a fold-and-XOR hash. The identified affectors keep their original positions in the history.
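A minimal sketch of the Zeroing Scheme index computation, assuming the fold-and-XOR hash XORs consecutive `index_bits`-wide chunks of the masked history; the function names and chunk order are assumptions. The example uses only the first 11 bits of the slide's vectors, since the rest is elided there.

```python
def fold_xor(value: int, total_bits: int, index_bits: int) -> int:
    """Fold a total_bits-wide value down to index_bits by XORing successive chunks."""
    chunk_mask = (1 << index_bits) - 1
    index = 0
    for shift in range(0, total_bits, index_bits):
        index ^= (value >> shift) & chunk_mask
    return index


def zeroing_index(ghist: int, abb: int, history_bits: int, index_bits: int) -> int:
    """Zero the non-affector history bits in place, then fold-XOR to an index."""
    masked = ghist & abb  # affector bits survive at their original positions
    return fold_xor(masked, history_bits, index_bits)


ghist = 0b10110111011  # first 11 bits of the slide's global history
abb = 0b00001101100    # first 11 bits of the slide's affector bitmap
print(format(ghist & abb, "011b"))       # 00000101000
print(zeroing_index(ghist, abb, 11, 4))  # 10
```

Because surviving bits stay in place, the index depends both on the affectors' outcomes and on where those affectors sit in the history.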

11 Building Predictors Using Affector Information: Packing Scheme
[Figure: global history 10110111011 … 101 AND affector bitmap 00001101100 … 001 gives 00000101000 … 001; the bits at affector positions are then packed into 0110 … 1, which is folded and XORed into the predictor lookup index.]
Turning off non-affector bits, packing, and hashing (Packing Scheme): the non-affector bits are removed altogether. The result is hashed down to the required number of bits using the same fold-and-XOR technique. Unlike in the Zeroing Scheme, the identified affectors do not keep their original positions.
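A minimal sketch of the Packing Scheme, assuming packing means gathering the history bits at the ABB's set positions into contiguous low-order bits while keeping their relative order (the same operation as the x86 BMI2 PEXT instruction, done here in software). Function names are hypothetical; the example again uses the first 11 bits of the slide's vectors.

```python
from typing import Tuple


def fold_xor(value: int, total_bits: int, index_bits: int) -> int:
    """Fold a total_bits-wide value down to index_bits by XORing successive chunks."""
    chunk_mask = (1 << index_bits) - 1
    index = 0
    for shift in range(0, total_bits, index_bits):
        index ^= (value >> shift) & chunk_mask
    return index


def pack_affectors(ghist: int, abb: int) -> Tuple[int, int]:
    """Gather the history bits at the ABB's set positions into the low bits.
    Returns (packed_bits, number_of_affectors)."""
    packed, count, pos = 0, 0, 0
    while abb >> pos:
        if (abb >> pos) & 1:
            packed |= ((ghist >> pos) & 1) << count
            count += 1
        pos += 1
    return packed, count


def packing_index(ghist: int, abb: int, index_bits: int) -> int:
    packed, count = pack_affectors(ghist, abb)
    return fold_xor(packed, count, index_bits)


ghist = 0b10110111011
abb = 0b00001101100
packed, count = pack_affectors(ghist, abb)
print(format(packed, f"0{count}b"))  # 0110
print(packing_index(ghist, abb, 2))  # 3
```

Packing discards the gaps between affectors, so a few affectors spread over a very long history still compress into a short value before hashing.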

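12 Proposed Predictor Organization
[Figure: a four-stage prediction pipeline (Stage 1 to Stage 4). A line predictor supplies a one-cycle prediction. The primary global predictor (perceptron or YAGS), accessed with a hash of the instruction address and global history, supplies the primary prediction. The corrector path reads the ARF, hashes the result, reads the corrector predictor (rare-event prediction), and compares tags; on a tag hit, the corrector prediction is used.]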
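The override path in this organization can be sketched as a tagged table consulted after the primary predictor. The slide only shows that the corrector is read with a hashed history, tag-compared, and used on a hit; everything else here is hypothetical: the table geometry, the 2-bit counters, the index and tag functions, and the policy of allocating an entry only when the primary predictor mispredicts. The line predictor and pipeline timing are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CorrectorEntry:
    tag: int
    counter: int  # assumed 2-bit saturating counter; >= 2 predicts taken


class CorrectorPredictor:
    """Hypothetical tagged corrector indexed by an affector-filtered history hash."""

    def __init__(self, index_bits: int = 10, tag_bits: int = 8) -> None:
        self.index_bits = index_bits
        self.tag_bits = tag_bits
        self.table: Dict[int, CorrectorEntry] = {}

    def _index_tag(self, pc: int, history_hash: int) -> Tuple[int, int]:
        index = (history_hash ^ pc) & ((1 << self.index_bits) - 1)
        tag = (pc >> 2) & ((1 << self.tag_bits) - 1)
        return index, tag

    def lookup(self, pc: int, history_hash: int) -> Optional[bool]:
        """Return the corrector's prediction on a tag hit, else None."""
        index, tag = self._index_tag(pc, history_hash)
        entry = self.table.get(index)
        if entry is not None and entry.tag == tag:
            return entry.counter >= 2
        return None

    def update(self, pc: int, history_hash: int, taken: bool,
               primary_was_correct: bool) -> None:
        index, tag = self._index_tag(pc, history_hash)
        entry = self.table.get(index)
        if entry is not None and entry.tag == tag:
            entry.counter = min(3, entry.counter + 1) if taken else max(0, entry.counter - 1)
        elif not primary_was_correct:
            # Assumed rare-event policy: allocate only when the primary mispredicts.
            self.table[index] = CorrectorEntry(tag, 2 if taken else 1)


def final_prediction(primary: bool, corrector: Optional[bool]) -> bool:
    """On a corrector tag hit, its prediction overrides the primary one."""
    return primary if corrector is None else corrector


corr = CorrectorPredictor()
pc, h = 0x400, 0b11010
print(corr.lookup(pc, h))  # None: a miss, so the primary prediction is used
corr.update(pc, h, taken=True, primary_was_correct=False)
print(final_prediction(False, corr.lookup(pc, h)))  # True
```

Keeping the corrector small and tagged means it only speaks up for the branches it has learned to handle, leaving all other branches to the primary predictor.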

13 Experiment Setup
SimpleScalar v3.0 with the Alpha ISA; 12 benchmarks from the SPEC95 and SPEC2000 integer benchmark suites.

14 Experimental Evaluation
Figure 1. Misprediction results for the Zeroing and Packing techniques for our corrector predictor, combined with (i) a perceptron primary predictor and (ii) a YAGS primary predictor.

15 Experimental Evaluation
Figure 2. (i) Performance of a modeled superscalar processor for various branch corrector predictor schemes. (ii) Per-benchmark misprediction rates for the corresponding corrector predictors.

16 Conclusion
Branches that are hard to predict for a primary global predictor are predicted by a highly accurate corrector predictor, at a cost of one or two additional cycles of latency. The work proposes a technique that lets this corrector predictor use a long global history by identifying the correlated branches in that history from run-time dataflow information. Two prediction schemes, Zeroing and Packing, are proposed. Adding an 8KB affector-history-based corrector predictor to a 16KB perceptron primary predictor reduces the average misprediction rate over the 12 benchmarks from 6.3% to 5.7%.
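As a quick check on the headline result (simple arithmetic on the two reported rates, not a figure from the slides), the drop from 6.3% to 5.7% is a relative reduction of roughly 9.5% in the average misprediction rate, for a combined budget of 16KB + 8KB = 24KB:

```latex
\frac{6.3\% - 5.7\%}{6.3\%} = \frac{0.6}{6.3} \approx 0.095
```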

