Presentation is loading. Please wait.

Presentation is loading. Please wait.

Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor

Similar presentations


Presentation on theme: "Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor"β€” Presentation transcript:

1 Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor
Sizhuo Zhang 6.175 TA October 30, 2015

2 1-bit Distributed Epochs
Delay: 0 cycle feEp=0 eEp=0 Decode redirects I1 (ieEp=idEp=0) fdEp=0 Delay: 100 cycles dEp=0 ieEp=0 I1 Fetch PC Decode f2d d2e Execute ... October 30, 2015

3 1-bit Distributed Epochs
Delay: 0 cycle feEp=0 eEp=0 Decode redirects I1 (ieEp=idEp=0) fdEp=0 Delay: 100 cycles dEp=1 I1 redirect, ieEp=0 deEp=0 I1 Fetch PC Decode f2d d2e Execute ... October 30, 2015

4 1-bit Distributed Epochs
Delay: 0 cycle feEp=0 eEp=0 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 fdEp=0 dEp=1 Delay: 100 cycles I1 redirect, ieEp=0 deEp=0 I1 Fetch PC Decode f2d d2e Execute ... October 30, 2015

5 1-bit Distributed Epochs
Delay: 0 cycle feEp=1 eEp=1 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 fdEp=0 dEp=1 Delay: 100 cycles I1 redirect, ieEp=0 deEp=0 I1 Fetch PC Decode f2d d2e Execute ... October 30, 2015

6 1-bit Distributed Epochs
Delay: 0 cycle feEp=1 eEp=1 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues fdEp=0 dEp=1 Delay: 100 cycles I1 redirect, ieEp=0 deEp=0 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015

7 1-bit Distributed Epochs
Delay: 0 cycle feEp=1 eEp=1 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues fdEp=0 dEp=0 Delay: 100 cycles I1 redirect, ieEp=0 deEp=1 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015

8 1-bit Distributed Epochs
Delay: 0 cycle feEp=1 eEp=1 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues Execute redirects I2 fdEp=0 dEp=0 Delay: 100 cycles I1 redirect, ieEp=0 deEp=1 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015

9 1-bit Distributed Epochs
Delay: 0 cycle Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues Execute redirects I2 feEp=0 eEp=0 fdEp=0 dEp=0 Delay: 100 cycles I1 redirect, ieEp=0 deEp=1 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015

10 1-bit Distributed Epochs
Delay: 0 cycle Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues Execute redirects I2 I1 redirect arrives at Fetch (ieEp == feEp) change PC to a wrong value feEp=0 eEp=0 fdEp=0 dEp=0 Delay: 100 cycles I1 redirect, ieEp=0 deEp=1 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015

11 Unbounded Global Epochs
redirect PC eEp Both Decode and Execute can redirect the PC Execute redirect should never be overruled Global epoch for each redirecting stage eEpoch: incremented when redirect from Execute takes effect dEpoch: incremented when redirect from Decode takes effect Initially set all epochs to 0 Redirect dEp redirect PC miss pred? miss pred? Execute d2e Fetch PC Decode f2d ... October 30, 2015

12 Execute stage ... Is ieEp == eEp? yes no Current instruction is OK;
{pc, newpc} eEp Redirect dEp {pc, newpc} miss pred? miss pred? {…, ieEp, idEp} {pc, ppc, ieEp} Execute d2e Fetch PC Decode f2d ... Is ieEp == eEp? yes no Current instruction is OK; check the ppc prediction by execution, signal a redirect message (by writing EHR) on misprediction Wrong path instruction; poison it October 30, 2015

13 Decode Stage ... Is ieEp == eEp && idEp == dEp ? yes no
{pc, newpc} eEp Redirect dEp {pc, newpc} miss pred? miss pred? {pc, ppc, ieEp, idEp} {..., ieEp} Execute d2e Fetch PC Decode f2d ... Is ieEp == eEp && idEp == dEp ? yes no Current instruction is OK; check the ppc prediction via BHT, signal a redirect message (by writing EHR) on misprediction Wrong path instruction; drop it October 30, 2015

14 Fetch stage: Redirect PC, Change Epoch
dEp {pc, newpc} miss pred? miss pred? Execute d2e Fetch PC Decode f2d ... If there is redirect message from Execute Set PC to correct value, increment eEp Else if there is redirect message from Decode Set PC to correct value, increment dEp We can always request IMem to fetch new instruction New instruction will be tagged with current eEpoch and dEpoch PC redirection overrides next-PC prediction When PC redirects, new instruction must be killed at Decode stage October 30, 2015

15 Observation Proved by induction on 𝑑 ... ... ... 𝐼 𝑑 (1) 𝐼 𝑑 ( π‘š 𝑑 )
Fetch Decode Execute eEpoch ... ... ... dEpoch 𝐼 𝑑 (1) 𝐼 𝑑 ( π‘š 𝑑 ) 𝐼 𝑑 ( 𝑛 𝑑 ) At cycle 𝑑, 𝑛 𝑑 instructions between Fetch and Execute: 𝐼 𝑑 1… 𝑛 𝑑 𝐼 𝑑 (1) at Fetch, 𝐼 𝑑 𝑛 𝑑 at Execute, 𝐼 𝑑 π‘š 𝑑 at Decode Invariant: 𝑒𝐸𝑝= 𝐼 𝑑 1 .𝑖𝑒𝐸𝑝β‰₯ 𝐼 𝑑 2 .𝑖𝑒𝐸𝑝β‰₯…β‰₯ 𝐼 𝑑 𝑛 𝑑 .𝑖𝑒𝐸𝑝β‰₯π‘’πΈπ‘βˆ’1 𝑑𝐸𝑝= 𝐼 𝑑 1 .𝑖𝑑𝐸𝑝β‰₯ 𝐼 𝑑 2 .𝑖𝑑𝐸𝑝β‰₯…β‰₯ 𝐼 𝑑 π‘š 𝑑 .𝑖𝑑𝐸𝑝β‰₯π‘‘πΈπ‘βˆ’1 Proved by induction on 𝑑 At cycle 𝑑, both Decode and Execute redirect 𝐼 𝑑 𝑛 𝑑 .𝑖𝑒𝐸𝑝=π‘’πΈπ‘βŸΉπ‘’πΈπ‘= 𝐼 𝑑 1 .𝑖𝑒𝐸𝑝= 𝐼 𝑑 2 .𝑖𝑒𝐸𝑝=…= 𝐼 𝑑 𝑛 𝑑 .𝑖𝑒𝐸𝑝 𝐼 𝑑 π‘š 𝑑 .𝑖𝑑𝐸𝑝=π‘‘πΈπ‘βŸΉπ‘‘πΈπ‘= 𝐼 𝑑 1 .𝑖𝑑𝐸𝑝= 𝐼 𝑑 2 .𝑖𝑑𝐸𝑝=…= 𝐼 𝑑 π‘š 𝑑 .𝑖𝑑𝐸𝑝 𝑒𝐸𝑝←𝑒𝐸𝑝+1, invariants still hold at cycle 𝑑+1 October 30, 2015

16 1-bit Global Epochs The max difference of values of one type of epoch is 1 Only need 1 bit to encode each epoch Increment epoch οƒ  flip epoch October 30, 2015

17 4-stage Pipeline: Code Sketch
module mkProc(Proc); // stage 1: Fetch // stage 2: Decode & read register // stage 3: Execute & access memory // stage 4: Commit (write register file) Ehr#(2, Addr) pc <- mkEhr(?); IMemory iMem <- mkIMemory; ... Fifo#(2, Fetch2Decode) f2d <- mkCFFifo; Fifo#(2, Decode2Execute) d2e <- mkCFFifo; Fifo#(2, Execute2Commit) e2c <- mkCFFifo; Reg#(Bool) eEpoch <- mkReg(False); Reg#(Bool) dEpoch <- mkReg(False); Ehr#(2, Maybe#(ExeRedirect)) exeRedirect <- mkEhr(Invalid); Ehr#(2, Maybe#(DecRedirect)) decRedirect <- mkEhr(Invalid); BTB#(16) btb <- mkBTB; BHT#(1024) bht <- mkBHT; October 30, 2015

18 Execute Stage BHT training will be in lab
rule doExecute; d2e.deq; let x = d2e.first; if(x.ieEp == eEpoch) begin let eInst = exec(...); if(eInst.mispredict) exeRedirect[0] <= Valid ({pc: x.pc, newpc: eInst.addr}); ... end else e2c.enq(Exec2Commit{poisoned: True, ...}); // wrong path instruction, poison it endrule BHT training will be in lab October 30, 2015

19 Decode Stage rule doDecode; f2d.deq; let x = f2d.first; if(x.ieEp == eEpoch && x.idEp == dEpoch) begin let stall = ...; // check scoreboard for stall if(!stall) begin if(x.iType == Br) begin let bht_ppc = ...; // BHT prediction if(bht_ppc != x.ppc) decRedirect[0] <= Valid ({pc: x.pc, newpc: bht_ppc}); end ... d2e.enq(Decode2Execute{ieEp: x.ieEp, ...}); // else: wrong path instruction, killed endrule October 30, 2015

20 Fetch Stage rule doFetch; let inst = iMem.req(pc[0]); let ppc = btb.predPc(pc[0]); pc[0] <= ppc; f2d.enq(Fetch2Decode{pc: pc[0], ppc: ppc, inst: inst, ieEp: eEpoch, idEp: dEpoch}); endrule rule cononicalizeRedirect; if(exeRedirect[1] matches tagged Valid .r) begin pc[1] <= r.newpc; btb.update(r.pc, r.newpc); eEpoch <= !eEpoch; end else if(decRedirect[1] matches tagged Valid .r) begin dEpoch <= !dEpoch; end exeRedirect[1] <= Invalid; decRedirect[1] <= Invalid; October 30, 2015

21 Advanced Branch Predictor
BHT cannot predict very accurately Need more sophisticated schemes Global branch history taken/not token for previous branches Local branch history taken/not token for previous occurrences of the same branch Tournament branch predictor Use both global and local history Alpha 21264 October 30, 2015

22 Tournament Predictor 10-bit PC: index 1024 x 10-bit local history table 10-bit local history Index 1024 x 3-bit BHT: prediction 1 12-bit global history Index 4096 x 2-bit BHT: prediction 2 Index 4096 x 2-bit BHT: select between predictions 1, 2 β€œThe Alpha Microprocessor Architecture” October 30, 2015

23 TAGE Branch Predictors
TAGE (Tagged Geometric history length) β€œA case for partially tagged geometric history length branch prediction” October 30, 2015

24 Other Predictors Perceptron-based predictors
Classification problem in machine learning Championship Branch Predictor Papers Slides simulation codes October 30, 2015

25 Return Address Stack Use a stack to store return address from function call Push when function is called Pop when returning from a function Instruction to return from function call JALR: rd = x0, rs1 = x1 (ra) Instruction to initiate function call JAL: rd = x1 JALR: rd = x1 October 30, 2015


Download ppt "Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor"

Similar presentations


Ads by Google