Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor

Slides:



Advertisements
Similar presentations
Constructive Computer Architecture: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Advertisements

Dynamic Branch Prediction
Constructive Computer Architecture: Branch Prediction: Direction Predictors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP III Steve Ko Computer Sciences and Engineering University at Buffalo.
Goal: Reduce the Penalty of Control Hazards
Branch Target Buffers BPB: Tag + Prediction
Dynamic Branch Prediction
Constructive Computer Architecture: Branch Prediction: Direction Predictors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.
Interrupts / Exceptions / Faults Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 30, 2012L21-1
Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture: Branch Prediction Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques.
Constructive Computer Architecture: Branch Prediction Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
Constructive Computer Architecture Virtual Memory and Interrupts Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Computer Architecture: A Constructive Approach Branch Prediction - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of.
Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.
CS 6290 Branch Prediction. Control Dependencies Branches are very frequent –Approx. 20% of all instructions Can not wait until we know where it goes –Long.
Computer Architecture: A Constructive Approach Branch Direction Prediction – Pipeline Integration Joel Emer Computer Science & Artificial Intelligence.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,
October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Constructive Computer Architecture Tutorial 6: Five Details of SMIPS Implementations Andy Wright 6.S195 TA October 7, 2013http://csg.csail.mit.edu/6.s195T05-1.
6.375 Tutorial 4 RISC-V and Final Projects Ming Liu March 4, 2016http://csg.csail.mit.edu/6.375T04-1.
Computer Architecture: A Constructive Approach Data Hazards and Multistage Pipelines Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*,
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Prof. Hsien-Hsin Sean Lee
CS203 – Advanced Computer Architecture
6.175: Constructive Computer Architecture Tutorial 5 Epochs, Debugging, and Caches Quan Nguyen (Troubled by the two biggest problems in computer science…
Computer Structure Advanced Branch Prediction
Control Hazards Constructive Computer Architecture: Arvind
Computer Architecture Advanced Branch Prediction
Tutorial 7: SMIPS Epochs Constructive Computer Architecture
CS252 Graduate Computer Architecture Spring 2014 Lecture 8: Advanced Out-of-Order Superscalar Designs Part-II Krste Asanovic
Constructive Computer Architecture Tutorial 6: Discussion for lab6
Branch Prediction Constructive Computer Architecture: Arvind
Branch Prediction Constructive Computer Architecture: Arvind
Multistage Pipelined Processors and modular refinement
in Pipelined Processors
CMSC 611: Advanced Computer Architecture
Non-Pipelined Processors - 2
Lab 4 Overview: 6-stage SMIPS Pipeline
Non-Pipelined and Pipelined Processors
in Pipelined Processors
Control Hazards Constructive Computer Architecture: Arvind
Lecture: Out-of-order Processors
Branch Prediction Constructive Computer Architecture: Arvind
Bypassing Computer Architecture: A Constructive Approach Joel Emer
Multistage Pipelined Processors and modular refinement
Modular Refinement Arvind
Realistic Memories and Caches
Branch Prediction: Direction Predictors
Branch Prediction: Direction Predictors
Lecture 10: Branch Prediction and Instruction Delivery
Multistage Pipelined Processors and modular refinement
in Pipelined Processors
Pipelined Processors Arvind
Control Hazards Constructive Computer Architecture: Arvind
Pipelined Processors Constructive Computer Architecture: Arvind
Branch Prediction: Direction Predictors
Tutorial 4: RISCV modules Constructive Computer Architecture
Modular Refinement Arvind
Control Hazards Constructive Computer Architecture: Arvind
Modular Refinement Arvind
Tutorial 7: SMIPS Labs and Epochs Constructive Computer Architecture
Pipelined Processors: Control Hazards
Presentation transcript:

Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor Sizhuo Zhang 6.175 TA October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle feEp=0 eEp=0 Decode redirects I1 (ieEp=idEp=0) fdEp=0 Delay: 100 cycles dEp=0 ieEp=0 I1 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle feEp=0 eEp=0 Decode redirects I1 (ieEp=idEp=0) fdEp=0 Delay: 100 cycles dEp=1 I1 redirect, ieEp=0 deEp=0 I1 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle feEp=0 eEp=0 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 fdEp=0 dEp=1 Delay: 100 cycles I1 redirect, ieEp=0 deEp=0 I1 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle feEp=1 eEp=1 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 fdEp=0 dEp=1 Delay: 100 cycles I1 redirect, ieEp=0 deEp=0 I1 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle feEp=1 eEp=1 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues fdEp=0 dEp=1 Delay: 100 cycles I1 redirect, ieEp=0 deEp=0 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle feEp=1 eEp=1 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues fdEp=0 dEp=0 Delay: 100 cycles I1 redirect, ieEp=0 deEp=1 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle feEp=1 eEp=1 Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues Execute redirects I2 fdEp=0 dEp=0 Delay: 100 cycles I1 redirect, ieEp=0 deEp=1 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues Execute redirects I2 feEp=0 eEp=0 fdEp=0 dEp=0 Delay: 100 cycles I1 redirect, ieEp=0 deEp=1 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Distributed Epochs Delay: 0 cycle Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues Execute redirects I2 I1 redirect arrives at Fetch (ieEp == feEp) change PC to a wrong value feEp=0 eEp=0 fdEp=0 dEp=0 Delay: 100 cycles I1 redirect, ieEp=0 deEp=1 I2 Fetch PC Decode f2d d2e Execute ... October 30, 2015 http://csg.csail.mit.edu/6.175

Unbounded Global Epochs redirect PC eEp Both Decode and Execute can redirect the PC Execute redirect should never be overruled Global epoch for each redirecting stage eEpoch: incremented when redirect from Execute takes effect dEpoch: incremented when redirect from Decode takes effect Initially set all epochs to 0 Redirect dEp redirect PC miss pred? miss pred? Execute d2e Fetch PC Decode f2d ... October 30, 2015 http://csg.csail.mit.edu/6.175

Execute stage ... Is ieEp == eEp? yes no Current instruction is OK; {pc, newpc} eEp Redirect dEp {pc, newpc} miss pred? miss pred? {…, ieEp, idEp} {pc, ppc, ieEp} Execute d2e Fetch PC Decode f2d ... Is ieEp == eEp? yes no Current instruction is OK; check the ppc prediction by execution, signal a redirect message (by writing EHR) on misprediction Wrong path instruction; poison it October 30, 2015 http://csg.csail.mit.edu/6.175

Decode Stage ... Is ieEp == eEp && idEp == dEp ? yes no {pc, newpc} eEp Redirect dEp {pc, newpc} miss pred? miss pred? {pc, ppc, ieEp, idEp} {..., ieEp} Execute d2e Fetch PC Decode f2d ... Is ieEp == eEp && idEp == dEp ? yes no Current instruction is OK; check the ppc prediction via BHT, signal a redirect message (by writing EHR) on misprediction Wrong path instruction; drop it October 30, 2015 http://csg.csail.mit.edu/6.175

Fetch stage: Redirect PC, Change Epoch dEp {pc, newpc} miss pred? miss pred? Execute d2e Fetch PC Decode f2d ... If there is redirect message from Execute Set PC to correct value, increment eEp Else if there is redirect message from Decode Set PC to correct value, increment dEp We can always request IMem to fetch new instruction New instruction will be tagged with current eEpoch and dEpoch PC redirection overrides next-PC prediction When PC redirects, new instruction must be killed at Decode stage October 30, 2015 http://csg.csail.mit.edu/6.175

Observation Proved by induction on 𝑡 ... ... ... 𝐼 𝑡 (1) 𝐼 𝑡 ( 𝑚 𝑡 ) Fetch Decode Execute eEpoch ... ... ... dEpoch 𝐼 𝑡 (1) 𝐼 𝑡 ( 𝑚 𝑡 ) 𝐼 𝑡 ( 𝑛 𝑡 ) At cycle 𝑡, 𝑛 𝑡 instructions between Fetch and Execute: 𝐼 𝑡 1… 𝑛 𝑡 𝐼 𝑡 (1) at Fetch, 𝐼 𝑡 𝑛 𝑡 at Execute, 𝐼 𝑡 𝑚 𝑡 at Decode Invariant: 𝑒𝐸𝑝= 𝐼 𝑡 1 .𝑖𝑒𝐸𝑝≥ 𝐼 𝑡 2 .𝑖𝑒𝐸𝑝≥…≥ 𝐼 𝑡 𝑛 𝑡 .𝑖𝑒𝐸𝑝≥𝑒𝐸𝑝−1 𝑑𝐸𝑝= 𝐼 𝑡 1 .𝑖𝑑𝐸𝑝≥ 𝐼 𝑡 2 .𝑖𝑑𝐸𝑝≥…≥ 𝐼 𝑡 𝑚 𝑡 .𝑖𝑑𝐸𝑝≥𝑑𝐸𝑝−1 Proved by induction on 𝑡 At cycle 𝑡, both Decode and Execute redirect 𝐼 𝑡 𝑛 𝑡 .𝑖𝑒𝐸𝑝=𝑒𝐸𝑝⟹𝑒𝐸𝑝= 𝐼 𝑡 1 .𝑖𝑒𝐸𝑝= 𝐼 𝑡 2 .𝑖𝑒𝐸𝑝=…= 𝐼 𝑡 𝑛 𝑡 .𝑖𝑒𝐸𝑝 𝐼 𝑡 𝑚 𝑡 .𝑖𝑑𝐸𝑝=𝑑𝐸𝑝⟹𝑑𝐸𝑝= 𝐼 𝑡 1 .𝑖𝑑𝐸𝑝= 𝐼 𝑡 2 .𝑖𝑑𝐸𝑝=…= 𝐼 𝑡 𝑚 𝑡 .𝑖𝑑𝐸𝑝 𝑒𝐸𝑝←𝑒𝐸𝑝+1, invariants still hold at cycle 𝑡+1 October 30, 2015 http://csg.csail.mit.edu/6.175

1-bit Global Epochs The max difference of values of one type of epoch is 1 Only need 1 bit to encode each epoch Increment epoch  flip epoch October 30, 2015 http://csg.csail.mit.edu/6.175

4-stage Pipeline: Code Sketch module mkProc(Proc); // stage 1: Fetch // stage 2: Decode & read register // stage 3: Execute & access memory // stage 4: Commit (write register file) Ehr#(2, Addr) pc <- mkEhr(?); IMemory iMem <- mkIMemory; ... Fifo#(2, Fetch2Decode) f2d <- mkCFFifo; Fifo#(2, Decode2Execute) d2e <- mkCFFifo; Fifo#(2, Execute2Commit) e2c <- mkCFFifo; Reg#(Bool) eEpoch <- mkReg(False); Reg#(Bool) dEpoch <- mkReg(False); Ehr#(2, Maybe#(ExeRedirect)) exeRedirect <- mkEhr(Invalid); Ehr#(2, Maybe#(DecRedirect)) decRedirect <- mkEhr(Invalid); BTB#(16) btb <- mkBTB; BHT#(1024) bht <- mkBHT; October 30, 2015 http://csg.csail.mit.edu/6.175

Execute Stage BHT training will be in lab rule doExecute; d2e.deq; let x = d2e.first; if(x.ieEp == eEpoch) begin let eInst = exec(...); if(eInst.mispredict) exeRedirect[0] <= Valid ({pc: x.pc, newpc: eInst.addr}); ... end else e2c.enq(Exec2Commit{poisoned: True, ...}); // wrong path instruction, poison it endrule BHT training will be in lab October 30, 2015 http://csg.csail.mit.edu/6.175

Decode Stage rule doDecode; f2d.deq; let x = f2d.first; if(x.ieEp == eEpoch && x.idEp == dEpoch) begin let stall = ...; // check scoreboard for stall if(!stall) begin if(x.iType == Br) begin let bht_ppc = ...; // BHT prediction if(bht_ppc != x.ppc) decRedirect[0] <= Valid ({pc: x.pc, newpc: bht_ppc}); end ... d2e.enq(Decode2Execute{ieEp: x.ieEp, ...}); // else: wrong path instruction, killed endrule October 30, 2015 http://csg.csail.mit.edu/6.175

Fetch Stage rule doFetch; let inst = iMem.req(pc[0]); let ppc = btb.predPc(pc[0]); pc[0] <= ppc; f2d.enq(Fetch2Decode{pc: pc[0], ppc: ppc, inst: inst, ieEp: eEpoch, idEp: dEpoch}); endrule rule cononicalizeRedirect; if(exeRedirect[1] matches tagged Valid .r) begin pc[1] <= r.newpc; btb.update(r.pc, r.newpc); eEpoch <= !eEpoch; end else if(decRedirect[1] matches tagged Valid .r) begin dEpoch <= !dEpoch; end exeRedirect[1] <= Invalid; decRedirect[1] <= Invalid; October 30, 2015 http://csg.csail.mit.edu/6.175

Advanced Branch Predictor BHT cannot predict very accurately Need more sophisticated schemes Global branch history taken/not token for previous branches Local branch history taken/not token for previous occurrences of the same branch Tournament branch predictor Use both global and local history Alpha 21264 October 30, 2015 http://csg.csail.mit.edu/6.175

Tournament Predictor 10-bit PC: index 1024 x 10-bit local history table 10-bit local history Index 1024 x 3-bit BHT: prediction 1 12-bit global history Index 4096 x 2-bit BHT: prediction 2 Index 4096 x 2-bit BHT: select between predictions 1, 2 “The Alpha 21264 Microprocessor Architecture” October 30, 2015 http://csg.csail.mit.edu/6.175

TAGE Branch Predictors TAGE (Tagged Geometric history length) “A case for partially tagged geometric history length branch prediction” October 30, 2015 http://csg.csail.mit.edu/6.175

Other Predictors Perceptron-based predictors Classification problem in machine learning Championship Branch Predictor Papers Slides simulation codes October 30, 2015 http://csg.csail.mit.edu/6.175

Return Address Stack Use a stack to store return address from function call Push when function is called Pop when returning from a function Instruction to return from function call JALR: rd = x0, rs1 = x1 (ra) Instruction to initiate function call JAL: rd = x1 JALR: rd = x1 October 30, 2015 http://csg.csail.mit.edu/6.175