6.175: Constructive Computer Architecture Tutorial 5 Epochs, Debugging, and Caches Quan Nguyen (Troubled by the two biggest problems in computer science…

Presentation transcript:

6.175: Constructive Computer Architecture
Tutorial 5: Epochs, Debugging, and Caches
Quan Nguyen (Troubled by the two biggest problems in computer science… and Comic Sans)
October 28, 2016
http://csg.csail.mit.edu/6.175

Agenda
  Epochs: a review
  Debugging your processor ft. Piazza
  Caches: a primer

Review: 1-bit Distributed Epochs
[Figure: pipeline Fetch (PC) → f2d → Decode → d2e → Execute → ..., with epoch registers feEp = 0, fdEp = 0, dEp = 0, deEp = 1, eEp = 0; two redirect paths labeled "Delay: 0 cycle" and "Delay: 100 cycles"; in-flight items "Inst1 redirect, ieEp = 0" and "Inst2"]
Sequence of events:
  1. Decode redirects Inst1 (ieEp = idEp = 0)
  2. Execute redirects Inst1
  3. Correct-path Inst2 (ieEp = 1, idEp = 0) issues
  4. Execute redirects Inst2
  5. Inst1 redirect arrives at Fetch (ieEp == feEp): changes PC to a wrong value

Review: Unbounded Global Epochs
[Figure: pipeline Fetch (PC) → f2d → Decode → d2e → Execute → ...; Decode and Execute each have a "miss pred?" check and a "redirect PC" path back to Fetch; global registers eEpoch and dEpoch]
  Both Decode and Execute can redirect the PC
  Execute redirect should never be overruled
  Global epoch for each redirecting stage
    eEpoch: incremented when redirect from Execute takes effect
    dEpoch: incremented when redirect from Decode takes effect
  Initially set all epochs to 0
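
The epoch rules on this slide can be sketched as a small behavioral model. This is Python rather than Bluespec, and every name in it (`Epochs`, `tag`, `decode_accepts`, `execute_accepts`) is hypothetical. It assumes Fetch tags each instruction with the current (eEpoch, dEpoch) pair, that Execute compares only eEpoch, and that Decode compares both, which is one way to make sure an Execute redirect is never overruled.

```python
from dataclasses import dataclass


@dataclass
class Epochs:
    """Global epoch registers, one per redirecting stage (both start at 0)."""
    e_epoch: int = 0  # incremented when a redirect from Execute takes effect
    d_epoch: int = 0  # incremented when a redirect from Decode takes effect

    def tag(self):
        """Fetch stamps each instruction with the epochs current at fetch time."""
        return (self.e_epoch, self.d_epoch)

    def execute_accepts(self, tag):
        """Assumed check: Execute drops anything fetched before its last redirect."""
        ie_ep, _ = tag
        return ie_ep == self.e_epoch

    def decode_accepts(self, tag):
        """Assumed check: Decode drops anything stale with respect to either epoch."""
        ie_ep, id_ep = tag
        return ie_ep == self.e_epoch and id_ep == self.d_epoch

    def redirect_from_execute(self):
        self.e_epoch += 1

    def redirect_from_decode(self):
        self.d_epoch += 1


ep = Epochs()
stale = ep.tag()             # fetched before any redirect took effect
ep.redirect_from_decode()    # a Decode redirect takes effect
after_decode = ep.tag()
ep.redirect_from_execute()   # an Execute redirect takes effect
after_execute = ep.tag()
```

Because Python integers do not wrap, the counters here are unbounded like the epochs on the slide; a hardware version would need a finite width.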

Review: Branch History Table (BHT)
[Figure: the Fetch PC goes to Instruction Fetch; k bits of the PC form the BHT index into a 2^k-entry BHT with 2 bits/entry, which answers Taken/¬Taken?; the fetched instruction's opcode answers Branch?, and its offset added to the PC gives the Target PC]
  At the Decode stage, if the instruction is a branch, the BHT is consulted using the pc; if the BHT shows a different prediction than the incoming ppc, Fetch is redirected
  4K-entry BHT, 2 bits/entry, ~80-90% correct direction predictions
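
A 2-bit-per-entry BHT can be modeled in a few lines. The sketch below is Python rather than Bluespec, and it fills in details the slide does not give, all of which are assumptions: the counters saturate at 0 and 3, values 2 and 3 mean "predict taken", the index drops the two byte-offset bits of the PC, and every entry starts at 1. The class name `BHT` and the branch PC are hypothetical.

```python
class BHT:
    """Branch history table: 2**k two-bit saturating counters indexed by PC bits."""

    def __init__(self, k, init=1):
        self.k = k
        # Counter values 0, 1 predict not taken; 2, 3 predict taken (assumed encoding).
        self.counters = [init] * (1 << k)

    def index(self, pc):
        # Assumption: drop the two byte-offset bits, then keep the low k bits.
        return (pc >> 2) & ((1 << self.k) - 1)

    def predict_taken(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BHT(k=12)                   # 4K entries, 2 bits each
loop_branch = 0x0000_0210         # hypothetical branch PC
outcomes = [True] * 9 + [False]   # loop branch: taken 9 times, then falls through
correct = 0
for taken in outcomes * 3:        # run the loop three times
    correct += bht.predict_taken(loop_branch) == taken
    bht.update(loop_branch, taken)
```

With two bits of hysteresis, the single not-taken outcome at each loop exit does not flip the prediction for the next run of the loop, so only the exit itself is mispredicted once the counter has warmed up.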

Review: Two-Level Branch Predictor
  Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct)
[Figure: k bits of the Fetch PC index four 2^k-entry, 2-bit BHTs; a 2-bit global branch history shift register, which shifts in the Taken/¬Taken result of each branch, selects which table answers Taken/¬Taken?]
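
The selection step can be sketched the same way: a 2-bit global history picks one of four counter tables, and the PC indexes within it. This is a Python model, not Bluespec; the class name `TwoLevelPredictor`, the counter encoding (2 and 3 mean taken), the initial counter value of 1, and the branch PC are assumptions made for illustration.

```python
class TwoLevelPredictor:
    """Four tables of 2**k two-bit counters, selected by a 2-bit global history."""

    def __init__(self, k):
        self.k = k
        self.history = 0  # 2-bit global branch history shift register
        self.tables = [[1] * (1 << k) for _ in range(4)]

    def _index(self, pc):
        # Assumption: drop the two byte-offset bits, then keep the low k bits.
        return (pc >> 2) & ((1 << self.k) - 1)

    def predict_taken(self, pc):
        return self.tables[self.history][self._index(pc)] >= 2

    def update(self, pc, taken):
        table = self.tables[self.history]
        i = self._index(pc)
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
        # Shift the outcome of this branch into the global history.
        self.history = ((self.history << 1) | int(taken)) & 0b11


pred = TwoLevelPredictor(k=4)
branch_pc = 0x0000_0220              # hypothetical branch PC
results = []
for i in range(20):
    taken = (i % 2 == 0)             # strictly alternating taken / not taken
    results.append(pred.predict_taken(branch_pc) == taken)
    pred.update(branch_pc, taken)
```

An alternating branch defeats a single 2-bit counter, but here each history value sees only one outcome, so the predictions become correct once the two tables in use have been trained.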

Review: Tournament Predictor
“The Alpha 21264 Microprocessor Architecture”
  10-bit PC: index into a 1024 x 10-bit local history table
  10-bit local history: index into a 1024 x 3-bit BHT (prediction 1)
  12-bit global history: index into a 4096 x 2-bit BHT (prediction 2)
  Index into a 4096 x 2-bit BHT: select between predictions 1 and 2

Debugging Your Processor

Unsupported Instruction
  Processor initialized? (csrf.started)
  What could be redirecting your PC?
    Faulty branch address calculation?
    BTB? (Lab 6 hint!)
  Bad instruction? (unlikely for this course)

Processor Hangs
  Rules conflict?
    Check schedule (option “-show-schedule”)
    Use $display() statements to diagnose
  Did you size pipeline FIFOs correctly?
    Are FIFOs being drained?

Incorrect Behavior
  Which test fails?
  Where is it in the dump?
  Do you have a log?
  If your rules don’t fire, temporarily make simpler rules

Demo: our code
src/TwoStage.bsv
    rule doFetch (csrf.started);
        // fetch
        Data inst = iMem.req(pcReg[0]);
        // Addr predPc = btb.predPc(pcReg[0]);
        Addr predPc = pcReg[0]; // always predict PC to be next PC
        ...
    endrule
[qmn@vlsifarm] $ ./run_asm.sh twostage
    ...
    -- assembly test: lw --
    ERROR: Executing unsupported instruction at pc: 00001000. Exiting
    ^C

Demo: the log
scemi/sim/logs/lw.log
    ...
    Cycle 5 ----------------------------------------------------
    Fetch: PC = 00000208, inst = 0000a183, expanded = lw r 3 = [r 1 0x0]
    Execute finds misprediction: PC = 00000208
    Fetch: Mispredict, redirected by Execute
    Cycle 6 ----------------------------------------------------
    Fetch: PC = 00001000, inst = 00ff00ff, expanded = unsupport 0x00ff00ff
    Execute: Kill instruction

Demo: the dump
programs/assembly/build/assembly/dump/lw.dump
    Disassembly of section .text:
    00000200 <_start>:
     200: 000010b7  lui x1, 0x1
     204: 00008093  mv x1, x1
     208: 0000a183  lw x3, 0(x1) # 1000
    ...
    Disassembly of section .data:
    00001000 <begin_signature>:
    1000: 00ff  0xff
    1002: 00ff  0xff
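
When the log and the dump need to be cross-checked, the instruction word can be decoded by hand. The sketch below is Python, not part of the lab code, and `decode_itype` is a hypothetical helper; it relies only on the standard RISC-V I-type field layout (opcode in bits 6:0, rd in 11:7, funct3 in 14:12, rs1 in 19:15, immediate in 31:20) and assumes the word is passed as an unsigned 32-bit value.

```python
def decode_itype(word):
    """Split a 32-bit RISC-V I-type instruction word into its fields."""
    imm = word >> 20
    if imm & 0x800:            # sign-extend the 12-bit immediate
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": imm,
    }


fields = decode_itype(0x0000A183)   # the instruction word at 0x208
# LOAD opcode with funct3 = 010 is lw.
is_lw = fields["opcode"] == 0b0000011 and fields["funct3"] == 0b010
```

The fields come out as rd = 3, rs1 = 1, imm = 0, which agrees with the log's expansion `lw r 3 = [r 1 0x0]`. Since `lui x1, 0x1` leaves 0x1000 in x1, the load's effective address is 0x1000, the same value that shows up as the PC in cycle 6 of the log.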

Demo: back to the code
src/TwoStage.bsv
    rule doExecute (csrf.started);
        if (eInst.mispredict) begin
            $display("Execute finds misprediction: PC = %x", f2e.pc);
            exeRedirect[0] <= Valid(ExeRedirect{
                pc: f2e.pc,
                nextPc: eInst.addr
            });
        end
    endrule
What sets eInst.addr?
What happens to exeRedirect[0]?

Caches

Multistage Pipeline
[Figure: pipeline with PC, Decode, Execute, Register File, Inst Memory, Data Memory, scoreboard, fEpoch, eEpoch, nap, a redirect path, and the d2e and e2c stages of buffering]
The use of magic memories (combinational reads) makes these designs unrealistic

Magic Memory Model
[Figure: RAM block with Address, WriteData, WriteEnable, and Clock inputs and a ReadData output]
  Reads and writes are always completed in one cycle
    A Read can be done any time (i.e. combinational)
    If enabled, a Write is performed at the rising clock edge (the write address and data must be stable at the clock edge)
  In a real DRAM the data will be available several cycles after the address is supplied

Memory Hierarchy
[Figure: CPU (RegFile) ↔ Small, Fast Memory (SRAM), which holds frequently used data ↔ Big, Slow Memory (DRAM)]
  size: RegFile << SRAM << DRAM (why? due to cost)
  latency: RegFile << SRAM << DRAM (why? due to size of DRAM)
  bandwidth: on-chip >> off-chip (why? due to cost and wire delays; wires on-chip cost much less, and are faster)
  On a data access:
    hit (data ∈ fast memory) → low latency access
    miss (data ∉ fast memory) → long latency access (DRAM)
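
The hit/miss distinction can be illustrated with a toy model that tracks only tags and latency. This is a Python sketch, not the lab's cache: the slide does not specify an organization, so the direct-mapped layout, the 4-line by 16-byte geometry, and the 1-cycle and 100-cycle latencies are all hypothetical values chosen for illustration, as is the name `DirectMappedCache`.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags and access latency."""

    def __init__(self, num_lines=4, line_bytes=16, hit_cycles=1, miss_cycles=100):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.hit_cycles = hit_cycles      # hypothetical fast-memory (SRAM) latency
        self.miss_cycles = miss_cycles    # hypothetical DRAM latency
        self.tags = [None] * num_lines    # None marks an invalid line

    def access(self, addr):
        """Return the latency of one access, filling the line on a miss."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            return self.hit_cycles        # hit: data is in the fast memory
        self.tags[index] = tag            # miss: bring the line in from DRAM
        return self.miss_cycles


cache = DirectMappedCache()
addresses = (0x1000, 0x1004, 0x1008, 0x2000, 0x1000)
latencies = [cache.access(a) for a in addresses]
```

The first access to 0x1000 misses, and the next two hit because they fall in the same 16-byte line. In this geometry 0x2000 maps to the same line index as 0x1000, so it evicts that line and the final access to 0x1000 misses again.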

Two biggest problems in CS
  Cache invalidation
    How to inform caches of stale data
  Naming things
  Off-by-one errors