Sunshine Slam
Khian Hao Lim, Haywood Ho, Soe Myint, Leo Ting, Ka Hou Chan
Overview
- Datapath of the 10-stage deep pipeline
- Cache
- Branch prediction and jump target prediction
- Critical path
- Xilinx tools, testing methodology
- Performance
- Current status, conclusion
Datapath I
Stages: IF0, IF1, IF2
Blocks: PC, next-PC logic, instruction cache, branch prediction
Datapath II
Stages: ID, FO, EX1, EX2
Blocks: control, register file, decode, forwarding logic, Mux A, Mux B, ALU, branch verify, FO muxes
Datapath III
Stages: ME1, ME2, WB
Blocks: data cache, mux, FO muxes, monitor, statistics
Cache Architecture
Diagram labels: processor datapath (EX2, ME1, ME2), Cache Meister, BlockRAMs (tags and data for each way, two tag comparators), write buffer, SDRAM
- 2-way set associative
- Random cache-line replacement policy
- Cache miss detected in ME2
- Write buffer congestion detected in ME1
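The lookup and replacement behaviour named on this slide can be sketched in software. This is a minimal toy model, not the hardware design: the class name, the set count, the line size, and the rule of filling an empty way before evicting are all assumptions made for illustration.

```python
import random


class TwoWayCache:
    """Toy model of a 2-way set-associative cache with random replacement.

    Sizes are hypothetical; the slides do not state the real geometry.
    """

    def __init__(self, num_sets=4, line_bytes=16, seed=0):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        # Each set has two ways; a way holds the tag of its line, or None if empty.
        self.sets = [[None, None] for _ in range(num_sets)]
        self.rng = random.Random(seed)

    def _split(self, addr):
        """Split a byte address into (set index, tag)."""
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        index, tag = self._split(addr)
        ways = self.sets[index]
        if tag in ways:  # compare the tag against both ways
            return True
        # Miss: use an empty way if there is one, else evict a random way.
        victim = ways.index(None) if None in ways else self.rng.randrange(2)
        ways[victim] = tag
        return False
```

For example, with the default geometry, addresses 0, 64 and 128 all map to set 0, so the third of them forces a random eviction of one of the first two.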
Branch, Jump Prediction
Critical Path
Path               Logic delay (ns)   Route delay (ns)   Total delay (ns)
Write Buffer       (28.7%)            (71.3%)
Stalling Logic     (19.8%)            (80.2%)
ALU                7.177 (36.5%)      (63.5%)
Forwarding logic   (35.1%)            (64.9%)
Branch Verifier    (37.7%)            (62.3%)
Branch Predictor   (24.7%)            (75.3%)
Forwarding muxes   (22.8%)            (77.2%)            13.824
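The percentages can be read as each component's share of a path's total delay. A minimal sketch of that arithmetic, assuming this reading and assuming that 13.824 ns is the total delay of the forwarding-muxes path (the function name is illustrative):

```python
def split_delay(total_ns, logic_pct):
    """Split a path's total delay into (logic, route) parts in nanoseconds."""
    logic_ns = total_ns * logic_pct / 100.0
    return logic_ns, total_ns - logic_ns


# Forwarding muxes: 22.8% logic, 77.2% route, 13.824 ns total (assumed).
logic_ns, route_ns = split_delay(13.824, 22.8)
print(f"logic {logic_ns:.3f} ns, route {route_ns:.3f} ns")
# prints "logic 3.152 ns, route 10.672 ns"
```

Under this reading, routing accounts for the larger share of the delay on every path listed.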
Xilinx Tools
- Read tutorials on the Xilinx website to become more familiar with the tools
- Added timing constraints to the clock and other paths
- Critical path shortened from 37 ns to 25 ns after adding constraints and constraining fanout
- Guide design files
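To put the critical-path improvement in terms of clock rate, the upper bound on clock frequency is the reciprocal of the critical-path delay. A rough sketch, assuming the critical path alone sets the clock period (it ignores clock skew and other margins; the function name is illustrative):

```python
def max_clock_mhz(critical_path_ns):
    """Upper bound on clock frequency (MHz) for a given critical path (ns)."""
    return 1000.0 / critical_path_ns


before_mhz = max_clock_mhz(37.0)  # about 27.0 MHz
after_mhz = max_clock_mhz(25.0)   # 40.0 MHz
```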
Testing Methodology
- Black-box tests for each module
- Verified memory controller functionality on board
- Replaced caches with block RAMs
- Tested entire processor in simulation
- Made changes to help alleviate clock skew problems on board
- “Shadow” register file so we could more easily debug on board
Performance
Measurement results
Test program   CPI, branch prediction correct %
Quick sort     2.264%
Extra          %
Base           %
How did we measure our processor's performance?
- Add a statistics module
- Collect data from different modules (I cache and D cache, WB stage, stalling logic)
- In the branch predictor….
  - Count the number of correctly and incorrectly predicted branches
- In the statistics module
  - Count the total number of cycles
  - Count the number of valid instructions executed
  - CPI = total cycles / number of valid instructions executed
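The counters described on this slide can be modelled in software as follows. This is a minimal sketch, not the hardware module: the class, field, and method names are hypothetical, and the per-cycle `tick` interface is an assumption about how the counts would be fed in.

```python
from dataclasses import dataclass


@dataclass
class Statistics:
    """Software model of the counters kept by a statistics module."""

    total_cycles: int = 0
    valid_instructions: int = 0
    branches_correct: int = 0
    branches_wrong: int = 0

    def tick(self, instruction_retired, branch_outcome=None):
        """Record one clock cycle.

        instruction_retired: True if a valid instruction completed this cycle.
        branch_outcome: None if no branch resolved this cycle, True if a
        branch was predicted correctly, False if it was mispredicted.
        """
        self.total_cycles += 1
        if instruction_retired:
            self.valid_instructions += 1
        if branch_outcome is True:
            self.branches_correct += 1
        elif branch_outcome is False:
            self.branches_wrong += 1

    @property
    def cpi(self):
        """CPI = total cycles / valid instructions (needs at least one)."""
        return self.total_cycles / self.valid_instructions

    @property
    def prediction_accuracy(self):
        """Percentage of resolved branches that were predicted correctly."""
        resolved = self.branches_correct + self.branches_wrong
        return 100.0 * self.branches_correct / resolved
```

For example, four cycles in which two instructions retire give a CPI of 2.0; stall cycles raise the cycle count without raising the instruction count, which is why the stalling logic shows up in the measured CPI.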
Lessons / Evaluation / Further Improvements
- Simpler write buffer design
  - Use a smaller write buffer to reduce logic
  - Use random replacement instead of FIFO
- Pipeline the stalling logic
  - Do the necessary computation in the IF2 stage, then decide whether to stall in the ID stage
- Pipeline the branch verifier
  - Do the computation in the EX1 or FO stage, then compare or look up a table in the next stage
Diagram stages: IF2, ID, FO, EX1, EX2