Timing Model of a Superscalar O-o-O processor in HAsim Framework

Murali Vijayaraghavan

What is HAsim
A framework for writing software-like timing models and running them on FPGAs.
Software timing models are inherently sequential, and hence slow.
Parallelism is achieved by implementing the timing model on FPGAs.

HAsim contd.
Functional partition: correct execution (multiply, divide, etc.)
Timing partition: models time (stalls, mispredicts, etc.)
The two partitions exchange requests and responses.
Functional partition == ISA
Timing partition == micro-architecture
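The split between the two partitions can be illustrated with a small Python sketch. This is not HAsim code: the class names, the two-operation instruction set, and the latency table are all hypothetical, and the sketch assumes that the timing partition is the side issuing requests. It only shows the idea that one side computes correct results while the other decides how many model cycles they cost.

```python
class FunctionalPartition:
    """Hypothetical stand-in: executes operations correctly, knows nothing about time."""

    def __init__(self):
        self.regs = [0] * 32

    def execute(self, op, dst, a, b):
        # Correctness only: compute the architectural result.
        if op == "add":
            self.regs[dst] = self.regs[a] + self.regs[b]
        elif op == "mul":
            self.regs[dst] = self.regs[a] * self.regs[b]
        else:
            raise ValueError(f"unknown op: {op}")
        return self.regs[dst]


class TimingPartition:
    """Hypothetical stand-in: decides how many model cycles each operation costs."""

    # Illustrative latencies, not taken from the slides.
    LATENCY = {"add": 1, "mul": 3}

    def __init__(self, functional):
        self.functional = functional
        self.model_cycle = 0

    def run(self, program):
        for op, dst, a, b in program:
            # Request to the functional partition; the return value is the response.
            self.functional.execute(op, dst, a, b)
            # Time is accounted for here, independently of the result.
            self.model_cycle += self.LATENCY[op]
        return self.model_cycle
```

Changing the `LATENCY` table changes the modelled micro-architecture without touching the code that defines the ISA behaviour.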

Functional Partition
[Diagram. Operations: TOK GEN, FET, DEC, EXE, MEM, LCO, GCO. Algorithms: FetAlg, DecAlg, ExeAlg, MemAlg, LCOAlg, GCOAlg. State: RegState, MemState]

Model cycle vs FPGA cycle
The functional simulator can take any number of FPGA cycles for an operation. So there must be an explicit mechanism to monitor the ticks of the processor being modelled.

APorts – monitoring ticks
The modules in the timing partition are connected to each other using APorts. A clock tick conceptually begins when a module has read from every input APort and ends when it has written to every output APort. But the tick is localized to each port.
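The tick rule above can be sketched in Python. This is a simplified illustration, not the HAsim implementation: the `APort` and `Module` classes, the `try_tick` method, and the choice to pre-fill each port with `latency` empty messages are assumptions made for the example.

```python
from collections import deque


class APort:
    """Hypothetical FIFO link between two timing-model modules."""

    def __init__(self, latency=1):
        # Assumption: the port starts with `latency` empty messages,
        # so the reader can begin its first model cycles.
        self.queue = deque([None] * latency)

    def can_read(self):
        return len(self.queue) > 0

    def read(self):
        return self.queue.popleft()

    def write(self, msg):
        self.queue.append(msg)


class Module:
    """Hypothetical timing-model module with its own model-cycle counter."""

    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = inputs
        self.outputs = outputs
        self.model_cycle = 0

    def try_tick(self):
        """Advance one model cycle only if every input port has a message."""
        if not all(port.can_read() for port in self.inputs):
            return False
        for port in self.inputs:
            port.read()  # the tick begins: consume one message per input
        for port in self.outputs:
            port.write((self.name, self.model_cycle))  # the tick ends
        self.model_cycle += 1
        return True
```

With two modules connected in a loop, one module can be a model cycle ahead of the other, which is the sense in which the tick is local rather than global.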

MIPS R10k specs
64-bit processor
Out-of-order execution
Superscalar
FetchWidth: 4
CommitWidth: 4
2 ALUs
1 Load/Store unit
1 FPU

Timing model design
The functional partition operates on only one instruction at a time. But the timing model time-multiplexes multiple operations so that it operates on more than one instruction at a time.
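A minimal sketch of this time-multiplexing, in Python. The function names are hypothetical, and the 4-byte instruction stride is an assumption; the fetch width of 4 comes from the specs listed earlier. One model cycle of a 4-wide fetch is built from four sequential single-instruction requests.

```python
FETCH_WIDTH = 4  # fetch width from the R10k specs in the slides


def functional_fetch(pc):
    """Hypothetical functional-partition call: handles one instruction per request."""
    return {"pc": pc, "inst": f"inst@{pc:#x}"}


def fetch_one_model_cycle(pc):
    """Model a 4-wide fetch by issuing FETCH_WIDTH sequential requests."""
    fetched = []
    host_steps = 0
    for i in range(FETCH_WIDTH):
        # Assumption: fixed 4-byte instructions, so consecutive PCs differ by 4.
        fetched.append(functional_fetch(pc + 4 * i))
        host_steps += 1
    # All of these requests belong to the same model cycle.
    return fetched, host_steps
```

The model sees four instructions fetched in a single cycle, while the host spends four steps producing them.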

Timing Model top-level design
[Diagram. Timing modules: Fetch, Decode/Dispatch, Issue, Execute, Commit. Functional operations: Token, Fetch, Decode, Exec, Mem, LCO, GCO. Connection labels: Predicted PC, PC at Mispredict, 4 Tokens, 4 Issue, 4 Commit, FU Ops, Exec Results, Free Buffer, IntQ buffer left, AddrQ buffer left]

Decode/Dispatch Module
[Diagram. Components: Inst Buffer (8), Decode, Branch/JR Pred, ROB, Busy RegFile. Labels: 4 Inst, 4 issue, 4 Commit, Insert, Update, Update from exec, Predicted PC, PC at Mispredict, IntQ Free Count, AddrQ Free Count]

Issue Module
[Diagram. Inputs: 4 Inst. Components: IntQ (O-o-O), AddrQ (In Order), ScoreBoard. Outputs: To 2 ALUs, To Load Store]
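The difference between an out-of-order integer queue and an in-order address queue can be sketched as follows. This is an illustrative Python model, not the author's design: the function names, the dictionary layout of an instruction, and the representation of the scoreboard as a set of busy register numbers are all assumptions. For simplicity, the sketch does not mark destination registers busy when an instruction issues.

```python
def issue_int_queue(int_q, busy_regs, num_alus=2):
    """Out-of-order issue: pick up to num_alus ready instructions, oldest first."""
    issued = []
    for inst in list(int_q):  # iterate over a copy, since int_q is mutated
        if len(issued) == num_alus:
            break
        # An instruction is ready when none of its sources is marked busy.
        if not any(src in busy_regs for src in inst["srcs"]):
            issued.append(inst)
            int_q.remove(inst)
    return issued


def issue_addr_queue(addr_q, busy_regs):
    """In-order issue: only the head may issue, and only if it is ready."""
    if addr_q and not any(src in busy_regs for src in addr_q[0]["srcs"]):
        return [addr_q.pop(0)]
    return []
```

In the integer queue a stalled older instruction does not block younger ready ones, whereas in the address queue a stalled head blocks everything behind it.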

Differences between my timing model and the R10k
SMIPS ISA: no floating point ops
32-bit registers and addressing
No delay slot
One extra cycle on a branch mispredict
JR and JALR have to go through the Integer Q

Reasons for timing differences
Currently the functional partition gives only information about branches. So the address for a JR or JALR can be obtained only after the JR or JALR has executed.
I didn’t implement the branch cache, which eliminates the extra cycle on a branch mispredict.

Simulation results
Simulated the SMIPS v2 ADDUI test case.
It took 239 FPGA cycles to simulate 7 model cycles. This number needs a closer look, as the “bottleneck” is the instruction queue, which takes 7 * 21 cycles = 147 cycles.
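The arithmetic behind these numbers can be checked with a few lines of Python. The three input values come from the slide; the variable names and the derived quantities (the cycles not accounted for by the instruction queue, and the FPGA-to-model cycle ratio) are added here for illustration.

```python
fpga_cycles = 239                  # measured FPGA cycles (from the slide)
model_cycles = 7                   # simulated model cycles (from the slide)
queue_cycles_per_model_cycle = 21  # instruction-queue cost per model cycle (from the slide)

# FPGA cycles attributed to the instruction queue: 7 * 21
queue_total = model_cycles * queue_cycles_per_model_cycle

# FPGA cycles that the instruction queue alone does not explain
unexplained = fpga_cycles - queue_total

# Average number of FPGA cycles spent per model cycle
ratio = fpga_cycles / model_cycles

print(queue_total, unexplained, round(ratio, 2))
```

The gap between the measured total and the queue's share is the part of the run that still needs to be accounted for.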

Miscellaneous
Lines of code for the timing model: ~1300, compared to ~1200 for a simple SMIPS processor in Lab2, excluding caches.
