HASim: Implementing a Functional/Timing Partitioned Microprocessor Simulator with an FPGA Nirav Dave*, Michael Pellauer*, Joel Emer†*, & Arvind* Massachusetts Institute of Technology* Computer Science and Artificial Intelligence Lab Cambridge, MA Intel Corporation VSSAD† Hudson, MA
HASim - Why? Micro-architectural Simulations are Important Better estimates for expected outcomes SW Simulations are slow to run 100s of KIPS HW “simulations” take a long time to design
Underlying Beliefs Modeling something is generally easier than designing it A model doesn't need to be totally faithful to the design, only faithful enough for what you need It's easy to make modeling mistakes Need to insert checks to ensure you didn't cheat Appropriate partitioning improves reuse Split computational aspects from timing
HASim – What? HASim is a partitioned hardware simulation framework Two Partitions: Functional (FP) – Executes instructions Timing (TP) – Responsible for determining the timing of the emulated machine
HASim: The Picture [Figure: block diagram. Labels: Timing Partition; Functional Partition; stages Token Gen, Fet, Dec, Exe, Mem, LCom, GCom; Memory; Bypassing Unit; RegFile]
Functional Partition Zoom-In [Figure: Decoder Unit. TP request to do instruction i: <Token>. Response to TP's request: <Token, DependencyInfo>. Info from previous stage: <Token, Inst>. Information to next stage: <Token, DecodedInst>. Also labeled: Token Table; path to MapTable (in BypassUnit)]
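Functional Partition - Execute [Figure: Execute unit. Labels, apparently in the same arrangement as the Decoder Unit zoom-in: <Token> (TP request); <Token, Result Value> (response to TP); <Token, DecodedInst> (from previous stage); <Token, ExecedInst> (to next stage). Also labeled: Token Table; path to RegFile (in BypassUnit)]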
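The token-based request/response pattern on the Decode and Execute slides can be sketched in software. This is only an illustrative Python model, not HASim's hardware implementation: the class names, the tuple instruction encoding `(op, dest, src1, src2_or_imm)`, the two toy ops `add` and `addi`, and the plain architectural register file (standing in for the BypassUnit's MapTable and RegFile) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical instruction encoding: (op, dest, src1, src2_or_imm)
Inst = Tuple[str, int, int, int]


@dataclass
class TokenEntry:
    """State the token table keeps for one in-flight instruction."""
    inst: Inst
    decoded: bool = False
    result: Optional[int] = None


class FunctionalPartition:
    """Toy functional partition driven by token requests from a timing partition."""

    def __init__(self, program, num_regs=8):
        self.program = list(program)
        self.regfile = [0] * num_regs
        self.token_table: Dict[int, TokenEntry] = {}
        self.next_token = 0

    def fetch(self, pc: int) -> int:
        """Allocate a token for the instruction at pc and record <Token, Inst>."""
        token = self.next_token
        self.next_token += 1
        self.token_table[token] = TokenEntry(inst=self.program[pc])
        return token

    def decode(self, token: int):
        """Request: <Token>. Response: <Token, DependencyInfo>."""
        entry = self.token_table[token]
        op, dest, src1, src2 = entry.inst
        sources = (src1, src2) if op == "add" else (src1,)
        entry.decoded = True
        return token, {"dest": dest, "sources": sources}

    def execute(self, token: int):
        """Request: <Token>. Response: <Token, Result Value>."""
        entry = self.token_table[token]
        if not entry.decoded:
            raise RuntimeError(f"token {token} executed before decode")
        op, dest, src1, src2 = entry.inst
        if op == "add":
            result = self.regfile[src1] + self.regfile[src2]
        elif op == "addi":
            result = self.regfile[src1] + src2  # src2 holds the immediate
        else:
            raise ValueError(f"unknown op {op!r}")
        entry.result = result
        self.regfile[dest] = result  # simplification: write back at execute
        return token, result
```

In this sketch the caller plays the role of the timing partition: it decides when to issue `decode(token)` and `execute(token)`, while the functional partition only computes what each instruction does.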
Automated Checks We'd like our model to: Obey causality of data usage No reading values before they're created Meet expected times for different stages e.g. Decode of an instruction takes at least 1 cycle Decode should not take more than 2 cycles We want these very simple checks Let's have the FP verify these!
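The stage-latency check described on this slide might look roughly like the following. It is a minimal Python sketch under stated assumptions: the names `StageTimingChecker` and `TimingViolation` are hypothetical, and latency is taken to be the number of emulated cycles between the TP's request for a stage and the matching response.

```python
class TimingViolation(Exception):
    """Raised when a stage finishes outside its allowed latency window."""


class StageTimingChecker:
    """Checks per-stage latency bounds, given in emulated cycles."""

    def __init__(self, bounds):
        # bounds maps stage name -> (min_cycles, max_cycles), both inclusive
        self.bounds = dict(bounds)
        self.started = {}  # (token, stage) -> cycle of the request

    def request(self, token, stage, cycle):
        """Record the emulated cycle on which the TP asked for this stage."""
        self.started[(token, stage)] = cycle

    def response(self, token, stage, cycle):
        """Validate the latency when the stage's response is consumed."""
        latency = cycle - self.started.pop((token, stage))
        lo, hi = self.bounds[stage]
        if not lo <= latency <= hi:
            raise TimingViolation(
                f"{stage} of token {token} took {latency} cycles, "
                f"expected {lo}..{hi}"
            )
        return latency
```

With `StageTimingChecker({"decode": (1, 2)})`, a decode requested on cycle 10 and answered on cycle 11 passes, while one answered on cycle 10 or cycle 13 raises `TimingViolation`.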
Verifying Causality All execution interactions with the functional model are provided Annotate all data with the emulated clock cycle it was created on FP checks the time on every data access
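One way to picture the timestamp annotation is a register file that stores, next to each value, the emulated cycle on which it was created. The Python sketch below is an assumption-laden illustration, not the HASim design: `TimestampedRegFile` and `CausalityViolation` are hypothetical names, and a value is assumed to be readable from the cycle it was created on (reads on an earlier cycle are rejected).

```python
class CausalityViolation(Exception):
    """Raised when data is read on an emulated cycle before it existed."""


class TimestampedRegFile:
    """Register file that tags each value with its creation cycle."""

    def __init__(self, num_regs):
        self.values = [0] * num_regs
        self.created_at = [0] * num_regs  # initial values exist from cycle 0

    def write(self, reg, value, cycle):
        """Store a value together with the emulated cycle that produced it."""
        self.values[reg] = value
        self.created_at[reg] = cycle

    def read(self, reg, cycle):
        """Return the value, checking that the reader is not in its past."""
        if cycle < self.created_at[reg]:
            raise CausalityViolation(
                f"r{reg} read at cycle {cycle}, "
                f"but created at cycle {self.created_at[reg]}"
            )
        return self.values[reg]
```

A timing model that asks for `read(1, cycle=4)` after `write(1, 42, cycle=5)` has consumed a value from its own future, and the check flags it.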
Leveraging FP structure for Timing Sometimes the best way to model something is to just make it Use the target design's cache structure as the FP's cache structure Can just measure the number of target ticks May need to record some more information (where misses occurred) to get appropriate timing
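As a rough illustration of building the structure and measuring it, the Python sketch below models a direct-mapped cache that reports a tick count per access and logs where misses occurred. The cache organization, the `hit_ticks` and `miss_ticks` defaults, and all names are assumptions chosen for the example, not parameters from the talk.

```python
class DirectMappedCache:
    """Tag-only direct-mapped cache that reports target ticks per access."""

    def __init__(self, num_lines, line_bytes, hit_ticks=1, miss_ticks=10):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.hit_ticks = hit_ticks    # assumed hit latency in target ticks
        self.miss_ticks = miss_ticks  # assumed miss latency in target ticks
        self.tags = [None] * num_lines
        self.miss_log = []            # addresses where misses occurred

    def access(self, addr):
        """Return (hit, ticks) and update the cache state on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_lines
        tag = line // self.num_lines
        if self.tags[index] == tag:
            return True, self.hit_ticks
        self.tags[index] = tag
        self.miss_log.append(addr)
        return False, self.miss_ticks


cache = DirectMappedCache(num_lines=4, line_bytes=16)
total_ticks = sum(cache.access(a)[1] for a in (0, 4, 64, 0))
```

Here addresses 0 and 64 map to the same line index, so the sequence misses, hits, misses, and misses again; `total_ticks` is 31 and `cache.miss_log` is `[0, 64, 0]`.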
Similar Ideas - FAST FAST – similar underlying beliefs Differences: SW vs. HW functional partitions decoupled partitions vs. tight coupling some additional correctness checks needed Not clear which approach is more effective
Similar Ideas - UNUM UNUM – Another parameterized HW framework Much more emphasis on HW quality and structure Much more work to generate “Believable” low-level values Aimed later in the design selection cycle
Current Progress Initial functional partition Singlescalar OOO design Simple RISC ISA Physical Reg File Fast branch rewinds Simple Pipeline Timing Partition
Future Progress Porting a real ISA to the design x86 (w/ µops) More complicated timing models Reorder Buffer designs Large Cache simulations
Thanks! {ndave, pellauer, arvind}@csail.mit.edu emer@intel.com