HAsim
Joel Emer †‡, Michael Adler †, Artur Klauser †, Angshuman Parashar †, Michael Pellauer ‡, Murali Vijayaraghavan ‡
† VSSAD, Intel
‡ CSAIL, MIT
Hasim2 Overview
Goal
  – Produce compelling evidence for architecture ideas
Requirements
  – Cycle-accurate simulation
  – Representative simulation length
  – Software development (often)
Current approach
  – Mostly software simulation (1 kHz to 10 kHz)
New approach
  – Build a performance model in an FPGA
Hasim3 FPGA-based approaches
Prototyping
  – Build a logically isomorphic representation of the design
Modeling
  – Build a performance simulation in gates
Hybrids
  – Build something that is partially a prototype and partially a model
Hasim4 Recreate Asim in hardware
Modularity
Inter-module communication
Functional/Timing Partitioning
Modeling Utilities
Hasim5 Why modularity?
Speed of model development
Shared components between products
Reuse across generations
Encourages isomorphism to design
Improved fidelity
Facilitates speed/fidelity trade-offs
Architectural experimentation
Factorial development and evaluations
Sharing
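Hasim6 ASIM Module Hierarchy
[Figure: module tree; node labels S, MCN, DRXCWF, B]
Hasim7 ASIM Module Selection
[Figure: module tree; node labels S, MCN, DRXCWF, and several nodes labeled B]
Hasim8 Module Selection
[Figure: module tree; node labels S, MCN, CMN, DRXCWF (twice), and several nodes labeled B]
Hasim9 Module Replacement
[Figure: module tree; node labels S, MCN, DRXCWF, several nodes labeled B, and one node labeled X]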
Hasim10 (H)ASIM Module Hierarchy
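Hasim11 Communication
[Figure: node labels C, DRXCWF, N, N]
Hasim12 Named connections
[Figure: modules S and D; connection endpoints A-out and A-in]
Hasim13 Model and FPGA Cycles
[Figure: Module A and Module B joined by a Port; timelines labeled A and B]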
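One way to picture how a port decouples model cycles from FPGA cycles is a FIFO pre-filled with one empty slot per cycle of latency. The sketch below is a software analogy only; the `Port` class, its `send`/`receive` methods, and the `run` driver are hypothetical names, not HAsim's API.

```python
from collections import deque


class Port:
    """Hypothetical Asim-style port: a message sent in model cycle t
    becomes visible to the receiver in model cycle t + latency."""

    def __init__(self, latency):
        # Pre-fill with `latency` empty slots so the receiver can read
        # on every model cycle, even before the first message arrives.
        self.slots = deque([None] * latency)

    def send(self, msg):
        # Called exactly once per model cycle by the producer
        # (None would mean "no message this cycle").
        self.slots.append(msg)

    def receive(self):
        # Called exactly once per model cycle by the consumer.
        return self.slots.popleft()


def run(cycles, latency=2):
    """Module A sends one message per model cycle; module B records
    what it receives in that same model cycle."""
    port = Port(latency)
    seen = []
    for cycle in range(cycles):
        port.send(f"msg{cycle}")       # module A
        seen.append(port.receive())    # module B
    return seen


print(run(4, latency=2))
```

Because each side touches the port exactly once per model cycle, either module can take any number of host (FPGA) cycles to finish a model cycle without changing the modeled timing.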
Hasim14 Functional/Timing Decomposition
[Figure labels: ISA semantics, Platform semantics, Micro-architecture; Timing Partition, Functional Partition; Fetch(PC) … Instruction]
Simplifies timing model
Amortize functional model design effort over many models
Can be pipelined for performance
Can be FPGA-friendly design
Can be split across hardware and software
Hasim15 phases
Fetch instruction
Speculatively execute instruction
Read memory *
Speculatively write memory * (locally visible)
Commit or Abort instruction
Write memory * (globally visible)
* Optional depending on instruction type
Hasim16 Execution in phases
[Figure: instruction phase sequences FDXRC, FDXWCW, FDXC]
Assertion: All data dependencies can be represented in these phases
[Figure: instruction phase sequences FDXRA, FDXXCW]
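The phase sequences can be generated from the instruction type. The sketch below assumes the letters mean F fetch, D decode, X execute, R read memory, W write memory, C commit, and A abort. That reading is inferred from the phase list, and the `phases` function and its instruction kinds are hypothetical.

```python
def phases(kind, aborted=False):
    """Return the phase string for one instruction.

    kind is "load", "store", or anything else for a plain ALU op.
    Letter meanings are an assumption (see the lead-in).
    """
    seq = ["F", "D", "X"]
    if kind == "load":
        seq.append("R")      # read memory
    elif kind == "store":
        seq.append("W")      # speculative, locally visible write
    if aborted:
        seq.append("A")      # abort: nothing becomes globally visible
        return "".join(seq)
    seq.append("C")          # commit
    if kind == "store":
        seq.append("W")      # globally visible write after commit
    return "".join(seq)


for kind in ("load", "store", "alu"):
    print(kind, phases(kind))
print("aborted load", phases("load", aborted=True))
```

This covers four of the sequences shown in the figure (FDXRC, FDXWCW, FDXC, FDXRA). It does not model FDXXCW.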
Hasim17 HAsim: Partitioning Overview
[Figure labels: Timing Partition; Functional Partition with stages Token Gen, Fet, Dec, Exe, Mem, LCom, GCom; Memory State, Register State, RegFile]
Hasim18 Common Infrastructure Modules
Inter-module communication
Statistics gathering
Event logging
Debug Tracing
Simulation control
…
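Hasim19 Bluespec (Asim-style) module
    module [HAsim_module] mkCache#() (Empty);
        // Requests arrive on "a2cache"; responses leave on "cache2a".
        Port#(Addr) req_port  <- mkRecvPort("a2cache");
        Port#(Bool) resp_port <- mkSendPort("cache2a");
        TagArray    tagarray  <- mkTagArray();

        // Each model cycle: read the request port and, if a request is
        // present, reply with the tag-array hit/miss result.
        rule cycle (True);
            Maybe#(Addr) mx = req_port.get();
            if (isValid(mx))
                resp_port.put(tagarray.lookup(validValue(mx)));
        endrule
    endmodule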
Hasim20 Bluespec (Asim-style) submodule
    module mkTagArray(TagArray);
        RegFile#(Bit#(12), Bit#(4)) tagArray <- mkRegFileFull(...);

        // Upper 4 address bits are the tag; lower 12 bits index the array.
        function Bit#(4) getTag(Bit#(16) x);
            return x[15:12];
        endfunction

        function Bit#(12) getIndex(Bit#(16) x);
            return x[11:0];
        endfunction

        // Hit when the stored tag at the indexed entry matches the address tag.
        method Bool lookup(Bit#(16) a);
            return (tagArray.sub(getIndex(a)) == getTag(a));
        endmethod
    endmodule
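The tag/index split can be checked in software. The sketch below mirrors the 16-bit address layout of the tag array (4-bit tag, 12-bit index). The `fill` method is a hypothetical addition so the example has something to look up. Unlike a hardware register file, an unfilled entry here holds no tag and always misses.

```python
class TagArray:
    """Software analogy of the direct-mapped tag array: 4096 entries,
    each holding a 4-bit tag."""

    def __init__(self):
        self.tags = {}  # index (12 bits) -> tag (4 bits)

    @staticmethod
    def get_tag(addr):
        return (addr >> 12) & 0xF      # bits 15:12

    @staticmethod
    def get_index(addr):
        return addr & 0xFFF            # bits 11:0

    def fill(self, addr):
        # Hypothetical helper: record addr's tag at its index.
        self.tags[self.get_index(addr)] = self.get_tag(addr)

    def lookup(self, addr):
        # Hit when the stored tag matches the address tag.
        return self.tags.get(self.get_index(addr)) == self.get_tag(addr)


t = TagArray()
t.fill(0x1234)
print(t.lookup(0x1234), t.lookup(0x2234))
```

The addresses 0x1234 and 0x2234 share index 0x234 but differ in tag, so only the first one hits after the fill.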
Hasim21 Support functions - stats
[Figure: three Modules, each with a Stat Counter, and one Stat Dumper]
    module mkCache#(...) (Empty);
        ...
        cache_hits <- mkStat(...);
        ...
        hit = tagarray.lookup(...);
        if (hit)
            cache_hits.increment();
        ...
    endmodule
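The counter-plus-dumper arrangement amounts to per-module counters registered with one collector. A minimal sketch follows, with hypothetical names (`Stat`, `dump_stats`, and the `registry` list). How HAsim actually chains counters to the dumper is not shown in the source.

```python
class Stat:
    """A named counter that registers itself with a shared registry."""

    def __init__(self, name, registry):
        self.name = name
        self.count = 0
        registry.append(self)

    def increment(self):
        self.count += 1


def dump_stats(registry):
    """Stand-in for the Stat Dumper: collect every counter's value."""
    return {stat.name: stat.count for stat in registry}


registry = []
cache_hits = Stat("cache_hits", registry)
cache_misses = Stat("cache_misses", registry)

# A module increments its own counters as it simulates.
for hit in (True, False, True):
    (cache_hits if hit else cache_misses).increment()

print(dump_stats(registry))
```

Modules only ever touch their own counters, so adding a statistic does not require changes to the dumper.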
Hasim22 2Dreams
Hasim23 Support functions - events
[Figure: three Modules, each with an Event Reg, and one Event Dumper]
    module mkCache#(...) (Empty);
        ...
        cache_event <- mkEvent(...);
        ...
        hit = tagarray.lookup(...);
        cache_event.report(hit);
        ...
    endmodule
Hasim24 Support functions – global controller
[Figure: three Modules, each with a Controller, and one Global Controller]
    module mkCache#(...) (Empty);
        ...
        ctrl <- mkCntrlr(...);
        ...
        rule (ctrl.run())
            ...
        endrule
    endmodule
Hasim26 FPGA-based prototype Prototyping Catch-22…
Hasim27 Module Instantiation
[Figure: node labels U, DRXCWF, MCN, C, DRXCWF, M, C, DRXCWF]
Hasim28 Factorial Coding/Experiments
[Figure: four configurations of the S, MCN tree, one per combination of SC or RC with SM or RM]
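A factorial experiment enumerates every combination of module alternatives. The sketch below uses the labels from the figure (SC/RC and SM/RM) as opaque names. The slot names `cache` and `memory` are guesses at what the labels stand for, and `configurations` is a hypothetical helper.

```python
from itertools import product


def configurations(alternatives):
    """alternatives maps a module slot to its list of implementations.
    Returns one dict per combination (the full factorial)."""
    slots = list(alternatives)
    return [dict(zip(slots, choice))
            for choice in product(*(alternatives[s] for s in slots))]


# Slot names are assumptions; the labels come from the figure.
configs = configurations({"cache": ["SC", "RC"], "memory": ["SM", "RM"]})
for config in configs:
    print(config)
```

With two alternatives in each of two slots this yields four models, matching the four trees in the figure. Each added slot multiplies the count, and modular replacement is what keeps that affordable.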
Hasim29 HAsim: Current status - models
Simple RISC functional model operating
  – Simple RISC ISA
  – Pipelined multi-phase instruction execution
  – Supports speculative OOO design
      Physical Reg File and ROB
      Small physically addressed memory
      Fast speculative rewinds
Instruction-per-cycle (APE) model
  – Runs simple benchmarks on FPGA
Five-stage pipeline
  – Supports branch mis-speculation
  – Runs simple benchmarks (in software simulation)
x86 functional model architecture under development
Hasim30 Connections Implement Ports
[Figure: PM (Module Tree w. Connections) with endpoints foo, bar, foo, baz; PM (Hardware Modules w. Wrappers) with endpoints bar, foo, baz]
Implemented via connections.
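Named connections can be resolved by pairing each send endpoint with the receive endpoint of the same name, wherever the two modules sit in the tree. The sketch below is a software analogy. The `connect` function and its tuple format are hypothetical, and the source does not show HAsim's actual matching rules.

```python
def connect(endpoints):
    """Pair endpoints by name.

    endpoints: list of (module, kind, name), kind is "send" or "recv".
    Returns {name: (sending_module, receiving_module)}.
    """
    sends, recvs = {}, {}
    for module, kind, name in endpoints:
        table = sends if kind == "send" else recvs
        if name in table:
            raise ValueError(f"duplicate {kind} endpoint {name!r}")
        table[name] = module
    # Names present on only one side cannot be wired up.
    unmatched = set(sends) ^ set(recvs)
    if unmatched:
        raise ValueError(f"unmatched connections: {sorted(unmatched)}")
    return {name: (sends[name], recvs[name]) for name in sends}


wires = connect([
    ("cpu", "send", "a2cache"),
    ("cache", "recv", "a2cache"),
    ("cache", "send", "cache2a"),
    ("cpu", "recv", "cache2a"),
])
print(wires)
```

Matching by name means a module never has to mention its peer, so replacing one module does not disturb the others.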
Hasim31 Timing Model Resources
(Fast) OOO, branch prediction, three functional units, 32KB 2-way set-associative ICache and DCache, iTLB, dTLB
  – 2142 slices (15% of a 2VP30)
  – 21 block RAMs (15% of a 2VP30)
Configurable cache model
  32KB 4-way set-associative cache with 16B cache lines
  – 165 slices (1% of a 2VP30)
  – 17 block RAMs (12% of a 2VP30)
  2MB 4-way set-associative cache with 64B cache lines
  – 140 slices (1% of a 2VP30)
  – 40 block RAMs (29% of a 2VP30)
Current FPGAs (4VFX140)
  – 142,128 logic cells (63,168 slices)
  – 552 block RAMs
  – 2 PowerPCs
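The utilization percentages can be reproduced from the device totals. The sketch below assumes the XC2VP30 provides 13,696 slices and 136 block RAMs. Those totals come from the Xilinx data sheet rather than the slide, and the result is truncated to a whole percent to match the quoted figures.

```python
# Assumed XC2VP30 totals (from the Xilinx data sheet, not from the slide).
XC2VP30 = {"slices": 13696, "brams": 136}


def utilization(used, total):
    """Percentage of a resource used, truncated to a whole percent."""
    return 100 * used // total


models = {
    "OOO timing model": {"slices": 2142, "brams": 21},
    "32KB 4-way cache": {"slices": 165, "brams": 17},
    "2MB 4-way cache": {"slices": 140, "brams": 40},
}

for name, used in models.items():
    print(name,
          f"{utilization(used['slices'], XC2VP30['slices'])}% slices,",
          f"{utilization(used['brams'], XC2VP30['brams'])}% block RAMs")
```

The cache models are limited by block RAM rather than logic: the 2MB configuration uses about 1% of the slices but 29% of the block RAMs.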