Computer Architecture Lab at
Building a Synthesizable x86
Eriko Nurvitadhi, James C. Hoe, Babak Falsafi
SIMFLEX/PROTOFLEX

June 22, Motivation
  Build a synthesizable x86 functional model for prototyping
    most widely used ISA
    Intel won't give out theirs
  Problem: a very complicated ISA
    many instructions
      482 instructions total (ADD alone has 14 variations)
    many individually complicated instructions
      PUSHAD: push all general-purpose registers to the stack
    many under-specified instructions
      LOADALL instruction; BCD operation flag updates
  Also must be maintainable & extensible
    return on investment

Overcoming Complexity
  4 key ingredients in our approach
    working SW simulator as design spec
    simplified multi-cycle datapath
    high-level HDL
    HW-SW co-simulation for validation & evaluation
  What we have today...
    an x86 functional model in Bluespec
      all real-mode general-purpose insts
      includes I/O instructions!
    boots FreeDOS OS in co-simulation testbench
    synthesizes to 85% of a Virtex II Pro 70 FPGA
    Max 10 MIPS (based on synthesis + simulation)

Outline
  Introduction
  Our Approach
  Status and Results
  Discussions and Future work

Functional View of an ISA
  ISA = architectural states + instructions
  instruction = set of alternate behaviors
    e.g., due to different addressing modes
    x86 has 482 insts but ~1000 behaviors
  behavior = sequence of actions that read & alter states
  [Figure: functional model, with instructions Inst_1, Inst_2, ..., Inst_n, behaviors beh_1, beh_2, ..., beh_m, and actions (ACT)]

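The ISA view on this slide can be written down as a small data model. This is only a sketch: the class names, the toy ADD instruction, its two variant tags, and the instruction lengths are illustrative assumptions, not taken from the authors' model or from Bochs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]            # architectural state: register name -> value
Action = Callable[[State], None]  # an action reads and alters the state


@dataclass
class Behavior:
    """One behavior = an ordered sequence of actions."""
    name: str
    actions: List[Action]

    def execute(self, state: State) -> None:
        for act in self.actions:
            act(state)


@dataclass
class Instruction:
    """One instruction = a set of alternate behaviors, keyed by variant tag."""
    mnemonic: str
    behaviors: Dict[str, Behavior]


def add_ax_bx(s: State) -> None:
    s["ax"] = (s["ax"] + s["bx"]) & 0xFFFF   # 16-bit wraparound


def add_ax_one(s: State) -> None:
    s["ax"] = (s["ax"] + 1) & 0xFFFF


def advance_ip(nbytes: int) -> Action:
    def act(s: State) -> None:
        s["ip"] = (s["ip"] + nbytes) & 0xFFFF
    return act


# Hypothetical toy instruction with two behaviors (lengths are illustrative).
ADD = Instruction("ADD", {
    "ax_bx": Behavior("ADD ax, bx", [add_ax_bx, advance_ip(2)]),
    "ax_imm1": Behavior("ADD ax, 1", [add_ax_one, advance_ip(3)]),
})
```

Selecting a behavior and executing it, e.g. `ADD.behaviors["ax_bx"].execute(state)`, mutates the state dictionary in place, one action at a time.
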
SW x86 Sim as ISA Spec
  Simulator source code = precise and executable design spec
  We use Bochs
    open-source
    code structure fits our high-level ISA view, i.e.,
      explicit architecture state declaration
      one instruction behavior <-> one C++ function
  (Essentially) complete x86 functionality
    simulates a complete PC system
    runs various OSs (e.g., Linux, Win XP)
    supports 386 through Pentium Pro

Multi-cycle Implementation
  [Figure: top-level view of the x86 functional model, showing Start, Fetch, decoder, FUs, arch and aux states, mem accesses, I/O operations, Finish]
  Sequential, multi-cycle execution: Fetch, Decode, Execute, Commit

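A minimal sketch of sequential, multi-cycle execution: exactly one stage is active per cycle and an instruction walks through Fetch, Decode, Execute, Commit before the next one starts. The one-instruction toy ISA (`addi`), the single accumulator, and the one-cycle-per-stage timing are assumptions made for illustration; the real model's stages may take a different number of cycles.

```python
from enum import Enum, auto
from typing import Any, Dict, List, Tuple


class Stage(Enum):
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()
    COMMIT = auto()


class MultiCycleModel:
    def __init__(self, program: List[Tuple[str, int]]) -> None:
        self.program = program
        self.stage = Stage.FETCH
        self.arch: Dict[str, int] = {"ip": 0, "acc": 0}  # architectural state
        self.aux: Dict[str, Any] = {}                    # auxiliary state
        self.cycles = 0
        self.retired = 0

    def step(self) -> None:
        """Advance one clock cycle; only the current stage does work."""
        self.cycles += 1
        if self.stage is Stage.FETCH:
            self.aux["raw"] = self.program[self.arch["ip"]]
            self.stage = Stage.DECODE
        elif self.stage is Stage.DECODE:
            self.aux["op"], self.aux["operand"] = self.aux["raw"]
            self.stage = Stage.EXECUTE
        elif self.stage is Stage.EXECUTE:
            if self.aux["op"] != "addi":
                raise ValueError("unknown opcode: %r" % (self.aux["op"],))
            self.aux["result"] = (self.arch["acc"] + self.aux["operand"]) & 0xFFFF
            self.stage = Stage.COMMIT
        else:
            # COMMIT: architectural state changes only here.
            self.arch["acc"] = self.aux["result"]
            self.arch["ip"] += 1
            self.retired += 1
            self.stage = Stage.FETCH


def run(program: List[Tuple[str, int]]) -> MultiCycleModel:
    model = MultiCycleModel(program)
    while model.arch["ip"] < len(program):
        model.step()
    return model
```

Because nothing overlaps, the cycles per instruction in this toy is simply the number of stages; `run(...).cycles / run(...).retired` gives the CPI directly.
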
Bluespec Design Capture
  Explicit state declaration
    x86 architectural states
    auxiliary simulation states used by Bochs
  Predicated atomic rules
    one rule <-> one action in our ISA view
  Maintainability & extensibility
    new behavior: add rules
    changing behavior: add/modify rules
  Optimizations (low-level)
    reduce logic: reuse + combine rules
    reduce critical path delay: split rules

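The idea of predicated atomic rules can be emulated in a few lines of Python. This is not Bluespec and not the authors' code: the `Rule` type, the one-rule-at-a-time scheduler, and the two-rule toy PUSH (with a word-addressed memory dictionary and a `phase` counter) are assumptions chosen to show a guard, an atomic update, and how a behavior is built from rules.

```python
import copy
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Tuple

State = Dict[str, Any]


class Rule(NamedTuple):
    name: str
    guard: Callable[[State], bool]    # predicate: may the rule fire?
    action: Callable[[State], None]   # state update, applied atomically


def fire_one(state: State, rules: List[Rule]) -> Tuple[Optional[str], State]:
    """Fire the first enabled rule on a copy, then publish the copy as a whole."""
    for rule in rules:
        if rule.guard(state):
            nxt = copy.deepcopy(state)
            rule.action(nxt)
            return rule.name, nxt
    return None, state


def _dec_sp(s: State) -> None:
    s["sp"] = (s["sp"] - 2) & 0xFFFF
    s["phase"] = 1


def _store_ax(s: State) -> None:
    s["mem"][s["sp"]] = s["ax"]
    s["phase"] = 2   # behavior finished


# Hypothetical toy "PUSH AX" behavior: one rule per action.
PUSH_RULES = [
    Rule("push_dec_sp", lambda s: s["phase"] == 0, _dec_sp),
    Rule("push_store", lambda s: s["phase"] == 1, _store_ax),
]


def run(state: State, rules: List[Rule]) -> Tuple[List[str], State]:
    """Keep firing rules until none is enabled; return the firing order."""
    fired: List[str] = []
    while True:
        name, state = fire_one(state, rules)
        if name is None:
            return fired, state
        fired.append(name)
```

Adding a new behavior in this style means appending rules with their own guards to the list; existing rules are untouched as long as the guards do not overlap.
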
HW-SW Co-simulation for Validation and Evaluation
  Virtually "plug in" our model into a PC
    execute Bochs to provide reference behavior
    simulate RTL alongside the simulated Bochs PC
  For validation and performance (CPI) evaluation
  [Figure: two panels, Validation and Performance Evaluation; components shown: Bochs CPU, CPU RTL, MEM, I/Os, and an == comparison]

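Validation against a reference simulator amounts to a lockstep comparison of architectural state after each instruction. The sketch below assumes the reference behavior is available as a list of state snapshots and the model under test as a step function; both interfaces, and the toy step functions, are hypothetical stand-ins for Bochs and the RTL simulation.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]


def first_divergence(reference_trace: List[State],
                     model_step: Callable[[State], State],
                     initial_state: State) -> Optional[int]:
    """Return the index of the first instruction whose resulting state
    differs from the reference, or None if the whole trace matches."""
    state = dict(initial_state)
    for i, expected in enumerate(reference_trace):
        state = model_step(state)
        if state != expected:
            return i
    return None


def make_trace(step: Callable[[State], State],
               initial_state: State, n: int) -> List[State]:
    """Record n successive states produced by a step function."""
    trace: List[State] = []
    state = dict(initial_state)
    for _ in range(n):
        state = step(state)
        trace.append(state)
    return trace


def ref_step(s: State) -> State:
    # Toy reference: increment a 16-bit register and advance ip.
    return {"ax": (s["ax"] + 1) & 0xFFFF, "ip": s["ip"] + 1}


def buggy_step(s: State) -> State:
    # Toy model under test with a deliberate bug: no 16-bit wraparound.
    return {"ax": s["ax"] + 1, "ip": s["ip"] + 1}
```

Reporting the index of the first mismatch, rather than a pass/fail flag, points directly at the instruction behavior that needs debugging.
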
Co-Simulation Testbench
  [Figure: testbench flow, part of it marked "Automated"]
    Bochs src code -> (manual coding) -> Bluespec x86
    Bluespec x86 -> (Bluespec compilation) -> Verilog x86
    Verilog x86 -> (C++ conversion, Verilator) -> C++ x86
    Workloads on Bochs -> (Bochs simulation) -> Traces
    C++ x86 + Traces -> (co-simulation) -> Validation and performance evaluation results

Implementation Progress
  Implemented ISA subset
    all real-mode general-purpose instructions
      166 insts, 369 inst behaviors
    compared to complete x86
      482 insts, ~1000 inst behaviors
  Synthesis
    convert Bluespec to synthesizable Verilog
    Xilinx ISE 7.1, Virtex II Pro 70 (FPGA on BEE2)
    results: 98 MHz, 28K slices (85% utilization)

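The throughput figure quoted earlier in the talk ("Max 10 MIPS, based on synthesis + simulation") can be related to the 98 MHz clock with the usual relation MIPS = clock (MHz) / CPI. The slides do not state the CPI, so the value derived here is an inference under that relation, and the helper names are illustrative.

```python
def mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second for a clock rate and average CPI."""
    if cpi <= 0:
        raise ValueError("cpi must be positive")
    return clock_mhz / cpi


def implied_cpi(clock_mhz: float, mips_rate: float) -> float:
    """Average cycles per instruction implied by a clock rate and a MIPS figure."""
    if mips_rate <= 0:
        raise ValueError("mips_rate must be positive")
    return clock_mhz / mips_rate


CLOCK_MHZ = 98.0   # synthesis result from this slide
PEAK_MIPS = 10.0   # "Max 10 MIPS" from the earlier summary slide
```

Under these assumptions, `implied_cpi(CLOCK_MHZ, PEAK_MIPS)` comes out at roughly 9.8 cycles per instruction, which is plausible for a sequential multi-cycle design but is not a number reported by the authors.
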
Co-simulation Results
  Validation
    validated our model w/ FreeDOS bootup traces
    tested first 140M dynamic instructions
    exercised 183 inst behaviors
  Performance Evaluation
    also with FreeDOS bootup traces

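One way to obtain a figure such as "exercised 183 inst behaviors" is to tag each dynamic instruction in a trace with the behavior it used and count the distinct tags. The trace format and the behavior names below are assumptions for illustration; the slides do not describe how the count was gathered.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def behavior_coverage(trace: Iterable[str],
                      implemented: Set[str]) -> Tuple[Set[str], float]:
    """Return the set of behaviors seen in the trace and the fraction of
    implemented behaviors that the trace exercised."""
    counts = Counter(trace)
    unknown = set(counts) - implemented
    if unknown:
        raise ValueError("trace uses unimplemented behaviors: %s" % sorted(unknown))
    exercised = set(counts)
    return exercised, len(exercised) / len(implemented)


def fraction(exercised: int, total: int) -> float:
    """Plain ratio, for figures that are already counted."""
    return exercised / total
```

With the counts reported in the talk, `fraction(183, 369)` is just under one half, so the FreeDOS boot trace exercised about half of the implemented behaviors.
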
A Complete x86?
  To finish the x86 model
    can be done, but takes effort
    consumes a lot of FPGA resources
  Do we really need all of it?
    a workload uses only a subset of the ISA
    some insts are used more often than others
    parts of the ISA are never or rarely used
  PROTOFLEX migration
    combine FPGA & simulation
    model necessary subset in HW, the rest in SW

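The "necessary subset in HW, the rest in SW" idea can be pictured as a dispatcher that tries the hardware-implemented behaviors first and falls back to a software simulator otherwise. This is a conceptual sketch only: the class, its interface, and the toy opcodes are hypothetical, and the actual migration mechanism is not described on these slides.

```python
from typing import Callable, Dict

State = Dict[str, int]
Handler = Callable[[State], State]


class HybridModel:
    def __init__(self, hw_behaviors: Dict[str, Handler],
                 sw_fallback: Callable[[str, State], State]) -> None:
        self.hw = hw_behaviors   # subset modeled in hardware (here: a dict)
        self.sw = sw_fallback    # everything else goes to the simulator
        self.hw_count = 0
        self.sw_count = 0

    def execute(self, opcode: str, state: State) -> State:
        handler = self.hw.get(opcode)
        if handler is not None:
            self.hw_count += 1
            return handler(state)
        self.sw_count += 1       # rare path: migrate to software
        return self.sw(opcode, state)


def hw_inc(s: State) -> State:
    return {**s, "ax": (s["ax"] + 1) & 0xFFFF}


def sw_simulator(opcode: str, s: State) -> State:
    # Stand-in for a full software simulator; handles one toy opcode.
    if opcode == "neg":
        return {**s, "ax": (-s["ax"]) & 0xFFFF}
    raise ValueError("unsupported opcode: %r" % (opcode,))
```

The two counters make the trade-off visible: the hybrid only pays off if `sw_count` stays small relative to `hw_count` for the workloads of interest.
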
Future Work
  Short-term (Fall '06)
    implement protected-mode support
    validate/evaluate w/ more workloads
      Linux, SPEC-CPU, commercial apps (DB2)
    deployment on the BEE2 board
  Long-term
    full-system prototype execution
    architectural exploration