Internals of SimpleScalar Simulators CPEG323 Tutorial Long Chen November, 2005
Outline The SimpleScalar Instruction Set Internal structure of the SimpleScalar simulator Software architecture of the simulator Some important modules About the project
The SimpleScalar Instruction Set
  Clean and simple instruction set architecture: MIPS plus more addressing modes
  Bi-endian instruction set definition
    Facilitates portability; build to match host endianness
  64-bit instruction encoding facilitates instruction set research
    16-bit space for hints, new instructions, and annotations
    Four-operand instruction format, up to 256 registers
Simulation Architected State
Simulator Structure
Simulator Software Architecture
  Interface programming style
    All “.c” files have an accompanying “.h” file with the same base name
    “.h” files define the public interfaces “exported” by a module
      Mostly stable and documented with comments; study these files
    “.c” files implement the exported interfaces
      Not as stable; study these if you need to hack the functionality
  Simulator modules
    “sim-*.c” files, each of which implements a complete simulator core
  Reusable S/W components facilitate “rolling your own”
    System components
    Simulation components
    Additional “really useful” components
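The interface style above can be sketched with a small module. The module name `ctr` and all of its functions are hypothetical and not part of SimpleScalar; the two halves would normally live in ctr.h and ctr.c and are shown in one block only so the sketch compiles on its own.

```c
/* ---- would be ctr.h: the public interface "exported" by the module ---- */
#ifndef CTR_H
#define CTR_H

/* an event counter (hypothetical example type) */
struct ctr_t {
  const char *name;      /* label used when reporting */
  unsigned long count;   /* number of events recorded so far */
};

/* initialize counter CTR with label NAME */
void ctr_init(struct ctr_t *ctr, const char *name);

/* record one event, returns the updated count */
unsigned long ctr_tick(struct ctr_t *ctr);

#endif /* CTR_H */

/* ---- would be ctr.c: the implementation of the exported interface ---- */
void ctr_init(struct ctr_t *ctr, const char *name)
{
  ctr->name = name;
  ctr->count = 0;
}

unsigned long ctr_tick(struct ctr_t *ctr)
{
  return ++ctr->count;
}
```

A client would include only the header part and call `ctr_init()` and `ctr_tick()`; the implementation part can then change without affecting callers.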
Brief Source Roadmap
  Start point: main.c
  Simulator cores: sim-fast, sim-safe …
  Loader: loader.[c,h]
  Memory: memory.[c,h]
  Register: regs.[c,h]
  ISA Def: machine.def
  ISA routines: machine.[c,h]
  System Call: syscall.[c,h]
  Cache: cache.[c,h]
  Options parsing: options.[c,h]
Machine Definition File (machine.def)
  A single file describes all aspects of the architecture
    Used to generate decoders, dependency analyzers, functional components, disassemblers, appendices, etc.
    e.g., machine definition + ~30-line main = functional simulator
  Generates fast and reliable code with minimal effort
  Instruction definition example:

    #define OR_IMPL \
      { \
        SET_GPR(RD, GPR(RS) | GPR(RT)); \
      }
    DEFINST(OR, 0x50,
            "or", "d,s,t",
            IntALU, F_ICOMP,
            DGPR(RD), DNA,
            DGPR(RS), DGPR(RT), DNA)

  [Figure annotations on the example: semantics, opcode, disassembly template, operands, FU req’s, inst flags, output deps, input deps]
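How a single definition list can drive several generated artifacts can be sketched with the X-macro technique. The list `TOY_ISA` below is a hypothetical stand-in for machine.def, and `toy_op`, `toy_name`, and `toy_exec` are illustrative names with toy opcode values; a real simulator would instead redefine DEFINST and `#include "machine.def"`, as the sim-fast.c main loop does.

```c
#include <stddef.h>

/* toy stand-in for machine.def: one X(...) entry per instruction
   X(enum name, opcode value, mnemonic, C expression over operands a and b) */
#define TOY_ISA(X)                    \
  X(TOY_OR,  0x50, "or",  (a) | (b))  \
  X(TOY_AND, 0x4e, "and", (a) & (b))  \
  X(TOY_ADD, 0x40, "add", (a) + (b))

/* expansion 1: the opcode enum */
#define X(OP, MSK, NAME, EXPR) OP = MSK,
enum toy_op { TOY_ISA(X) };
#undef X

/* expansion 2: opcode -> mnemonic, as a disassembler would need */
const char *toy_name(enum toy_op op)
{
  switch (op) {
#define X(OP, MSK, NAME, EXPR) case OP: return NAME;
    TOY_ISA(X)
#undef X
  }
  return NULL;   /* unknown opcode */
}

/* expansion 3: opcode -> semantics, as a functional simulator would need */
unsigned toy_exec(enum toy_op op, unsigned a, unsigned b)
{
  switch (op) {
#define X(OP, MSK, NAME, EXPR) case OP: return EXPR;
    TOY_ISA(X)
#undef X
  }
  return 0;      /* unknown opcode */
}
```

Adding an instruction means adding one line to `TOY_ISA`; the enum, the name lookup, and the executor all pick it up on the next compile.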
SimpleScalar ISA Module (machine.[hc])
  Macros to expedite the processing of instructions
  Constants needed across simulators, for example, the size of the register file
  Examples:

    /* returns the opcode field value of SimpleScalar instruction INST */
    #define MD_OPFIELD(INST)        (INST.a & 0xff)
    #define MD_SET_OPCODE(OP, INST) ((OP) = ((INST).a & 0xff))

    /* inst -> enum md_opcode mapping, use this macro to decode insts */
    #define MD_OP_ENUM(MSK)         (md_mask2op[MSK])

    /* enum md_opcode -> description string */
    #define MD_OP_NAME(OP)          (md_op2name[OP])

    /* enum md_opcode -> opcode operand format, used by disassembler */
    #define MD_OP_FORMAT(OP)        (md_op2format[OP])

    /* enum md_opcode -> opcode flags, used by simulators */
    #define MD_OP_FLAGS(OP)         (md_op2flags[OP])

    /* disassemble an instruction */
    void md_print_insn(md_inst_t inst, md_addr_t pc, FILE *stream);
Instruction Field Accessors
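A minimal sketch of what field accessors over a two-word, 64-bit instruction might look like. The opcode position follows the MD_OPFIELD macro (low 8 bits of word `a`); the position of the 16-bit annotation and the layout of the 8-bit operand fields in word `b` are assumptions made for illustration and should be checked against machine.h. All `toy_`/`TOY_` names are hypothetical.

```c
#include <stdint.h>

/* a 64-bit instruction stored as two 32-bit words */
struct toy_inst_t {
  uint32_t a;   /* opcode in the low 8 bits; annotation assumed in the high 16 bits */
  uint32_t b;   /* assumed: four 8-bit operand fields, or a 16-bit immediate */
};

/* opcode field, same position as MD_OPFIELD */
#define TOY_OPFIELD(INST)  ((INST).a & 0xff)

/* 16-bit annotation field (assumed position) */
#define TOY_ANNOTE(INST)   (((INST).a >> 16) & 0xffff)

/* 8-bit register fields, so up to 256 registers are addressable
   (assumed positions) */
#define TOY_RS(INST)       (((INST).b >> 24) & 0xff)
#define TOY_RT(INST)       (((INST).b >> 16) & 0xff)
#define TOY_RD(INST)       (((INST).b >> 8) & 0xff)
#define TOY_SHAMT(INST)    ((INST).b & 0xff)

/* sign-extended 16-bit immediate (assumed position) */
#define TOY_IMM(INST)      ((int32_t)(int16_t)((INST).b & 0xffff))
```

Keeping every field behind a macro means instruction semantics can be written in terms of RS, RT, RD, and IMM without repeating shift and mask constants.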
Instruction Semantics Specification
Main Loop (sim-fast.c)

    /* set up initial default next PC */
    regs.regs_NPC = regs.regs_PC + sizeof(md_inst_t);

    while (TRUE) {
      /* maintain $r0 semantics */
      regs.regs_R[MD_REG_ZERO] = 0;

      /* keep an instruction count */
    #ifndef NO_INSN_COUNT
      sim_num_insn++;
    #endif /* !NO_INSN_COUNT */

      /* load instruction */
      MD_FETCH_INST(inst, mem, regs.regs_PC);

      /* decode the instruction */
      MD_SET_OPCODE(op, inst);

      /* execute the instruction */
      switch (op) {
    #define DEFINST(OP,MSK,NAME,OPFORM,RES,FLAGS,O1,O2,I1,I2,I3) \
      case OP: \
        SYMCAT(OP,_IMPL); \
        break;
    #include "machine.def"
      default:
        panic("attempted to execute a bogus opcode");
      }

      /* execute next instruction */
      regs.regs_PC = regs.regs_NPC;
      regs.regs_NPC += sizeof(md_inst_t);
    }

  Each instruction is executed in one shot here; for the project, consider the pipelined approach described in the project requirements
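The same fetch, decode, execute structure can be tried out on a toy machine. The sketch below is not SimpleScalar code: the four-instruction ISA, the `mini_` names, and the use of instruction indices instead of byte addresses for PC and NPC are all assumptions made to keep the example self-contained.

```c
#include <stdint.h>

#define MINI_NUM_REGS 8
#define MINI_REG_ZERO 0

enum mini_op { MINI_HALT, MINI_ADDI, MINI_ADD, MINI_BNE };

struct mini_inst_t {
  enum mini_op op;
  int rd, rs, rt;   /* register numbers */
  int32_t imm;      /* immediate, or branch offset counted in instructions */
};

struct mini_regs_t {
  int32_t R[MINI_NUM_REGS];  /* integer register file */
  uint32_t PC, NPC;          /* current and next PC, as instruction indices */
};

/* run PROG until HALT, returns the number of instructions executed */
unsigned long mini_run(const struct mini_inst_t *prog, struct mini_regs_t *regs)
{
  unsigned long num_insn = 0;

  /* set up initial default next PC */
  regs->PC = 0;
  regs->NPC = regs->PC + 1;

  for (;;) {
    struct mini_inst_t inst;

    /* maintain $r0 semantics */
    regs->R[MINI_REG_ZERO] = 0;
    num_insn++;

    /* fetch */
    inst = prog[regs->PC];

    /* decode and execute in one shot */
    switch (inst.op) {
    case MINI_ADDI:
      regs->R[inst.rd] = regs->R[inst.rs] + inst.imm;
      break;
    case MINI_ADD:
      regs->R[inst.rd] = regs->R[inst.rs] + regs->R[inst.rt];
      break;
    case MINI_BNE:
      /* a taken branch overrides the default next PC */
      if (regs->R[inst.rs] != regs->R[inst.rt])
        regs->NPC = regs->PC + 1 + inst.imm;
      break;
    case MINI_HALT:
    default:
      return num_insn;
    }

    /* advance to the next instruction */
    regs->PC = regs->NPC;
    regs->NPC = regs->PC + 1;
  }
}
```

Because the default NPC is set before an instruction executes, only control-transfer instructions need to touch it, which is the same reason the sim-fast.c loop updates regs_NPC at the bottom of each iteration.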
Memory Module (memory.[hc]) Functions for reading from, writing to, initializing and dumping the contents of the main memory
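As a rough illustration of the initialize, write, and read operations such a module provides, here is a sketch of a lazily allocated, page-based simulated memory. This organization and every `toy_` name are assumptions for illustration; the sketch does not describe the actual functions or data structures in memory.[hc].

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE  4096u
#define TOY_NUM_PAGES  1024u   /* covers a 4 MB toy address space */

struct toy_mem_t {
  uint8_t *pages[TOY_NUM_PAGES];   /* NULL until the page is first written */
};

/* initialize: no pages allocated yet */
void toy_mem_init(struct toy_mem_t *mem)
{
  for (unsigned i = 0; i < TOY_NUM_PAGES; i++)
    mem->pages[i] = NULL;
}

/* write one byte, allocating the page on first touch;
   returns 0 on success, -1 on a bad address or allocation failure */
int toy_mem_write_byte(struct toy_mem_t *mem, uint32_t addr, uint8_t val)
{
  uint32_t pn = addr / TOY_PAGE_SIZE;

  if (pn >= TOY_NUM_PAGES)
    return -1;
  if (!mem->pages[pn]) {
    mem->pages[pn] = calloc(TOY_PAGE_SIZE, 1);
    if (!mem->pages[pn])
      return -1;
  }
  mem->pages[pn][addr % TOY_PAGE_SIZE] = val;
  return 0;
}

/* read one byte; untouched or out-of-range memory reads as zero */
uint8_t toy_mem_read_byte(const struct toy_mem_t *mem, uint32_t addr)
{
  uint32_t pn = addr / TOY_PAGE_SIZE;

  if (pn >= TOY_NUM_PAGES || !mem->pages[pn])
    return 0;
  return mem->pages[pn][addr % TOY_PAGE_SIZE];
}

/* release all allocated pages */
void toy_mem_free(struct toy_mem_t *mem)
{
  for (unsigned i = 0; i < TOY_NUM_PAGES; i++) {
    free(mem->pages[i]);
    mem->pages[i] = NULL;
  }
}
```

Allocating pages on first write keeps the host memory footprint proportional to what the simulated program actually touches rather than to the size of its address space.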
Register Module (regs.[hc])
  Functions to initialize the register files and dump their contents
  Access non-speculative registers directly
    e.g., regs_R[5] = 12;
    e.g., regs_F.f[4] = 23.5;
  Floating point register file supports three views: integer word, single-precision, and double-precision

    /* floating point register file format */
    union regs_FP_t {
      md_gpr_t l[MD_NUM_FREGS];          /* integer word view */
      md_SS_FLOAT_TYPE f[SS_NUM_REGS];   /* single-precision FP view */
      SS_DOUBLE_TYPE d[SS_NUM_REGS/2];   /* double-precision FP view */
    };

    /* floating point register file */
    extern union regs_FP_t regs_F;

    /* (signed) hi register, holds mult/div results */
    extern SS_WORD_TYPE regs_HI;

    /* (signed) lo register, holds mult/div results */
    extern SS_WORD_TYPE regs_LO;

    /* program counter */
    extern SS_ADDR_TYPE regs_PC;
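The three views of the floating point register file can be tried in isolation. The sketch below uses plain C types and hypothetical `toy_` names instead of the simulator's own typedefs, and it assumes a 4-byte IEEE 754 `float` and an 8-byte `double`, which holds on common hosts but is not guaranteed by the C standard.

```c
#include <stdint.h>

#define TOY_NUM_FREGS 32

/* one storage area, three views */
union toy_fpr_t {
  int32_t l[TOY_NUM_FREGS];       /* integer word view */
  float   f[TOY_NUM_FREGS];       /* single-precision view */
  double  d[TOY_NUM_FREGS / 2];   /* double-precision view, one per register pair */
};

/* returns the raw 32-bit pattern held in single-precision register N */
int32_t toy_fpr_bits(const union toy_fpr_t *regs, int n)
{
  return regs->l[n];
}

/* returns nonzero if double register K overlays single registers 2K and 2K+1 */
int toy_fpr_pair_overlaps(const union toy_fpr_t *regs, int k)
{
  return (const void *)&regs->d[k] == (const void *)&regs->f[2 * k];
}
```

Writing `regs.f[4] = 23.5f` and then reading `regs.l[4]` returns the bit pattern of the float without any conversion, which is what lets integer loads and stores move floating point values through the same register file.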
Loader Module (loader.[hc])
Other Modules
  cache.[hc]: general functions to support multiple cache types (pay attention to this part for the coming project)
  misc.[hc]: numerous useful support functions, such as warn(), info(), elapsed_time()
  options.[hc]: processes command-line arguments
  sim.h: a few extern variable declarations and function prototypes
Hints for Phase 2
  sim-safe is the simplest simulator
  The main loop of the original code has three steps: fetch the instruction, decode it, and execute it. Although this is not a pipeline, you can start from it
  In this phase, you only need to handle data dependencies among instructions; there is no need to consider execution latency
  Therefore, every instruction takes 5 cycles to complete in the 5-stage pipeline (each stage takes 1 cycle) as long as the pipeline does not need to stall because of dependencies
  Keep in mind that this is not true in the real world. For example, a load may need 30 cycles to complete, while an add may need 5 cycles. You will deal with this timing problem in the coming project
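One possible way to reason about stalls caused by data dependencies is to track, for each register, the cycle in which its latest writer reaches write-back. The sketch below is only an illustration under stated assumptions, not the required design: stages are IF, ID, EX, MEM, WB at one cycle each; there is no forwarding; a register written in WB can be read by an instruction in ID during the same cycle; register 0 is not special-cased; and branches are ignored. All `pipe_` names are hypothetical.

```c
#define PIPE_NUM_REGS 32
#define PIPE_NO_REG   (-1)

struct pipe_inst_t {
  int dst;          /* destination register, or PIPE_NO_REG */
  int src1, src2;   /* source registers, or PIPE_NO_REG */
};

/* returns the cycle in which the last of N instructions completes WB */
unsigned long pipe_count_cycles(const struct pipe_inst_t *prog, int n)
{
  unsigned long ready[PIPE_NUM_REGS] = { 0 };  /* WB cycle of each register's latest writer */
  unsigned long id = 1;   /* ID cycle of the previous instruction (IF of the first is cycle 1) */
  int i;

  if (n <= 0)
    return 0;

  for (i = 0; i < n; i++) {
    /* earliest possible ID slot: one cycle behind the previous instruction */
    unsigned long t = id + 1;

    /* stall in ID until every source register has been written back */
    if (prog[i].src1 != PIPE_NO_REG && ready[prog[i].src1] > t)
      t = ready[prog[i].src1];
    if (prog[i].src2 != PIPE_NO_REG && ready[prog[i].src2] > t)
      t = ready[prog[i].src2];
    id = t;

    /* the result becomes visible three stages later (EX, MEM, WB) */
    if (prog[i].dst != PIPE_NO_REG)
      ready[prog[i].dst] = id + 3;
  }
  return id + 3;
}
```

Under these assumptions a single instruction finishes in cycle 5, three independent instructions finish in cycle 7, and making the second instruction read the first one's result adds two stall cycles, so that sequence finishes in cycle 9.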
What Is Expected?
  In your project report
    How did you formulate the problem?
    What is your detailed design?
    How did you distribute the workload among members?
    Were there any problems with the design during the implementation? How did you improve it?
    What is the result? Could anything be done better?
    A copy of the source code you added/modified, with proper comments (printing the entire file is a waste of trees and time)
  Send me the source code you added/modified, with a short description
Have fun with the simulator