8/16/2015\course\cpeg323-08F\Topics1b.ppt1 A Review of Processor Design Flow
How to design a CPU?
1. Instruction-set architecture (ISA) design
2. Function-level (RTL) design
3. Component-level design
4. Gate-level/switch-level design
5. Circuit-level design
Design Method
- Gate level/circuit level: moving toward fully automated CAD
- Register level: CAD plus heuristics/intuition
- ISA level: mainly a heuristic process, validated by simulation
[Figure: Processor Architecture Design Flow Diagram. Design stages: instruction set architecture design (microarchitecture design I), system-level design, RTL-level design (microarchitecture design II), switch-level design, and circuit-level design, with simulators at the ISA, system, RTL, switch, and circuit levels. Supporting tools: compiler design, code optimizer, hardware design, an HDL (VHDL or Verilog) code generator, and an architecture/compiler design toolset.]
Design Levels of Abstraction (from abstract to concrete)
- Architecture: the programmer-visible instruction set, e.g.
    mov eax, [edi]
    cmp eax, 4
    jne label10
  with registers eax, ebx, ecx, edx.
- Microarchitecture: the CPU block diagram (branch unit, I-cache, D-cache, switch, instruction decode, register mapping, integer and FP register files, ALU, FPU, address calculation).
- Logic: Boolean equations for individual signals, e.g.
    RenIfsSetWb2H := vOR3(RenCoverUpdtIFMWb2H, vAND2(RenCrab_Data_Hi_Cx5B[31], RenCrabIfsWrEnCx5H), vAND2(RenIfsValidWb3H, vNOT(RenCrabIfsWrEnCx5H)))
- Circuit
- Layout
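To make the logic level concrete, the signal equation can be evaluated as plain Boolean functions. This is a minimal Python sketch assuming vOR3, vAND2, and vNOT are a 3-input OR, a 2-input AND, and an inverter on single bits (inferred from their names, not confirmed by the slide), and that RenCrab_Data_Hi_Cx5B[31] selects bit 31 of a 32-bit word. The snake_case function names are hypothetical.

```python
def v_not(a: int) -> int:
    """Inverter on a single bit (assumed meaning of vNOT)."""
    return a ^ 1


def v_and2(a: int, b: int) -> int:
    """2-input AND on single bits (assumed meaning of vAND2)."""
    return a & b


def v_or3(a: int, b: int, c: int) -> int:
    """3-input OR on single bits (assumed meaning of vOR3)."""
    return a | b | c


def ren_ifs_set_wb2h(ren_cover_updt_ifm_wb2h: int,
                     ren_crab_data_hi_cx5b: int,
                     ren_crab_ifs_wr_en_cx5h: int,
                     ren_ifs_valid_wb3h: int) -> int:
    """Evaluate RenIfsSetWb2H from its input signals."""
    data_bit31 = (ren_crab_data_hi_cx5b >> 31) & 1  # RenCrab_Data_Hi_Cx5B[31]
    return v_or3(
        ren_cover_updt_ifm_wb2h,
        v_and2(data_bit31, ren_crab_ifs_wr_en_cx5h),
        v_and2(ren_ifs_valid_wb3h, v_not(ren_crab_ifs_wr_en_cx5h)),
    )


if __name__ == "__main__":
    # Write enable high: output follows data bit 31.
    print(ren_ifs_set_wb2h(0, 1 << 31, 1, 0))  # 1
    # Write enable low: output follows the Wb3 valid bit.
    print(ren_ifs_set_wb2h(0, 0, 0, 1))        # 1
```

Read this way, when RenCoverUpdtIFMWb2H is 0 the equation behaves like a 2:1 multiplexer: the write enable selects between data bit 31 and the previous valid bit, and the cover-update signal forces the output high.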
Design Levels and Component Types
Classical ISA-Level Design Method
1. Select a prototype structure A.
2. Modify A to accommodate new performance demands and new technology.
3. Evaluate the result (ISA simulation).
4. Repeat until the design is satisfactory.
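The four steps above form a search loop. The Python sketch below maps them onto code under loudly stated assumptions: the two design knobs, the area budget, the CPI formula standing in for ISA simulation, and the "double one knob" modification rule are all hypothetical, and the greedy hill-climbing search is our choice for illustration, not the method from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Design:
    issue_width: int  # hypothetical knob
    cache_kb: int     # hypothetical knob


AREA_BUDGET = 100  # hypothetical area units


def area(d: Design) -> int:
    """Made-up area model used to reject infeasible candidates."""
    return 10 * d.issue_width + d.cache_kb


def evaluate(d: Design) -> float:
    """Stand-in for ISA simulation: a made-up CPI model, not a real one."""
    return 1.0 / d.issue_width + 4.0 / d.cache_kb


def modify(d: Design) -> Optional[Design]:
    """Try doubling each knob; keep the best candidate within the area budget."""
    candidates = [Design(d.issue_width * 2, d.cache_kb),
                  Design(d.issue_width, d.cache_kb * 2)]
    feasible = [c for c in candidates if area(c) <= AREA_BUDGET]
    if not feasible:
        return None
    # Lowest CPI wins; ties go to the smaller design.
    return min(feasible, key=lambda c: (evaluate(c), area(c)))


def design_loop(prototype: Design, target_cpi: float,
                max_iters: int = 20) -> Tuple[Design, float]:
    design, cpi = prototype, evaluate(prototype)   # step 1: prototype A
    for _ in range(max_iters):                     # step 4: repeat
        if cpi <= target_cpi:                      # satisfied
            break
        candidate = modify(design)                 # step 2: modify A
        if candidate is None:                      # nothing feasible left
            break
        cand_cpi = evaluate(candidate)             # step 3: evaluate
        if cand_cpi >= cpi:                        # no improvement
            break
        design, cpi = candidate, cand_cpi
    return design, cpi


if __name__ == "__main__":
    best, cpi = design_loop(Design(1, 8), target_cpi=0.5)
    print(best, cpi)  # Design(issue_width=4, cache_kb=16) 0.5
```

In practice each evaluate() call would be a full ISA simulation run, which is why the loop stops as soon as the target is met.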
Overall Simulation Strategy
1. Instruction-level simulator: used for performance evaluation at the instruction-set level as well as for more detailed modeling, e.g., of the pipeline and memory system. This level also generates the test vectors used by the lower-level simulators.
2. System-level simulator: models the details of the system environment, including interrupts and memory management (i.e., the virtual-machine level).
Overall Simulation Strategy (cont'd)
3. RTL-level simulator: models the RTL description of the design.
4. Switch-level simulator with delays: used to simulate the design, mostly component by component; its test vectors are generated from the RTL level.
5. Circuit simulator: used for detailed modeling of the critical paths and for verifying circuits under variations in temperature, power supply, etc.
Performance of Simulators
Metric: number of cycles simulated per second on a host machine.
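Because simulator speed is quoted in simulated cycles per second, the wall-clock cost of a run follows directly from the number of target cycles. A minimal Python sketch; the helper names and the example rates are hypothetical, chosen only to show the arithmetic, and are not measured figures for any simulator.

```python
def wall_clock_seconds(simulated_cycles: float, cycles_per_second: float) -> float:
    """Host time needed to simulate `simulated_cycles` at a given simulator speed."""
    if cycles_per_second <= 0:
        raise ValueError("simulator speed must be positive")
    return simulated_cycles / cycles_per_second


def slowdown(target_clock_hz: float, cycles_per_second: float) -> float:
    """How many times slower the simulator runs than the real target processor."""
    return target_clock_hz / cycles_per_second


if __name__ == "__main__":
    # Hypothetical: 1e9 target cycles on a simulator sustaining 1e5 cycles/s.
    secs = wall_clock_seconds(1e9, 1e5)
    print(f"{secs:.0f} s (~{secs / 3600:.1f} h)")  # 10000 s (~2.8 h)
    # Hypothetical 1 GHz target: the simulator is 10,000x slower than hardware.
    print(slowdown(1e9, 1e5))  # 10000.0
```

The same arithmetic explains why lower-level simulators (switch, circuit) are reserved for components and critical paths: each step down typically costs orders of magnitude in cycles per second.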
Instruction Set Architecture Simulation
- Execution-driven simulator: runs the object file against the architecture models and produces traces (e.g., memory-access and branch traces), runtime statistics (frequencies, cycle counts, etc.), and profile information.
- Trace-driven simulators (cache simulator, branch-prediction simulator, etc.): consume the traces and report statistics (e.g., cache behavior, branch behavior).
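A trace-driven simulator never executes the program; it only replays a recorded trace. The Python sketch below is a direct-mapped cache simulator consuming a memory-address trace. The cache geometry (4 lines of 16 bytes) and the hand-written trace are hypothetical; in the flow above the trace would come from the execution-driven simulator.

```python
from typing import Iterable, List, Optional


class DirectMappedCache:
    """Direct-mapped cache that only tracks tags and hit/miss counts."""

    def __init__(self, num_lines: int, line_bytes: int):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags: List[Optional[int]] = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Look up one byte address; returns True on a hit."""
        block = addr // self.line_bytes   # memory block number
        index = block % self.num_lines    # which cache line it maps to
        tag = block // self.num_lines     # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag            # fill the line, evicting the old block
        return False


def run_trace(trace: Iterable[int], num_lines: int = 4,
              line_bytes: int = 16) -> DirectMappedCache:
    cache = DirectMappedCache(num_lines, line_bytes)
    for addr in trace:
        cache.access(addr)
    return cache


if __name__ == "__main__":
    # 0x40 maps to the same line as 0x00, so it evicts it (a conflict miss).
    cache = run_trace([0x00, 0x04, 0x08, 0x40, 0x00])
    print(cache.hits, cache.misses)  # 2 3
```

Because the trace is recorded once, many cache configurations can be compared by replaying the same trace with different num_lines and line_bytes values.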
Performance Study by Simulation
Develop a performance model that is:
- Flexible
- Parameterized (via knobs)
- 95% clock-accurate compared to the RTL
- Significantly smaller than the RTL
The model consists of two parts:
- Instruction-set simulator -> executes the benchmark
- Pipeline simulator -> “accountant” for clock cycles
Run benchmarks and update the microarchitecture accordingly, in a cycle of: code -> simulate -> characterize -> tune.
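The two-part split can be illustrated in Python: a functional instruction-set simulator executes the program and emits retired instructions, and a separate pipeline "accountant" charges clock cycles for them. Everything here is a hypothetical sketch: the toy ISA (li, addi, bne, halt), the cost model (one cycle per instruction plus a branch-penalty knob for taken branches, ignoring pipeline fill and other hazards), and all names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Retired:
    """One executed instruction, as reported by the ISA simulator."""
    op: str
    taken: bool = False


def isa_simulate(program: List[tuple], num_regs: int = 4) -> Iterator[Retired]:
    """Functional simulator for a hypothetical toy ISA; knows nothing about timing."""
    regs = [0] * num_regs
    pc = 0
    while True:
        op, *args = program[pc]
        if op == "li":            # rd <- imm
            rd, imm = args
            regs[rd] = imm
            pc += 1
            yield Retired(op)
        elif op == "addi":        # rd <- rs + imm
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
            pc += 1
            yield Retired(op)
        elif op == "bne":         # if rs != rt: pc <- target
            rs, rt, target = args
            taken = regs[rs] != regs[rt]
            pc = target if taken else pc + 1
            yield Retired(op, taken)
        elif op == "halt":
            yield Retired(op)
            return
        else:
            raise ValueError(f"unknown opcode {op!r}")


class PipelineAccountant:
    """Charges cycles: 1 per instruction plus `branch_penalty` per taken branch."""

    def __init__(self, branch_penalty: int = 2):
        self.branch_penalty = branch_penalty  # a tunable "knob"

    def account(self, stream: Iterable[Retired]) -> Tuple[int, int]:
        instructions = cycles = 0
        for r in stream:
            instructions += 1
            cycles += 1 + (self.branch_penalty if r.taken else 0)
        return instructions, cycles


# Count r1 down from 3 to 0; r0 is never written, so it stays 0.
PROGRAM = [
    ("li", 1, 3),
    ("addi", 1, 1, -1),
    ("bne", 1, 0, 1),
    ("halt",),
]

if __name__ == "__main__":
    for penalty in (0, 2):
        insts, cycles = PipelineAccountant(penalty).account(isa_simulate(PROGRAM))
        print(penalty, insts, cycles, cycles / insts)
    # prints "0 8 8 1.0" then "2 8 12 1.5"
```

Keeping timing out of the functional simulator means a knob such as branch_penalty can be tuned without touching the code that executes the benchmark, which is the code -> simulate -> characterize -> tune cycle in miniature.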
Revisit: How to design a CPU?
1. Instruction-set architecture (ISA) design
2. Function-level (RTL) design
3. Component-level design
4. Gate-level/switch-level design
5. Circuit-level design
Monty Denneau: "I work on everything down to and including 4. Cyclops skips (2) and goes directly to 3/4. A lot of time was spent restructuring the design to make 4 meet timing. I probably spent thousands of hours on 4. We have no 5 - ASICS provides a library of gates, latches, and memory, etc." (August 28, 2007)