1 Co-simulation Slides from: - Tony Givargis, Irvine, IC253 - Rabi Mahapatra, Texas A&M University - Sharif University.

Presentation transcript:

2 Verification via Simulation
Abstraction                      Verification time   Relative speed
Real-time                        1 hour              1
FPGA                             ~1 day              10^-1
Emulator                         ~4 days
Behavior (system-level)          ~1.4 months
Bus functional (system-level)    ~1.2 years
Cycle accurate (system-level)    ~12 years
RTL                              ~1 lifetime
Gate-level                       ~1 millennium

3 Verification via Simulation
Exhaustive simulation
–Very slow (previous slide)
–Environment modeling
–Black box approach
Partial simulation
–May not catch all errors
  1994, Pentium FDIV error
–Test-vector generation
–Slow!
–Black box approach
[Figure: test-bench in which a test-vector generator drives the system under test and an output monitor reports pass/fail]
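The test-bench structure on this slide can be sketched in a few lines. This is only an illustration: the 8-bit adder used as the system under test, the reference model, and all function names are assumptions, not part of the original slides.

```python
import random

def system_under_test(a, b):
    # Hypothetical SUT: an 8-bit adder that wraps on overflow.
    return (a + b) & 0xFF

def reference_model(a, b):
    # Golden model the output monitor compares against (assumed).
    return (a + b) % 256

def test_vector_generator(count, seed=0):
    # Partial simulation: a random sample of inputs, not every input pair.
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randrange(256), rng.randrange(256)

def run_test_bench(sut, count=1000):
    """Return the failing vectors; an empty list means pass."""
    failures = []
    for a, b in test_vector_generator(count):
        if sut(a, b) != reference_model(a, b):
            failures.append((a, b))
    return failures

failures = run_test_bench(system_under_test)
print("pass" if not failures else f"fail: {len(failures)} vectors")
```

The bench treats the system as a black box: it sees only inputs and outputs, so an error that the sampled vectors never exercise goes unnoticed.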

4 Verification via Simulation
Stop/start simulation at any time
Set data values
Examine system/environment values at any time
Can step through small intervals (e.g., 500 nanoseconds)
Simulation setup time (e.g., could spend more time modeling the environment than the system)
Models likely incomplete
Simulation speed much slower than actual execution

5 Abstraction levels
Event-driven simulation (gate-level simulation):
–Most accurate, as every active signal is calculated for every device during the clock cycle as it propagates
–Each signal is simulated for its value and its time of occurrence
–Excellent for timing analysis and for verifying race conditions
–Computation intensive and hence very slow
Cycle-based simulation:
–Calculates the state of the signals (0 or 1) at the clock edge
–Suitable for complex designs that need a large number of tests
–10 times faster than event-driven simulation, 20% area efficient
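The difference between the two styles can be illustrated with a toy circuit, y = NOT(a AND b). The sketch below is an assumption-laden illustration: the gate delays, signal names, and function names are invented for the example. The event-driven version schedules each signal change with its time of occurrence; the cycle-based version only computes the settled value at each clock edge.

```python
import heapq

def event_driven(stimuli, and_delay=2, not_delay=1):
    """Simulate y = NOT(a AND b) with per-gate delays.

    stimuli: list of (time, signal, value) input changes.
    Returns every signal change as (time, signal, value).
    """
    values = {"a": 0, "b": 0, "n": 0, "y": 1}   # n is the internal AND output
    queue = list(stimuli)
    heapq.heapify(queue)
    trace = []
    while queue:
        time, signal, value = heapq.heappop(queue)
        if values[signal] == value:
            continue                 # no change, so no new activity
        values[signal] = value
        trace.append((time, signal, value))
        if signal in ("a", "b"):     # AND gate reacts after its delay
            heapq.heappush(queue, (time + and_delay, "n", values["a"] & values["b"]))
        elif signal == "n":          # inverter reacts after its delay
            heapq.heappush(queue, (time + not_delay, "y", 1 - value))
    return trace

def cycle_based(inputs_per_cycle):
    """Evaluate y once per clock edge, ignoring gate delays."""
    return [1 - (a & b) for a, b in inputs_per_cycle]

trace = event_driven([(0, "a", 1), (0, "b", 1), (10, "a", 0)])
print([event for event in trace if event[1] == "y"])
print(cycle_based([(1, 1), (0, 1)]))
```

The event-driven trace shows when y changes, which is what timing analysis needs; the cycle-based result carries no timing at all, which is why it is cheaper.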

6 Abstraction levels
Data-flow simulator
–Signals are represented as streams of values without a notion of time. Functional blocks are linked by signals. A block is executed when signals are present at its inputs.
–The scheduler in the simulator determines the order of block executions.
–High-level abstraction simulation used in the early stages of verification, typically to check the correctness of the algorithms.
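A minimal sketch of such a scheduler follows. The `Block` class, the stream names, and the two example blocks are hypothetical; the only rule taken from the slide is that a block fires when tokens are present on all of its inputs, with no notion of time.

```python
from collections import deque

class Block:
    """A functional block: consumes one token per input, produces one output token."""
    def __init__(self, name, func, inputs, output):
        self.name, self.func, self.inputs, self.output = name, func, inputs, output

def run_dataflow(blocks, streams):
    """Fire any block whose input streams all hold a token, until none can fire."""
    order = []
    fired = True
    while fired:
        fired = False
        for block in blocks:
            if all(streams[s] for s in block.inputs):
                args = [streams[s].popleft() for s in block.inputs]
                streams[block.output].append(block.func(*args))
                order.append(block.name)
                fired = True
    return order

streams = {
    "x": deque([1, 2, 3]),
    "y": deque([10, 20, 30]),
    "sum": deque(),
    "out": deque(),
}
blocks = [
    Block("scale", lambda s: 2 * s, ["sum"], "out"),   # listed first on purpose
    Block("add", lambda a, b: a + b, ["x", "y"], "sum"),
]
order = run_dataflow(blocks, streams)
print(order[0], list(streams["out"]))
```

Although "scale" is listed first, it cannot fire until "add" has produced a token, so the execution order is decided by data availability rather than by the listing order.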

7 Overcoming Simulation Problems
Reduce amount of real time simulated
–1 msec execution instead of 1 hour
  0.001 sec * 10,000,000 = 10,000 sec ≈ 3 hours
–Reduced confidence
  1 msec of cruise controller operation tells us little
Faster simulator
–Emulators
  Special hardware for simulations
–Less precise/accurate simulators
  Exchange speed for observability/controllability
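The arithmetic on this slide generalizes to a one-line relation: wall-clock time equals simulated time multiplied by the simulator's slowdown factor. The helper names below are illustrative; the factor of 10,000,000 is the one used on the slide.

```python
def wall_clock_seconds(simulated_seconds, slowdown):
    """Wall-clock time needed to simulate `simulated_seconds` of real operation."""
    return simulated_seconds * slowdown

def simulated_seconds_within(budget_seconds, slowdown):
    """How much real operation fits into a given wall-clock budget."""
    return budget_seconds / slowdown

SLOWDOWN = 10_000_000  # factor used on the slide

one_msec = wall_clock_seconds(0.001, SLOWDOWN)
one_hour = wall_clock_seconds(3600, SLOWDOWN)
print(f"1 msec of operation: {one_msec / 3600:.1f} hours of simulation")
print(f"1 hour of operation: {one_hour / (3600 * 24 * 365):.0f} years of simulation")
```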

8 Overcoming Simulation Problems
Don’t need gate-level analysis for all simulations
–Don’t care what happens at every input/output of each logic gate
–Simulating RT components ~10x faster
–Cycle-based simulation ~100x faster
  Accurate at clock boundaries only
  No information on signal changes between boundaries
Even faster if using instruction-set simulators
–Ideal for processors

9 HW/SW Co-Simulation
Software is traditionally fully tested only after hardware is fabricated => long time to market (TTM)
Integrating HW and SW earlier in the design cycle => better TTM
Co-simulation involves
–Simulating a processor model along with custom HW (usually described in an HDL)

10 High-level Co-simulation
Functional (untimed) simulation allows one to:
–check functional (partial) correctness, by generating inputs and observing outputs
–debug the design, by easy access to internal states
High-level (timed) co-simulation allows one to check:
–feasibility of the specification
–hardware/software partitioning
–architecture selection (CPU, scheduler, ...)
Cannot be used to validate the final implementation, which needs a much more detailed model of the HW and SW architecture

11 HW/SW Co-Simulation
A variety of simulation approaches exist
–From very detailed (e.g., gate-level model)
–To very abstract (e.g., instruction-level model)
Simulation tools evolved separately for hardware and software
–Software: typically with an instruction-set simulator (ISS)
–Hardware: typically with models in an HDL environment
Integration of GPP/SPP on a single IC creates the need to merge these tools into co-simulation tools

12 HW/SW Co-Simulation
Simple/naive way
–HDL model of microprocessor runs system software
–HDL models of specific-purpose processors
–Integrate all models
Hardware-software co-simulator
–ISS model of microprocessor runs system software
–HDL model of specific-purpose processors
–Create communication between simulators
–Simulators run separately except when transferring data
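The last point, simulators that run separately except when transferring data, can be sketched as follows. Both classes are hypothetical stand-ins (a counter peripheral for the HDL side, a trivial instruction loop for the ISS side); the idea shown is that the software side keeps its own local time and synchronizes with the hardware side only when it touches a hardware register.

```python
class HardwareSim:
    """Stand-in for an HDL simulator: a free-running counter peripheral."""
    def __init__(self):
        self.time = 0
        self.counter = 0

    def advance_to(self, time):
        # Catch up to the software simulator's local time.
        while self.time < time:
            self.time += 1
            self.counter += 1

    def read_counter(self):
        return self.counter

class SoftwareSim:
    """Stand-in for an ISS: each operation costs one cycle of local time."""
    def __init__(self, hw):
        self.hw = hw
        self.time = 0
        self.syncs = 0

    def run(self, program):
        results = []
        for op in program:
            self.time += 1
            if op == "read_hw":
                # Synchronize only when data crosses the HW/SW boundary.
                self.hw.advance_to(self.time)
                self.syncs += 1
                results.append(self.hw.read_counter())
        return results

hw = HardwareSim()
sw = SoftwareSim(hw)
program = ["nop"] * 5 + ["read_hw"] + ["nop"] * 3 + ["read_hw"]
results = sw.run(program)
print(results, "synchronizations:", sw.syncs)
```

Ten operations execute, but the two simulators exchange information only twice.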

13 HW/SW Co-Simulation
Heterogeneous co-simulation environments (C-VHDL or C-Verilog)
–RPC or another form of inter-process communication between HW and SW simulators
–High overhead due to high data transmission between the simulators
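The sketch below illustrates the message-passing style of coupling using Python's standard `multiprocessing.Pipe`, with the hardware simulator served from a thread so the example stays in one file. The message format (`"write"`, `"read"`, `"stop"`) and both simulator functions are assumptions made for illustration; real co-simulation tools define their own protocols.

```python
import threading
from multiprocessing import Pipe

def hardware_simulator(conn):
    """Serve register reads and writes until a 'stop' message arrives."""
    registers = {}
    while True:
        message = conn.recv()
        if message[0] == "stop":
            break
        if message[0] == "write":
            _, address, value = message
            registers[address] = value
            conn.send(("ack",))
        elif message[0] == "read":
            _, address = message
            conn.send(("data", registers.get(address, 0)))

def software_simulator(conn):
    """Issue two remote accesses and count the round trips they cost."""
    round_trips = 0
    conn.send(("write", 0x10, 42))
    conn.recv()                      # wait for the acknowledgement
    round_trips += 1
    conn.send(("read", 0x10))
    reply = conn.recv()
    round_trips += 1
    conn.send(("stop",))
    return reply[1], round_trips

sw_end, hw_end = Pipe()
worker = threading.Thread(target=hardware_simulator, args=(hw_end,))
worker.start()
value, round_trips = software_simulator(sw_end)
worker.join()
print(value, round_trips)
```

Every hardware access costs a full request/reply round trip, which is the overhead the slide refers to.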

14 Co-simulation methods (cont’d)
Heterogeneous co-simulation
Network different types of simulators together to attain better speed.
Claims to be the actual co-simulation strategy, as it affords a better ability to match the task with the tool and to simulate each part at the appropriate level of detail.
–Synopsys’s Eaglei: lets HW run in many simulators and SW on a native PC/workstation or in an instruction-set simulator (ISS). The Eaglei tool interfaces all of these.

15 Heterogeneous co-simulation
[Figure: homogeneous/heterogeneous co-simulation. Product SW runs on the PC or an optional ISS; co-sim glue logic links it to the HW implementation (VHDL, Verilog); the simulation algorithm is event, cycle, or dataflow; the simulation engine is a PC or an emulator]

16 Heterogeneous co-simulation
How about performance?
–Complex enough to describe any situation
–Since software is not running at hardware simulation speed, better performance is obtained
–If the target CPU is not that of the PC, you may use a cross compiler
–When software runs directly on a PC/workstation (WS), it runs at the speed of the WS
–When software cannot run directly as processes on the WS, you need an instruction-set simulator (the ISS interprets assembly language at instruction level, as long as CPU details are not an issue)
An ISS usually runs at 20% of the speed of actual or native processes.

17 Hardware density of heterogeneous simulation
How much of the time does software access hardware?
Hardware density depends on the application and varies within an application.
In a loosely coupled CPU system, the block responsible for hardware initialization has 30% of its instructions accessing the hardware.
In a tightly coupled system, every memory reference could go through simulated hardware.
In general, hardware density is important for simulation speed.
The base hardware and the tools that communicate between the heterogeneous environments can contribute to the speed too.
If the simulation is distributed (as it often is these days), the network bandwidth, reliability, and speed matter too.
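One way to see why hardware density matters is a simple weighted-average model, which is an assumption introduced here and not taken from the slides: a fraction d of instructions pays the cost of a simulated hardware access, and the rest run at native speed. The function name and the sample numbers are illustrative only.

```python
def cosim_instructions_per_second(native_ips, hw_access_seconds, hardware_density):
    """Estimated co-simulation throughput under a weighted-average model.

    native_ips:         instructions per second when software runs natively
    hw_access_seconds:  assumed wall-clock cost of one simulated hardware access
    hardware_density:   fraction of instructions that access hardware (0..1)
    """
    if not 0.0 <= hardware_density <= 1.0:
        raise ValueError("hardware_density must be between 0 and 1")
    seconds_per_instruction = (
        (1.0 - hardware_density) / native_ips
        + hardware_density * hw_access_seconds
    )
    return 1.0 / seconds_per_instruction

# Assumed figures: 100 MIPS natively, 1 ms per simulated hardware access.
for density in (0.0, 0.01, 0.30, 1.0):
    rate = cosim_instructions_per_second(1e8, 1e-3, density)
    print(f"density {density:4.2f}: {rate:12.0f} instructions/s")
```

Under these assumed numbers even a 1% hardware density lets the hardware accesses dominate the run time.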

18 Emulation
Special simulation environment with hardware
–runs whole design
–expensive
–10% of real time
–FPGA arrays may be the hardware
–allows designers of large products to find a class of problems that cannot be found in simulation
–can attach to real devices (a router using Quickturn's Ethernet SpeedBridge could route real network traffic)

19 Emulation
Architectural simulators overlook hardware complexity and lack accuracy
Integration of HDL models with an architecture-level simulator is pretty slow
Best solution is to implement the subsystem under test in an FPGA and integrate this with the architecture-level simulator

20 Emulation - How it fits
[Figure: an HDL description is run in a simulator, then synthesized for emulation alongside the simulator, leading to FPGA/ASIC]

21 Strategy
Simulation speed degrades when real components replace the functional blocks.
The simulation speed depends on the simulation engine, the simulation algorithm, the number of gates in the design, and whether the design is primarily synchronous or asynchronous.
Low-cost cycle-based simulation is a good compromise. Since it cannot test the physical characteristics of a design, an event-driven simulator may be used in conjunction with it.
Cycle-based simulators and emulators may have long compilation times. Hence, they are not suitable for initial tests that need many changes.
Event-driven and cycle-based simulators have fairly equal debugging environments: all signals are available at all times. Emulators, on the other hand, require the list of signals to be traced to be declared at compilation time.

22 Strategy
If the next problem can be found in a few microseconds of simulated time, then slower simulators with faster compilation times are appropriate.
If the current batch of problems all take a couple of hundred milliseconds, or even seconds, of simulated time, then the startup overhead of cycle-based simulation or even an emulator is worth the gain in run-time speed.
How about the portability of test benches?
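The trade-off on this slide can be made concrete with a simple cost model: turnaround time equals compilation time plus simulated time multiplied by the slowdown factor. The compile times and slowdown factors below are invented for illustration and do not describe any particular tool.

```python
def turnaround_seconds(compile_seconds, slowdown, simulated_seconds):
    """Total wall-clock time for one compile-and-run iteration."""
    return compile_seconds + slowdown * simulated_seconds

def break_even_simulated_seconds(quick, heavy):
    """Simulated time at which `heavy` (long compile, fast run) catches up.

    Each argument is a (compile_seconds, slowdown) pair.
    """
    (c_quick, s_quick), (c_heavy, s_heavy) = quick, heavy
    return (c_heavy - c_quick) / (s_quick - s_heavy)

# Assumed figures, for illustration only.
EVENT_DRIVEN = (60, 1e7)     # compiles in a minute, runs 10,000,000x slower than real time
CYCLE_BASED = (3600, 1e6)    # compiles in an hour, runs 1,000,000x slower than real time

threshold = break_even_simulated_seconds(EVENT_DRIVEN, CYCLE_BASED)
print(f"break-even at {threshold * 1e6:.0f} microseconds of simulated time")
for simulated in (5e-6, 0.2):
    ed = turnaround_seconds(*EVENT_DRIVEN, simulated)
    cb = turnaround_seconds(*CYCLE_BASED, simulated)
    print(f"{simulated:g} s simulated: event-driven {ed:.0f} s, cycle-based {cb:.0f} s")
```

With these assumed numbers, a bug that shows up within microseconds favors the quick-compiling simulator, while one that needs hundreds of milliseconds favors the simulator with the faster run time.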

23 Processor Models
Bus Functional Model (BFM)
Instruction-Set Simulator (ISS)

24 Bus Functional Model (BFM)
Encapsulates the bus functionality of a processor
–Can execute bus transactions on the processor bus (with cycle accuracy)
–Cannot execute any instructions
Hence,
–A BFM is an abstract model of a processor that can be used to verify how the processor interacts with its peripherals
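A BFM can be pictured as an object that offers only bus transactions. The class below is a minimal sketch: the cycle counts per transaction, the dictionary standing in for peripheral registers, and the method names are all assumptions made for illustration.

```python
class BusFunctionalModel:
    """Executes read/write bus transactions; it has no notion of instructions."""
    READ_CYCLES = 2    # assumed bus timing
    WRITE_CYCLES = 3   # assumed bus timing

    def __init__(self, peripherals):
        self.peripherals = peripherals   # maps address -> register value
        self.cycle = 0
        self.log = []                    # (start_cycle, kind, address, value)

    def write(self, address, value):
        self.log.append((self.cycle, "write", address, value))
        self.peripherals[address] = value
        self.cycle += self.WRITE_CYCLES

    def read(self, address):
        value = self.peripherals.get(address, 0)
        self.log.append((self.cycle, "read", address, value))
        self.cycle += self.READ_CYCLES
        return value

# Test code drives the peripheral directly through bus transactions.
bfm = BusFunctionalModel(peripherals={})
bfm.write(0x40, 7)
status = bfm.read(0x40)
print(status, bfm.cycle, bfm.log)
```

Because the model only replays transactions, the stimulus has to come from elsewhere, such as a test script early in the design or an ISS later on.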

25 Bus Functional Model (cont’d)
[Figure: at early stages of the design, SW written in C/C++ drives the HW through the BFM; in the later stages of the design, SW in assembly runs on the ISS, which drives the HW through the BFM]

26 Instruction-Set Simulator
ISS: a processor model capable of simulating the execution of instructions
Different types of ISS for different purposes
–Usage 1: Verification of applications written in assembly code
  For fastest speed: translate target assembly instructions into host processor instructions
–Is not cycle-accurate, especially for pipelined and superscalar architectures

27 ISS (cont’d)
Different types of ISS … (cont’d)
–Usage 2: Verification of timing and interfaces between system components
  Used in conjunction with a BFM
  The ISS should be timing-accurate in this usage
–The ISS often works as an emulator
–For performance estimation, the ISS must provide accurate cycle counting
–To gain certain speed improvements, the ISS should provide the necessary hooks (discussed later)

28 Integrating an ISS and a BFM
ISS + BFM => complete processor model
Cycle-accurate ISS + (already cycle-accurate) BFM => cycle-accurate processor model
Typical units of an ISS
–Fetch, Decode, Execute
–Execute unit performs calls to BFM to access memory or configuration registers
–Fetch unit performs calls to BFM to read instructions
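The division of labor described above can be sketched with a toy processor. Everything here is hypothetical: the four-instruction set, the tuple encoding of instructions, the two-cycle bus cost, and the class names. The point illustrated is that the fetch and execute units reach memory only through BFM calls.

```python
class SimpleBFM:
    """Minimal bus model: every transaction costs an assumed two cycles."""
    def __init__(self, memory):
        self.memory = memory
        self.cycle = 0

    def read(self, address):
        self.cycle += 2
        return self.memory[address]

    def write(self, address, value):
        self.cycle += 2
        self.memory[address] = value

class TinyISS:
    """Toy ISS with fetch, decode, and execute steps."""
    def __init__(self, bfm):
        self.bfm = bfm
        self.pc = 0
        self.regs = [0] * 4

    def fetch(self):
        instruction = self.bfm.read(self.pc)   # fetch unit calls the BFM
        self.pc += 1
        return instruction

    def execute(self, opcode, operands):
        if opcode == "LOAD":                   # execute unit calls the BFM
            reg, address = operands
            self.regs[reg] = self.bfm.read(address)
        elif opcode == "STORE":
            reg, address = operands
            self.bfm.write(address, self.regs[reg])
        elif opcode == "ADD":                  # register-only, no bus activity
            dst, src = operands
            self.regs[dst] += self.regs[src]
        else:
            raise ValueError(f"unknown opcode {opcode}")

    def run(self):
        while True:
            opcode, *operands = self.fetch()   # decode
            if opcode == "HALT":
                return
            self.execute(opcode, operands)

memory = {
    0: ("LOAD", 0, 100), 1: ("LOAD", 1, 101), 2: ("ADD", 0, 1),
    3: ("STORE", 0, 102), 4: ("HALT",),
    100: 5, 101: 7,
}
bfm = SimpleBFM(memory)
TinyISS(bfm).run()
print(memory[102], bfm.cycle)
```

Five instruction fetches and three data accesses all pass through the bus model, which is what lets the pair act as a complete processor model.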

29 Integrating an ISS and a BFM (cont’d)
For more complex architectures (pipelined, superscalar)
–Other units must be modeled
  Cache, prefetch, re-order buffer, issue, …
Many units may need to call BFM functions
ISS may need to provide BFM with certain memory-access functions (discussed later)

30 Techniques to speed up simulation
Reduce activity on the memory bus
–Most applications: 95% of memory traffic is attributed to instruction and data fetches
–Memory access previously verified? => no need to simulate it again during co-simulation
  Put instruction memory (and/or data memory) inside the ISS
What to do for external devices accessing instr/data memory?
–BFM must be configured to recognize them and call the corresponding ISS method to access instr/data
–ISS must provide the above methods
–ISS must implement a memory map, where certain addresses are accessed directly, while others go through bus cycles
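A memory map of this kind can be sketched as follows. The address ranges, the four-cycle bus cost, and the class and method names are assumptions for illustration: addresses inside the local range are served from memory held inside the ISS with no bus activity, all other addresses go through bus cycles, and two extra methods give external devices a path into the local memory.

```python
class Bus:
    """Stand-in for the BFM side: every access costs an assumed four cycles."""
    def __init__(self):
        self.cycles = 0
        self.devices = {}

    def read(self, address):
        self.cycles += 4
        return self.devices.get(address, 0)

    def write(self, address, value):
        self.cycles += 4
        self.devices[address] = value

class MemoryMappedISS:
    """ISS-side memory map: local RAM is accessed directly, the rest via the bus."""
    def __init__(self, bus, local_start, local_size):
        self.bus = bus
        self.local = range(local_start, local_start + local_size)
        self.ram = [0] * local_size

    def load(self, address):
        if address in self.local:
            return self.ram[address - self.local.start]   # no bus cycles simulated
        return self.bus.read(address)

    def store(self, address, value):
        if address in self.local:
            self.ram[address - self.local.start] = value
        else:
            self.bus.write(address, value)

    # Methods the BFM would call when an external device targets local memory.
    def external_read(self, address):
        return self.ram[address - self.local.start]

    def external_write(self, address, value):
        self.ram[address - self.local.start] = value

bus = Bus()
iss = MemoryMappedISS(bus, local_start=0x0000, local_size=0x1000)
iss.store(0x0010, 99)          # local: bypasses the bus
iss.store(0x8000, 1)           # peripheral: goes through bus cycles
iss.external_write(0x0020, 5)  # e.g. a DMA transfer routed in by the BFM
print(iss.load(0x0010), iss.load(0x0020), bus.cycles)
```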

31 Techniques to speed up simulation (cont’d)
Turn off clocks on modules
–All clocked components are activated by a clock edge
  Most of the time the component is not addressed => activation and simulation (even of a limited part of each process) is wasteful => turn off clocks when not necessary
–How to do it?
  BFM generates the bus clock only when devices on the bus are addressed
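The saving from gating the bus clock can be estimated by counting how often each device model is evaluated. The sketch below is illustrative only: the `Device` class, the address ranges, and the access pattern are assumptions. It compares a free-running clock, which activates every device on every cycle, with a clock generated only for the addressed device.

```python
class Device:
    """Clocked peripheral model that counts how often it is evaluated."""
    def __init__(self, name, base, size):
        self.name = name
        self.addresses = range(base, base + size)
        self.evaluations = 0

    def clock_edge(self):
        self.evaluations += 1

def run_free_running(devices, total_cycles):
    """Every device is activated on every clock edge."""
    for _ in range(total_cycles):
        for device in devices:
            device.clock_edge()

def run_gated(devices, accesses):
    """The bus clock reaches a device only on cycles in which it is addressed."""
    for _cycle, address in sorted(accesses.items()):
        for device in devices:
            if address in device.addresses:
                device.clock_edge()

def make_devices():
    return [Device("uart", 0x100, 0x10), Device("timer", 0x200, 0x10)]

accesses = {3: 0x100, 7: 0x200, 8: 0x104}   # cycle -> address on the bus

free = make_devices()
run_free_running(free, total_cycles=10)
gated = make_devices()
run_gated(gated, accesses)
print(sum(d.evaluations for d in free), sum(d.evaluations for d in gated))
```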