Methods for Evaluation of Embedded Systems Simon Künzli, Alex Maxiaguine Institute TIK, ETH Zurich.

Presentation transcript:

System-Level Analysis
(Figure: an architecture with RISC, DSP, LookUp, and Cipher components running IP telephony, secure FTP, multimedia streaming, and web browsing.)
Open questions: Memory? Clock rate? Bus load? Packet delays? Resource utilization?

Problems for Performance Estimation
(Figure: RISC, DSP, SDRAM, arbiter.)
- Distributed processing of applications on different resources
- Interaction of different applications on different resources
- Heterogeneity, HW-SW

Complex run-time interdependencies (Prof. Ernst, TU Braunschweig)
(Figure: components M1, M2, M3, IP 1, IP 2, Com Netw, DSP, HW, CPU, Sens.)
- run-time dependencies of independent components via communication
- influence on timing and power

A “nice-to-have” performance model
- measuring what we want
- high accuracy
- high speed
- full coverage
- based on unified formal specification model
- composability & parameterization
- reusable across different abstraction levels, or at least easy to refine

Overview of Existing Approaches
(Figure: existing approaches placed on speed vs. accuracy axes: Thiele, Ernst, Givargis, Lahiri, Benini, RTL, SPADE, Jerraya.)

Discrete-event Simulation
- System model: architecture and behavior
  - components/actors/processes
  - communication channels/signals
- Event scheduler with an event queue holding
  - future events (e.g. signal changes)
  - actions to be executed
- Accuracy vs. speed: how many events are simulated?
(Figure © The MathWorks)
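The scheduler and event queue described on this slide can be sketched in a few lines of Python. This is only an illustrative kernel, not the simulator used in the talk: the `Simulator` class, the `producer`/`consumer` processes, and the delays of 3 and 10 time units are all assumptions made for the example.

```python
import heapq
from itertools import count


class Simulator:
    """Minimal discrete-event kernel: a time-ordered queue of future events."""

    def __init__(self):
        self.now = 0
        self._queue = []        # heap of (time, sequence number, action)
        self._seq = count()     # tie-breaker so equal-time events keep insertion order
        self.events_processed = 0

    def schedule(self, delay, action):
        """Enqueue `action` to run `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), action))

    def run(self):
        """Pop events in time order and execute their actions until the queue is empty."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()
            self.events_processed += 1


def demo():
    sim = Simulator()
    log = []

    def consumer():
        log.append((sim.now, "consume"))

    def producer(n):
        # Hypothetical process: emits n tokens, one every 10 time units.
        if n > 0:
            log.append((sim.now, "produce"))
            sim.schedule(3, consumer)                    # assumed channel latency
            sim.schedule(10, lambda: producer(n - 1))

    sim.schedule(0, lambda: producer(2))
    sim.run()
    return log, sim.events_processed
```

The `events_processed` counter makes the accuracy vs. speed trade-off visible: a finer-grained model of the same system schedules more events and therefore takes longer to simulate.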

Discrete-event Simulation
“The design space”:
- Time resolution
- Modeling communication
- Modeling timing of data-dependent execution
- …

Time Resolution
(Figure: a signal x(t) with events at t1 to t7, shown once in continuous time and once in discrete time; accuracy axis.)
- Continuous time: e.g. gate-level simulation
- Discrete time or “cycle-accurate”: e.g. Register Transfer Level (RTL) simulation, system-level performance analysis

Modeling communication
- Pin-level model
  - all signals are modeled explicitly
  - often combined with RTL
- Transaction-level model (TLM)
  - protocol details are abstracted
  - e.g. burst mode transfers
- A TLM simulator of the AMBA bus is 100x faster than the pin-level model (Caldari et al., Transaction-Level Models for AMBA Bus Architecture Using SystemC 2.0, DATE 2003)
(Figure: at pin level, C1 and C2 exchange ready and d0, d1, d2 signals; at transaction level, C1 issues a transaction to C2 and gets true/false back.)

Modeling timing of data-dependent execution
Problem: How to model timing of data-dependent functionality inside a component?
Possible solution: Estimate and annotate delays in the functional/behavioral model:
    a = read(in);
    if (a > b) { task1(); delay(d1); }
    else       { task2(); delay(d2); }
    write(out, c);
(Figure: flow from in to out: a=read(in), branch on a > b, task1() with delay d1 or task2() with delay d2, write(out,c).)
This approach works well for HW but may be too coarse for modeling SW.
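A runnable version of the delay-annotated model may make the idea concrete. The sketch below is in Python rather than the slide's C-like notation, and the bodies of `task1` and `task2`, as well as the numeric delays, are placeholders assumed for illustration only.

```python
def task1(a):
    return a * 2    # placeholder functionality (assumption)


def task2(a):
    return a + 1    # placeholder functionality (assumption)


def run_component(inputs, b, d1, d2):
    """Execute the behavioral model and advance time by the annotated delay
    of whichever branch the data selects. Returns (finish_time, output) pairs."""
    now = 0
    outputs = []
    for a in inputs:            # a = read(in)
        if a > b:
            c = task1(a)
            now += d1           # delay(d1)
        else:
            c = task2(a)
            now += d2           # delay(d2)
        outputs.append((now, c))  # write(out, c)
    return outputs


result = run_component([5, 1, 7], b=3, d1=10, d2=4)
```

The total simulated time depends on the input data, because each input selects the branch and therefore the delay that is added.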

HW/SW Cosimulation Options
Application SW...
- … is delay-annotated & natively executes on workstation as a part of HW simulator
- … is compiled for target processor and its code is used as a stimulus to a processor model that is a part of HW simulator
- … is not a part of the HW simulator -- a complete separation of Application and Architecture models

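Processor Models: Simulation Environment
(Figure: application SW in C/C++ goes through a compiler to program code (.exe), which runs on a processor model (RTL, microarchitecture simulator, or ISS); a wrapper connects the processor model to the HW simulator for the rest of the system.)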

Processor Models
- RTL model
  - cycle-accurate or continuous time
  - all the details are modeled (e.g. synthesizable)
- Microarchitecture simulator
  - cycle-accurate model
  - models pipeline effects, etc.
  - can be generated automatically (e.g. Liberty, LISA…)
- Instruction set simulator
  - provides instruction count
  - functional models of instructions
  - e.g. SimpleScalar

Multiprocessor System Simulator (L. Benini, U Bologna)
(Figure: SystemC model, cycle-accurate ISS, SystemC wrapper.)

Comparison of HW/SW Co-simulation techniques
simulator                                 speed (instructions/sec)
continuous time (nano-second accurate)
cycle-accurate                            50 – 1000
instruction level                         2000 – 20,000
Source: J. Rowson, Hardware/Software Co-Simulation, Proceedings of the 31st DAC, USA, 1994

HW/SW Co-simulation Options
Application SW...
- … is delay-annotated & natively executes on workstation as a part of HW simulator
- … is compiled for target processor and its code is used as a stimulus to a processor model that is a part of HW simulator
- … is not a part of the HW simulator -- a complete separation of Application and Architecture models

Independent Application and Architecture Models (“Separation of Concerns”)
(Figure: the application (workload) is mapped onto the architecture (resources: RISC, DSP, SRAM).)

Co-simulation of Application and Architecture Models
Basic principle:
- Application (or functional) simulator drives architecture (or hardware) simulator
- The models interact via traces of actions
- The traces are produced on-line or off-line
Advantages:
- system-level view
- flexible choice of abstraction level
- the models and the mapping can be easily altered
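The basic principle can be illustrated with a toy trace-driven setup in Python. Everything here is assumed for the example: the `decoder` process, the read/execute/write action names, the cycle costs, and the per-byte bus cost. It does not reproduce any particular tool.

```python
def application_model(n_frames):
    """Functional model: yields an untimed trace of abstract actions (on-line)."""
    for _ in range(n_frames):
        yield ("decoder", "read", 64)         # bytes read
        yield ("decoder", "execute", "idct")  # abstract computation
        yield ("decoder", "write", 64)        # bytes written


def simulate(trace, mapping, exec_cycles, bus_cycles_per_byte):
    """Architecture model: charges each trace action to the resource
    that the mapping assigns to the issuing process."""
    busy = {resource: 0 for resource in set(mapping.values())}
    for process, kind, arg in trace:
        resource = mapping[process]
        if kind == "execute":
            busy[resource] += exec_cycles[(resource, arg)]
        else:  # read or write: cost proportional to the number of bytes
            busy[resource] += arg * bus_cycles_per_byte
    return busy


# Hypothetical cycle counts for the same function on two resources.
EXEC_CYCLES = {("DSP", "idct"): 200, ("RISC", "idct"): 900}

on_dsp = simulate(application_model(2), {"decoder": "DSP"}, EXEC_CYCLES, 2)
on_risc = simulate(application_model(2), {"decoder": "RISC"}, EXEC_CYCLES, 2)
```

Changing only the `mapping` argument re-evaluates the same application on a different resource, which is the sense in which the models and the mapping can be altered independently. Storing the generator output in a list first would correspond to the off-line variant.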

Trace-driven Simulation
SPADE: System level Performance Analysis and Design space Exploration (P. Lieverse et al., U Delft & Philips)
(Figure: application model and architecture model.)

Trace-driven Simulation (SPADE) (Lieverse et al., U Delft & Philips)

Going away from discrete-event simulation…
Analysis for Communication Systems (Lahiri et al., UC San Diego)
A two-step approach:
1. simulation without communication (e.g. using ISS)
2. analysis for different communication architectures

Overview (K. Lahiri, UCSD)

Analytical Methods for Power Estimation (Givargis et al., UC Riverside)
- Analytical models for power consumption of:
  - caches
  - buses
- Two-step approach for fast power evaluation
  - collect intermediate data using simulation
  - use equations to rapidly predict power
  - couple with a fast bus estimation approach

Approach Overview (Givargis, UC Riverside)
Bus equation:
- m items/second (denotes the traffic N on the bus)
- n bits/item
- k bit wide bus
- bus-invert encoding
- random data assumption

Experiment Setup (Givargis, UC Riverside)
(Figure: tool flow with blocks C Program, Trace Generator, Cache Simulator, ISS, Bus Simulator, CPU Power, I/D Cache Power, Memory Power, and Performance + Power. The cache simulator is Dinero [Edler, Hill]; CPU power follows [Tiwari96].)

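Analytical Method
(Figure: CPU 1 with scheduling discipline 1 and events e1, e2; CPU 2 with scheduling discipline 2 and events e3, e4; question marks on the workload and on the event streams between the components.)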

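Event Model Interface Classification (Ernst, TU Braunschweig)
Event models:
- periodic: period T
- periodic with jitter: period T, jitter J
- periodic with burst: period T, burst length b, distance t
- sporadic: distance t
Event model interfaces (EMIFs):
- lossless EMIF: periodic to periodic with jitter (T=T, J=0, i.e. jitter = 0); periodic to periodic with burst (T=T, t=T, b=1, i.e. burst length (b) = 1)
- EMIF to less expressive model: to sporadic with t = T - J (from periodic with jitter), t = T (from periodic), t = t (from periodic with burst)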
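The conversions listed on this slide can be written down as small functions. The sketch below is one reading of the slide, not a reference implementation: the class names and function names are invented here, and the assignment of each formula (t = T - J, t = T, t = t) to a source model is an interpretation of the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Periodic:
    T: float


@dataclass(frozen=True)
class PeriodicWithJitter:
    T: float
    J: float


@dataclass(frozen=True)
class PeriodicWithBurst:
    T: float
    b: int
    t: float


@dataclass(frozen=True)
class Sporadic:
    t: float


def periodic_to_jitter(p):
    """Lossless EMIF: a periodic stream is a jitter stream with J = 0."""
    return PeriodicWithJitter(T=p.T, J=0)


def periodic_to_burst(p):
    """Lossless EMIF: a periodic stream is a burst stream with b = 1, t = T."""
    return PeriodicWithBurst(T=p.T, b=1, t=p.T)


def to_sporadic(model):
    """EMIF to the less expressive sporadic model (keeps only a distance t)."""
    if isinstance(model, Periodic):
        return Sporadic(t=model.T)
    if isinstance(model, PeriodicWithJitter):
        if model.J >= model.T:
            # Assumption: t = T - J is only meaningful while it stays positive.
            raise ValueError("jitter must be smaller than the period")
        return Sporadic(t=model.T - model.J)
    if isinstance(model, PeriodicWithBurst):
        return Sporadic(t=model.t)
    raise TypeError(f"unsupported event model: {model!r}")
```

Converting to the sporadic model discards the period, which is why the slide distinguishes it from the lossless interfaces.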

Example: EMIFs & EAFs
(Figure: CPU 1 with scheduling discipline 1 and events e1, e2; CPU 2 with scheduling discipline 2 and events e3, e4; an EMIF and an EAF are placed on the streams between the components.)
- Event model interface (EMIF) needed
- Event adaptation function (EAF) needed
- Use standard scheduling analysis for single components.

Using EMIFs and EAFs (Ernst, TU Braunschweig)
(Figure: transitions among the Sporadic, Periodic with Burst, Periodic with Jitter, and Periodic event models, annotated with: EAF, buffer required, upper bound only.)

General Framework
(Figure: a functional task model (T1, T2, T3) and an architecture model (ARM9, DSP) are abstracted into an abstract task model and an abstract architecture, built from abstract components (run-time environment). Load scenarios, resource units, mapping relations, functional units, and event streams correspond to abstract load scenarios, abstract resource units, abstract functional units, and abstract event streams.)

Event & Resource Models
- use arrival curves to capture event streams
- use service curves to capture processing capacity
(Figure: number of packets (1, 2, 3) over time t, with upper and lower arrival curves α^u and α^l; annotations: max 1 packet / min 0 packets, max 2 packets / min 0 packets, max 3 packets / min 1 packet.)
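To show what an arrival curve records, the following Python sketch derives upper and lower curves from a finite packet trace by sliding a window over it. The trace is invented so that windows of 1, 2, and 3 slots give the max/min values annotated on the slide (that window assignment is an assumption). Curves obtained from one finite trace only describe that trace; they are not guaranteed bounds for the stream.

```python
def arrival_curves(events_per_slot, max_window):
    """Return (upper, lower) where upper[w] / lower[w] is the maximum / minimum
    number of events seen in any window of w consecutive time slots."""
    n = len(events_per_slot)
    if max_window > n:
        raise ValueError("window longer than the trace")

    # Prefix sums make every window sum a single subtraction.
    prefix = [0]
    for e in events_per_slot:
        prefix.append(prefix[-1] + e)

    upper, lower = [0], [0]   # a window of length 0 contains no events
    for w in range(1, max_window + 1):
        sums = [prefix[i + w] - prefix[i] for i in range(n - w + 1)]
        upper.append(max(sums))
        lower.append(min(sums))
    return upper, lower


# Hypothetical trace: packets arriving per time slot.
trace = [1, 1, 1, 0, 0, 1, 0, 0, 1]
alpha_u, alpha_l = arrival_curves(trace, 3)
```

Service curves can be handled the same way by recording available processing capacity per slot instead of arriving packets.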

Analysis for a Single Component

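Analysis – Bounds on Delay & Memory
(Figure: arrival curves α^{u,l} and service curves β^{u,l}. The delay d is bounded by the maximum horizontal distance, and the backlog b by the maximum vertical distance, between the upper arrival curve α^u and the lower service curve β^l.)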
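The two bounds can be computed directly when the curves are sampled at integer interval lengths. This Python sketch assumes that representation (lists indexed by the window length) and assumes the sampled horizon is long enough to contain the worst case. The example curves, a burst of 2 plus 1 event per time unit against a service with latency 2 and rate 2, are invented for illustration.

```python
def backlog_bound(alpha_u, beta_l):
    """Maximum vertical distance between demand and guaranteed service."""
    return max(a - b for a, b in zip(alpha_u, beta_l))


def delay_bound(alpha_u, beta_l):
    """Maximum horizontal distance: for each window length, how much longer
    the service curve needs to catch up with the demand."""
    worst = 0
    for delta, demand in enumerate(alpha_u):
        tau = 0
        while delta + tau < len(beta_l) and beta_l[delta + tau] < demand:
            tau += 1
        if delta + tau >= len(beta_l):
            raise ValueError("service curve too short to serve the demand")
        worst = max(worst, tau)
    return worst


# Hypothetical curves sampled at window lengths 0, 1, 2, ...
alpha_u = [0, 3, 4, 5, 6, 7, 8]          # burst of 2, then 1 event per time unit
beta_l = [0, 0, 0, 2, 4, 6, 8, 10, 12]   # latency 2, then 2 events per time unit

max_backlog = backlog_bound(alpha_u, beta_l)
max_delay = delay_bound(alpha_u, beta_l)
```

For these particular curves the results agree with the closed-form expressions for a token-bucket arrival curve and a rate-latency service curve: delay 2 + 2/2 = 3 and backlog 2 + 1·2 = 4.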

Comparison between Different Approaches
Simulation-based:
- can answer virtually any question about performance
- can model arbitrarily complex systems
- average case (single instance)
- time-consuming
- accurate
Analytical methods:
- possibilities to answer questions limited by method
- restricted by underlying models
- good coverage (worst case)
- fast
- coarse

Example: IBM Network Processor

Comparison RTC vs. Simulation

Experiment Results (Givargis, UC Riverside)
Diesel application’s performance:
- blue is obtained using full simulation
- red is obtained using our equations
- 4% error
- 320x faster

Concluding Remarks

Backup

Metropolis Framework (Cadence Berkeley Lab & UC Berkeley)