
1 of 37 Lab 2: Formal verification with UPPAAL

2 of 37 The gossiping persons
There are n persons. Each has one secret to tell, which is not known to the others. All are eager to tell their secrets and to hear what the other persons know. They communicate via a two-way channel (e.g., a telephone); thus, when two persons are on the phone, they exchange all the secrets they currently know. What is the minimum number of telephone calls needed so that each person knows all secrets?
- A person should be able to call and answer (channels)
- How do we represent secrets? (integer variable)
- How do we communicate secrets? (global variable)
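Before building the UPPAAL model, it helps to fix an encoding. A natural one is to represent each person's knowledge as a bit vector stored in an integer: bit j is set when that person knows secret j, and a call replaces both callers' vectors with their bitwise OR, exchanged via a global variable over a channel synchronization. The property to check is then reachability of a state where every vector is all-ones (an E<> query in UPPAAL). The C sketch below only illustrates the encoding and the merge step; the names, the fixed n = 4, and the particular call sequence are illustrative assumptions, not part of the lab skeleton.

#include <stdio.h>

#define N 4                              /* number of persons (assumed) */

/* know[i] is a bit vector: bit j set means person i knows secret j.
   Initially each person knows only their own secret. */
unsigned know[N] = {1u << 0, 1u << 1, 1u << 2, 1u << 3};

/* A phone call between persons x and y: both leave the call with the
   union of their knowledge, mirroring the two-way channel. */
void call(int x, int y)
{
    unsigned merged = know[x] | know[y];
    know[x] = merged;
    know[y] = merged;
}

int main(void)
{
    /* Four calls suffice for n = 4; the classic optimum is 2n - 4
       calls for any n >= 4. */
    call(0, 1);
    call(2, 3);
    call(0, 2);
    call(1, 3);
    for (int i = 0; i < N; i++)
        printf("person %d knows %#x\n", i, know[i]);  /* all print 0xf */
    return 0;
}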

3 of 37 Lab 3: Design space exploration with MPARM

4 of 37 Outline
- The MPARM simulation framework
  - HW platform
  - SW platform
- Design space exploration for energy minimization
- Communication in MPSoC
- Mapping and scheduling

5 of 37 MPSoC architecture
[Figure: several ARM cores, each with a cache and a private memory, plus an interrupt device, a semaphore device, and a shared memory, all connected by a bus]

6 of 37 System design flow
[Flowchart: informal specification and constraints → modeling → system model; architecture selection → system architecture; mapping and scheduling → mapped and scheduled model → estimation (not ok: revise the architecture, mapping, or schedule) → hardware and software implementation → prototype → testing with functional simulation (not ok: iterate; ok: fabrication)]

7 of 37 System design flow
[Flowchart: hardware platform and software application(s) → extract task graph → extract task parameters → optimize (mapping & scheduling) → implement; evaluation is done formally or by simulation]

8 of 37 MPARM: Hardware
- Simulation platform
- Up to 8 ARM7 processors
- Variable frequency (dynamic and static)
- Split/unified instruction and data caches
- Private memory for each processor
- Scratchpad
- Shared memory
- Bus: AMBA, ST
- This platform can be fine-tuned for a given application
- Read more in /home/TDTS07/sw/mparm/MPARM/doc/simulator_statistics.txt

9 of 37 MPARM: Software
- C cross-compiler chain for building the software applications
- No operating system
- Small set of helper functions (pr, shared_alloc, WAIT/SIGNAL); look in the application code

10 of 37 MPARM: Why?
- Cycle-accurate simulation of the system
- Various interesting statistics: number of clock cycles executed, bus utilization, cache efficiency, energy/power consumption of the components (CPU cores, bus, memories)

11 of 37 MPARM: How?
- mpsim.x -c2: runs a simulation on 2 processors, collecting the default statistics
- mpsim.x -c2 -w: runs a simulation with power/energy statistics
- mpsim.x -c1 --is=9 --ds=10: one processor with an instruction cache of 512 bytes and a data cache of 1024 bytes (the sizes are given as powers of two)
- mpsim.x -c2 -F0,2 -F1,1 -F3,3: two processors running at 100 MHz and 200 MHz, with the bus running at 66 MHz (the second number divides the 200 MHz "default" frequency)
- mpsim.x -h: for the rest of the options
- Simulation results are in the file stats.txt

12 of 37 Design space exploration
- Generic term for system optimization
- Platform optimization:
  - Select the number of processors
  - Select the speed of each processor
  - Select the type, associativity, and size of the cache
  - Select the bus type
- Application optimization:
  - Select the interprocessor communication style (shared memory or distributed message passing)
  - Select the best mapping and schedule

13 of 37 Assignment 1
- Given a GSM codec
- Running on 1 ARM7 processor
- The variables are:
  - cache parameters
  - processor frequency
- Using the MPARM simulator, find a hardware configuration that minimizes the energy of the system
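A single design point can then be evaluated with one run that combines the flags from the "MPARM: How?" slide, for example (a hypothetical configuration): mpsim.x -c1 -w --is=11 --ds=10 -F0,2, i.e., one ARM7 at 100 MHz with a 2 KB instruction cache and a 1 KB data cache, with energy statistics enabled. Sweep the cache and frequency parameters over their ranges and compare the energy figures reported in stats.txt.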

14 of 37 Energy/speed trade-off
[Figure: CPU energy model as a function of speed]

15 of 37 Frequency selection: ARM core energy
[Plot: ARM core energy, Energy [mJ] vs. frequency divider]

16 of 37 Frequency selection: Total energy
[Plot: total energy, Energy [mJ] vs. frequency divider]

17 of 37 Instruction cache size: Total energy
[Plot: total energy, Energy [mJ] vs. log2(CacheSize), ranging from 2^9 = 512 bytes to 2^14 = 16 Kbytes]

18 of 37 Instruction cache size: Execution time
[Plot: execution time, t [cycles] vs. log2(CacheSize)]

19 of 37 Interprocessor data communication
[Figure: CPU 1 executes "a = 1;" and CPU 2 executes "print a;", both attached to the bus. How does the value of a get from CPU 1 to CPU 2?]

20 of 37 Shared memory
[Figure: CPU 1 writes a = 1 and later a = 2; CPU 2 executes print a. The variable a lives in a shared memory on the bus, so which value does CPU 2 read (a = ?)? Synchronization is needed.]

21 of 37 Synchronization with semaphores
[Figure: the shared memory now also holds a semaphore sem_a. CPU 1 writes a = 2 and then executes signal(sem_a); CPU 2 executes wait(sem_a) before print a, so it is guaranteed to read a = 2.]
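This handshake maps directly onto the helper functions listed on the "MPARM: Software" slide. Below is a minimal C sketch of the producer/consumer pattern, assuming a printf-style pr and semaphore primitives WAIT/SIGNAL; the exact signatures are assumptions, so check the application code for the real API.

volatile int *a;                /* shared variable, lives in shared memory */
int sem_a;                      /* semaphore identifier (assumed type)     */

void init(void)                 /* assumed: run once before the tasks      */
{
    a = (volatile int *) shared_alloc(sizeof(int));  /* assumed signature */
}

void producer(void)             /* runs on CPU 1 */
{
    *a = 2;                     /* write the shared data first ...        */
    SIGNAL(sem_a);              /* ... then tell the consumer it is ready */
}

void consumer(void)             /* runs on CPU 2 */
{
    WAIT(sem_a);                /* blocks until the producer has signalled */
    pr("a = %d\n", *a);         /* guaranteed to read the new value        */
}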

22 of 37 Synchronization internals (1)
[Figure: the same setup, showing how wait(sem_a) is implemented by polling: CPU 2 spins in while (sem_a==0), repeatedly reading the semaphore over the bus, until CPU 1 sets sem_a = 1.]

23 of 37 Synchronization internals (2)
- Disadvantages of polling:
  - Higher power consumption (energy drawn from the battery)
  - Larger execution time of the application
  - Blocks important communication on the bus

24 of 37 Distributed message passing
- Instead: direct CPU-to-CPU communication with distributed semaphores (see the sketch below)
- Each CPU has its own scratchpad:
  - Smaller and faster than a RAM
  - Smaller energy consumption than a cache
  - Put frequently used variables on the scratchpad
  - A cache is controlled by hardware (cache lines, hits/misses, ...); a scratchpad is controlled by software (e.g., the compiler)
- Semaphores are allocated on the scratchpads
- No polling over the bus
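A minimal sketch of why this removes bus polling: the semaphore lives in the consumer's own scratchpad, so the producer pays a single bus write to set it, while the consumer's wait loop runs entirely out of local memory. The names and the busy-wait formulation are illustrative assumptions; the actual MPARM implementation may block or use interrupts instead.

/* Distributed semaphore, assumed to be allocated in CPU 2's scratchpad. */
volatile int scratch_sem = 0;

void producer(void)             /* CPU 1 */
{
    /* ... produce the data (e.g., write a message for CPU 2) ... */
    scratch_sem = 1;            /* one bus write into CPU 2's scratchpad */
}

void consumer(void)             /* CPU 2 */
{
    while (scratch_sem == 0)    /* spins on the local scratchpad: no bus traffic */
        ;
    /* ... consume the data ... */
}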

25 of 37 Distributed message passing (1)
[Figure: CPU 1 executes a = 1 and signal(sem_a); CPU 2 executes wait(sem_a) and print a. The data a still travels through the shared memory on the bus, but the semaphore sem_a now lives on CPU 2's scratchpad.]

26 of 37 Distributed message passing (2)
[Figure: producer/consumer variant where both the semaphore sem_a and the data a = 1 are written directly into the consumer's scratchpad: CPU 1 (producer) executes signal(sem_a); CPU 2 (consumer) executes wait(sem_a) and print a. No shared memory on the bus is involved.]

27 of 37 Assignment 2
- You are given 2 implementations of the GSM codec:
  - Shared memory
  - Distributed message passing
- Simulate and compare these 2 approaches with respect to:
  - Energy
  - Runtime

28 of 37 System design flow
[Flowchart repeated from slide 7: hardware platform and software application(s) → extract task graph → extract task parameters → optimize (mapping & scheduling) → implement; evaluation is done formally or by simulation]

29 of 37 Task graph extraction example
for (i=0;i<100;i++) a[i]=1;          // TASK 1
for (i=0;i<100;i++) b[i]=1;          // TASK 2
for (i=0;i<100;i++) c[i]=a[i]+b[i];  // TASK 3
[Task graph: edges 1 → 3 and 2 → 3]
Tasks 1 and 2 can be executed in parallel; Task 3 has a data dependency on Tasks 1 and 2.

30 of 37 Execution time extraction
- Using the simulator
- This is an "average" execution time
- Can be extracted using dump_light_metric() in MPARM

31 of 37 Execution time extraction example (1)
start_metric();
for (i=0;i<100;i++) a[i]=1;   // TASK 1
dump_light_metric();
for (i=0;i<100;i++) b[i]=1;   // TASK 2
dump_light_metric();
stop_metric();
stop_simulation();

32 of 37 Execution time extraction example (2)
Task 1
Interconnect statistics
Overall exec time = 287 system cycles (1435 ns)
Task NC =
CPU average exec time = 0 system cycles (0 ns)
Concurrent exec time = 287 system cycles (1435 ns)
Bus busy = 144 system cycles (50.17% of 287)
Bus transferring data = 64 system cycles (22.30% of 287, 44.44% of 144)
Task 2
Interconnect statistics
Overall exec time = 5554 system cycles (27770 ns)
Task NC =
CPU average exec time = 0 system cycles (0 ns)
Concurrent exec time = 5554 system cycles (27770 ns)
Bus busy = 813 system cycles (14.64% of 5554)
Bus transferring data = 323 system cycles (5.82% of 5554, 39.73% of 813)

33 of 37 Application mapping and scheduling
[Figure: a task graph mapped onto CPU0, CPU1, and CPU2, with Gantt charts of the resulting schedules; task execution intervals (0-2, 2-4, 4-5, 1-3, 3-5) on the CPUs plus the bus transfers add up to the schedule length, which is checked against the deadlines dl=3, dl=6, dl=9]

34 of 37 Mapping in MPARM example
if (get_proc_id()==1) {
    // Task 1 executed on processor 1
    for (i=1;i<100;i++) a[i]=1;
}
if (get_proc_id()==2) {
    // Task 2 executed on processor 2
    for (i=1;i<100;i++) b[i]=1;
}
- Using the lightweight API of MPARM: get_proc_id()

35 of 37 Scheduling in MPARM
- The schedule is given by the code sequence executed on one processor
Schedule 1:
// task 1
for(i=1;i<100;i++) a[i]=1;
// task 2
for(i=1;i<100;i++) b[i]=3;
Schedule 2:
// task 2
for(i=1;i<100;i++) b[i]=3;
// task 1
for(i=1;i<100;i++) a[i]=1;

36 of 37 Assignment 3: Mapping and scheduling
- Theoretical exercise (you will not use MPARM)
- A task graph and the task execution times are given
- Construct 2 schedules of the same, minimal, length:
  - one keeping the given task mapping
  - one changing the mapping

37 of 37 Thank you! Questions?