Synthesis of Embedded Software for Reactive Systems Jordi Cortadella Universitat Politècnica de Catalunya, Barcelona Joint work with: Robert Clarisó, Alex Kondratyev, Luciano Lavagno, Claudio Passerone and Yosinori Watanabe (UPC, Cadence Berkeley Labs, Politecnico di Torino)

System Design: methodology for platform-based system design, involving a contents provider (e.g. TV broadcast company), a system designer (set-top box), an external IP provider (e.g. software modem), an internal IP provider (e.g. MPEG2 engine) and a platform provider (e.g. semiconductor company: µP, DSP, communications, MPEG2 engine, custom logic, graphics). The parties exchange requirements specifications and testbenches, and functional and performance models (with agreed interfaces and abstraction levels).

Metropolis Project Goal: develop a formal design environment –Design methodologies: abstraction levels, design problem formulations –EDA: formal methods for automatic synthesis and verification, a modeling mechanism: heterogeneous semantics, concurrency Participants: –UC Berkeley (USA): methodologies, modeling, formal methods –CMU (USA): formal methods –Politecnico di Torino (Italy): modeling, formal methods –Universitat Politècnica de Catalunya (Spain): modeling, formal methods –Cadence Berkeley Labs (USA): methodologies, modeling, formal methods –Philips (Netherlands): methodologies (multi-media) –Nokia (USA, Finland): methodologies (wireless communication) –BWRC (USA): methodologies (wireless communication) –BMW (USA): methodologies (fault-tolerant automotive controls) –Intel (USA): methodologies (microprocessors)

Metropolis Framework Design Constraints Function Specification Architecture Specification Metropolis Infrastructure Design methodology Meta model of computation Base tools - Design imports - Meta model compiler - Simulation Metropolis Formal Methods: Synthesis/Refinement Metropolis Formal Methods: Analysis/Verification

Outline The problem –Synthesis of concurrent specifications for sequential processors –Compiler optimizations across processes Previous work: Dataflow networks –Static scheduling of SDF networks –Code and data size optimization Quasi-Static Scheduling of process networks –Petri net representation of process networks –Scheduling and code generation Open problems

Embedded Software Synthesis Specification: concurrent functional netlist (Kahn processes, dataflow actors, SDL processes, …) Software implementation: (smaller) set of concurrent software tasks Two sub-problems: –Generate code for each task –Schedule tasks dynamically Goals: –minimize real-time scheduling overhead –maximize effectiveness of compilation

Environmental controller: sensors for Temperature and Humidity; actuators AC, Dehumidifier and Alarm. The ENVIRONMENTAL CONTROLLER is a network of three processes, TEMP FILTER, HUMIDITY FILTER and CONTROLLER, with inputs TSENSOR and HSENSOR, internal channels TDATA and HDATA, and outputs AC-on, DRYER-on and ALARM-on.

Environmental controller (TEMP FILTER, HUMIDITY FILTER, CONTROLLER; channels TSENSOR, HSENSOR, TDATA, HDATA, AC-on, DRYER-on, ALARM-on). TEMP-FILTER: float sample, last; last = 0; forever { sample = READ(TSENSOR); if (|sample - last| > DIF) { last = sample; WRITE(TDATA, sample); } }

Environmental controller (same network). TEMP-FILTER: float sample, last; last = 0; forever { sample = READ(TSENSOR); if (|sample - last| > DIF) { last = sample; WRITE(TDATA, sample); } } HUMIDITY-FILTER: float h, max; forever { h = READ(HSENSOR); if (h > MAX) WRITE(HDATA, h); }

Environmental controller (same network). CONTROLLER: float tdata, hdata; forever { select(TDATA,HDATA) { case TDATA: tdata = READ(TDATA); if (tdata > TFIRE) WRITE(ALARM-on,10); else if (tdata > TMAX) WRITE(AC-on, tdata-TMAX); case HDATA: hdata = READ(HDATA); if (hdata > HMAX) WRITE(DRYER-on, 5); } }

Operating system (execution timeline of the network: environment, processes, OS). Tsensor arrives: T-FILTER wakes up, T-FILTER executes, T-FILTER sleeps. Hsensor arrives: H-FILTER wakes up, H-FILTER executes & sends data to HDATA, H-FILTER sleeps; CONTROLLER wakes up, CONTROLLER executes & reads data from HDATA.

Compiler optimizations. Instruction level: a = b*16 → a = b << 4. Basic blocks: common subexpressions, copy propagation. Intra-procedural (across basic blocks): loop invariants, induction variables. Inter-procedural: inline expansion, parameter propagation. Inter-process?: channel optimizations, OS overhead reduction. Each optimization enables further optimizations at lower levels.

Partial evaluation (example) Specification: subsets (n,k) = n! / (k! * (n-k)!) ________________________________________________ int subsets (int n, int k) { return fact(n) / (fact(k) * fact(n-k)); } int pairs (int n) { return subsets (n,2);}... print (pairs(x+1)) print (pairs(5))... Partial evaluation (compiler optimizations)

Partial evaluation (example) Specification: subsets (n,k) = n! / (k! * (n-k)!) ________________________________________________ int subsets (int n, int k) { return fact(n) / (fact(k) * fact(n-k)); } int pairs (int n) { return subsets (n,2);}... print ((x+1)*x / 2) print (pairs(5))... Partial evaluation (compiler optimizations)

Partial evaluation (example) Specification: subsets (n,k) = n! / (k! * (n-k)!) ________________________________________________ int subsets (int n, int k) { return fact(n) / (fact(k) * fact(n-k)); } int pairs (int n) { return subsets (n,2);}... print ((x+1)*x / 2) print (10)...

Inter-process partial evaluation forever { n = read (A); write (B,n); write (C, n-2); write (D, 2); } forever { x = read (E); y = read (F); z = read (G); write (H, x/(y*z)); } x! A H n pairs (n)

Inter-process partial evaluation forever { n = read (A); write (B,n); write (C, n-2); write (D, 2); } forever { x = read (E); y = read (F); z = read (G); write (H, x/(y*z)); } x! A H No chance for optimization

Inter-process partial evaluation forever { n = read (A); write (B,n); write (C, n-2); write (D, 2); } forever { x = read (E); y = read (F); z = read (G); write (H, x/(y*z)); } x! A H 2...2

Inter-process partial evaluation forever { n = read (A); write (B,n); write (C, n-2); write (G, 2); } forever { x = read (E); y = read (F); z = read (G); write (H, x/(y*z)); } x! A H 2...2

Inter-process partial evaluation forever { n = read (A); write (B,n); write (C, n-2); write (G, *); } forever { x = read (E); y = read (F); read (G); write (H, x/(y*2)); } x! A H Copy propagation across processes Channel G only synchronizes (token available)

Inter-process partial evaluation forever { n = read (A); write (B,n); write (C, n-2); write (G, *); } forever { x = read (E); y = read (F); read (G); write (H, x/(y*2)); } x! A H By scheduling operations properly, FIFOs may become variables (one element per FIFO, at most)

Inter-process partial evaluation forever { n = read (A); v1 = n; v3 = n-2; x = v2; y = v4; write (H, x/(y*2)); } x! A H v1 v2 v3 v4

Inter-process partial evaluation forever { n = read (A); v1 = n; v2 = fact (v1); x = v2; v3 = n-2; v4 = fact (v3); y = v4; write (H, x/(y*2)); } A H And now we can apply conventional compiler optimizations

Inter-process partial evaluation forever { n = read (A); x = fact (n); y = fact (n-2); write (H, x/(y*2)); } A H If some “clever” theorem prover could realize that fact(n) = n*(n-1)*fact(n-2) the following code could be derived...

Inter-process partial evaluation forever { n = read (A); write (H, n*(n-1)/2); } A H

Inter-process partial evaluation forever { n = read (A); write (B,n); write (C, n-2); write (D, 2); } forever { x = read (E); y = read (F); z = read (G); write (H, x/(y*z)); } x! A H This was the original specification of the system !

Inter-process partial evaluation This is the final implementation after inter-process optimization: Only one process (no context switching overhead) Channels substituted by variables (no communication overhead) forever { n = read (A); write (H, n*(n-1)/2); } A H

TEMP FILTER, HUMIDITY FILTER, CONTROLLER (TSENSOR, HSENSOR, TDATA, HDATA, AC-on, DRYER-on, ALARM-on) and the operating system. Goal: improve performance, code size, power consumption, ... Reduce operating system overhead. Reduce communication overhead. How?: Do as much as possible statically and automatically: Scheduling, Compiler optimizations

Outline The problem –Synthesis of concurrent specifications –Compiler optimizations across processes Previous work: Dataflow networks –Static scheduling of SDF networks –Code and data size optimization Quasi-Static Scheduling of process networks –Petri net representation of process networks –Scheduling and code generation Open problems

Dataflow networks Powerful mechanism for data-dominated systems (Often stateless) actors perform computation Unbounded FIFOs perform communication via sequences of tokens carrying values –(matrix of) integer, float, fixed point –image of pixels, ….. Determinacy: –unique output sequences given unique input sequences –Sufficient condition: blocking read (process cannot test input queues for emptiness)
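Note: the READ and WRITE operations used in the code fragments throughout these slides can be read as operations on a FIFO channel with a blocking READ and a non-blocking WRITE. A minimal single-threaded C sketch (the Channel type and its growth policy are illustrative assumptions, not part of the original tool flow):

#include <assert.h>
#include <stdlib.h>

typedef struct {
    float *buf;       /* token storage, grown on demand */
    int head, tail;   /* next read / next write position */
    int cap;          /* current capacity */
} Channel;

/* WRITE never blocks: Kahn channels are conceptually unbounded. */
void WRITE(Channel *c, float token) {
    if (c->tail == c->cap) {
        c->cap = c->cap ? 2 * c->cap : 16;
        c->buf = realloc(c->buf, c->cap * sizeof(float));
    }
    c->buf[c->tail++] = token;
}

/* READ is blocking: a process may not test a channel for emptiness.
   In this single-threaded sketch the scheduler must guarantee that a
   token is present before the reader runs, so an empty read is an error. */
float READ(Channel *c) {
    assert(c->head < c->tail);
    return c->buf[c->head++];
}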

A bit of history Kahn process networks (‘74): formal model Karp computation graphs (‘66): seminal work Dennis Dataflow networks (‘75): programming language for MIT DF machine Lee’s Static Data Flow networks (‘86): efficient static scheduling Several recent implementations (Ptolemy, Khoros, Grape, SPW, COSSAP, SystemStudio, DSPStation, Simulink, …)

Intuitive semantics Example: FIR filter –single input sequence i(n) –single output sequence o(n) –o(n) = c1 * i(n) + c2 * i(n-1)
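As a concrete reading of the filter above, here is a C sketch of the actor body, assuming the READ/WRITE channel primitives sketched earlier; the function name and parameter passing are illustrative, not the original code:

/* Two-tap FIR actor: o(n) = c1*i(n) + c2*i(n-1); prev holds i(n-1),
   initialized to i(-1) = 0. */
void fir_actor(Channel *in, Channel *out, float c1, float c2) {
    float prev = 0.0f;                     /* i(-1) */
    for (;;) {                             /* "forever": one firing per input token */
        float cur = READ(in);              /* consume i(n) */
        WRITE(out, c1 * cur + c2 * prev);  /* produce o(n) */
        prev = cur;                        /* shift the delay line */
    }
}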

Examples of Dataflow actors. SDF: Static Dataflow: fixed number of input and output tokens per firing (e.g. an FFT). BDF: Boolean Dataflow: a control token determines the number of consumed and produced tokens (e.g. the select and merge actors, with True/False control inputs).
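For illustration, a C sketch of a BDF select actor: the Boolean control token decides which of the two data inputs supplies the output token (channel and function names are hypothetical):

/* BDF select: each firing consumes one control token plus one data token
   from either the True or the False input, depending on the control value,
   and produces one output token. The data-dependent consumption rate is
   exactly what places the actor outside SDF. */
void select_actor(Channel *ctrl, Channel *in_true, Channel *in_false, Channel *out) {
    for (;;) {
        int c = (int) READ(ctrl);
        float v = c ? READ(in_true) : READ(in_false);
        WRITE(out, v);
    }
}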

Static scheduling of DF Key property of DF networks: output sequences do not depend on firing sequence of actors (marked graphs) SDF networks can be statically scheduled at compile-time –execute an actor when it is known to be fireable –no overhead due to sequencing of concurrency –static buffer sizing Different schedules yield different –code size –buffer size –pipeline utilization

Balance equations. Number of produced tokens must equal number of consumed tokens on every edge (channel). Repetitions (or firing) vector v of schedule S: number of firings of each actor in S. For each edge from actor A (producing np tokens per firing) to actor B (consuming nc tokens per firing), v(A) · np = v(B) · nc must be satisfied.

Balance equations (example with actors A, B, C and edges A→B, B→C, A→C). Balance for each edge: – 3 v(A) - v(B) = 0 – v(B) - v(C) = 0 – 2 v(A) - v(C) = 0

Balance equations. M · v = 0 iff S is periodic, where M has one row per edge and one column per actor; here M = | 3 -1 0 ; 0 1 -1 ; 2 0 -1 |. M has full rank (as in this case) ⇒ no non-zero solution ⇒ no periodic schedule (too many tokens accumulate on A → B or B → C).

Balance equations. Non-full rank ⇒ infinitely many solutions exist (linear space of dimension 1); here M = | 2 -1 0 ; 0 1 -1 ; 2 0 -1 |, of rank 2. Any multiple of v = |1 2 2|^T satisfies the balance equations. ABCBC and ABBCC are minimal valid schedules; ABABBCBCCC is a non-minimal valid schedule.
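A small C sketch that checks the balance equations M · v = 0 for a candidate repetitions vector; the topology matrix is hard-coded to the rank-2 example above (edge rates chosen to be consistent with the minimal solution v = |1 2 2|^T, an assumption for illustration):

#include <stdio.h>

#define EDGES  3
#define ACTORS 3

/* M[e][a] = tokens added to edge e by one firing of actor a
   (negative if a consumes from e). Actors: A = 0, B = 1, C = 2. */
static int balanced(const int M[EDGES][ACTORS], const int v[ACTORS]) {
    for (int e = 0; e < EDGES; e++) {
        int sum = 0;
        for (int a = 0; a < ACTORS; a++)
            sum += M[e][a] * v[a];
        if (sum != 0) return 0;   /* tokens accumulate or starve on edge e */
    }
    return 1;
}

int main(void) {
    const int M[EDGES][ACTORS] = { { 2, -1,  0 },    /* A -> B */
                                   { 0,  1, -1 },    /* B -> C */
                                   { 2,  0, -1 } };  /* A -> C */
    const int v[ACTORS] = { 1, 2, 2 };
    printf("v = (1,2,2) %s the balance equations\n",
           balanced(M, v) ? "satisfies" : "violates");
    return 0;
}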

Static SDF scheduling Main SDF scheduling theorem (Lee ‘86): –A connected SDF graph with n actors has a periodic schedule iff its topology matrix M has rank n-1 –If M has rank n-1 then there exists a unique smallest integer solution v to M v = 0

Deadlock. If no actor is fireable at some state before the schedule returns to the initial state, no valid schedule exists (Lee ‘86). Example (actors A, B, C): schedule (2A) B C reaches a deadlock.
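This check can be pictured as the simulation step of Lee's scheduling algorithm: fire actors up to their counts in the repetitions vector; if no actor can fire before all counts are exhausted, there is no valid schedule. A C sketch, hard-coded to the rank-2 example above with no initial tokens (graph data and names are illustrative):

#include <stdio.h>

#define ACTORS 3
#define EDGES  3

typedef struct { int src, dst, prod, cons, tokens; } Edge;

/* An actor is fireable when every input edge holds enough tokens. */
static int fireable(int a, const Edge *e) {
    for (int i = 0; i < EDGES; i++)
        if (e[i].dst == a && e[i].tokens < e[i].cons) return 0;
    return 1;
}

static void fire(int a, Edge *e) {
    for (int i = 0; i < EDGES; i++) {
        if (e[i].dst == a) e[i].tokens -= e[i].cons;
        if (e[i].src == a) e[i].tokens += e[i].prod;
    }
}

int main(void) {
    /* A = 0, B = 1, C = 2; edges A->B (2:1), B->C (1:1), A->C (2:1). */
    Edge e[EDGES] = { {0, 1, 2, 1, 0}, {1, 2, 1, 1, 0}, {0, 2, 2, 1, 0} };
    int left[ACTORS] = { 1, 2, 2 };          /* repetitions vector v */
    for (int progress = 1; progress; ) {
        progress = 0;
        for (int a = 0; a < ACTORS; a++)
            if (left[a] > 0 && fireable(a, e)) {
                fire(a, e); left[a]--; progress = 1;
                printf("fire %c\n", "ABC"[a]);  /* prints A B C B C here */
            }
    }
    printf((left[0] || left[1] || left[2]) ? "deadlock: no valid schedule\n"
                                           : "valid periodic schedule found\n");
    return 0;
}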

Compilation optimization. Assumption: code stitching (chaining custom code for each actor). More efficient than a general-purpose C compiler for DSPs; comparable to hand-coding in some cases. Explicit parallelism, no artificial control dependencies. Main problem: memory and processor/FU allocation depends on scheduling, and vice versa.

Code size minimization Assumptions (based on DSP architecture): –subroutine calls expensive –fixed iteration loops are cheap (“zero-overhead loops”) Global optimum: single appearance schedule e.g. ABCBC → A (2BC), ABBCC → A (2B) (2C) may or may not exist for an SDF graph… buffer minimization relative to single appearance schedules (Bhattacharyya ‘94, Lauwereins ‘96, Murthy ‘97)

Buffer size minimization. Assumption: no buffer sharing. Example (actors A, B, C, D; rates A→B 1:1, B→C 1:10, C→D 1:10): v = |100 100 10 1|^T. Valid SAS: (100 A) (100 B) (10 C) D requires 210 units of buffer area. Better (factored) SAS: (10 (10 A) (10 B) C) D requires 30 units of buffer area, but… requires 21 loop initiations per period (instead of 3).
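The buffer figures above can be reproduced by simulating each looped schedule and recording the peak number of tokens on every edge. A C sketch with the rates and schedules hard-coded to this example (no buffer sharing, as assumed above):

#include <stdio.h>

/* Edges: 0 = A->B (1:1), 1 = B->C (1:10), 2 = C->D (1:10). */
static int buf[3], peak[3];

static void fire(char actor) {
    switch (actor) {
    case 'A': buf[0] += 1; break;
    case 'B': buf[0] -= 1; buf[1] += 1; break;
    case 'C': buf[1] -= 10; buf[2] += 1; break;
    case 'D': buf[2] -= 10; break;
    }
    for (int e = 0; e < 3; e++)
        if (buf[e] > peak[e]) peak[e] = buf[e];
}

static void flat_sas(void) {       /* (100 A) (100 B) (10 C) D */
    for (int i = 0; i < 100; i++) fire('A');
    for (int i = 0; i < 100; i++) fire('B');
    for (int i = 0; i < 10;  i++) fire('C');
    fire('D');
}

static void factored_sas(void) {   /* (10 (10 A) (10 B) C) D */
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) fire('A');
        for (int j = 0; j < 10; j++) fire('B');
        fire('C');
    }
    fire('D');
}

static void run(const char *name, void (*schedule)(void)) {
    for (int e = 0; e < 3; e++) buf[e] = peak[e] = 0;
    schedule();
    printf("%s: buffer area = %d\n", name, peak[0] + peak[1] + peak[2]);
}

int main(void) {
    run("flat SAS", flat_sas);          /* prints 210 */
    run("factored SAS", factored_sas);  /* prints 30  */
    return 0;
}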

Scheduling more powerful DF SDF is limited in modeling power More general DF is too powerful –non-Static DF is Turing-complete (Buck ‘93) –bounded-memory scheduling is not always possible Boolean Data Flow: Quasi-Static Scheduling of special “patterns” –if-then-else, repeat-until, do-while Dynamic Data Flow: run-time scheduling –may run out of memory or deadlock at run time Kahn Process Networks: quasi-static scheduling using Petri nets –conservative: schedulable network may be declared unschedulable

Outline The problem –Synthesis of concurrent specifications –Compiler optimizations across processes Previous work: Dataflow networks –Static scheduling of SDF networks –Code and data size optimization Quasi-Static Scheduling of process networks –Petri net representation of process networks –Scheduling and code generation Open problems

Quasi-Static Scheduling (QSS). Sequentialize concurrent operations as much as possible ⇒ less communication overhead (less run-time task generation) and a better starting point for compilation (straight-line code from function blocks). Must handle data-dependent control and multi-rate communication.

The problem Given: a network of Kahn processes –Kahn process: sequential function + ports –communication: port-based, point-to-point, uni-directional, multi-rate Find: a single sequential task –functionally equivalent to the original network (modulo concurrency) –threads driven by input stimuli (no OS intervention) Example: TEMP FILTER, HUMIDITY FILTER, CONTROLLER with TSENSOR, HSENSOR, TDATA, HDATA, AC-on, DRYER-on, ALARM-on

Init() last = 0; Tsensor() sample = READ(TSENSOR); if (|sample - last| > DIF) { last = sample; if (sample > TFIRE) WRITE(ALARM-on,10); else if (sample > TMAX) WRITE(AC-on,sample-TMAX); } Hsensor() h = READ(HSENSOR); if (h > MAX) WRITE(DRYER-on,5); Event-driven threads Reset

The scheduling procedure 1. Specify a network of processes –process: C + communication operations –netlist: connection between ports 2. Translate to the computational model: Petri nets 3. Find a “schedule” on the Petri net 4. Translate the schedule to a task

TEMP-FILTER float sample, last; last = 0; while (1) { sample = READ(TSENSOR); if (|sample - last| > DIF) { last = sample; WRITE(TDATA, sample); } } Corresponding Petri net of TEMP FILTER: input transition TSENSOR, transition sample = READ(TSENSOR), a True/False choice on the if-condition, transition last = sample; WRITE(TDATA,sample) producing a token in place TDATA, and initial transition last = 0.

HUMIDITY-FILTER float h, max; while (1) { h = READ(HSENSOR); if (h > MAX) WRITE(HDATA, h); } Corresponding Petri net of HUMIDITY FILTER: input transition HSENSOR, transition h = READ(HSENSOR), a True/False choice on h > MAX?, and transition WRITE(HDATA,h) producing a token in place HDATA.

CONTROLLER while(1) { select(TDATA,HDATA) { case TDATA: tdata = READ(TDATA); if (tdata > TFIRE) WRITE(ALARM-on, 10); else if (tdata > TMAX) WRITE(AC-on, tdata-TMAX); case HDATA: hdata = READ(HDATA); if (hdata > HMAX) WRITE(DRYER-on, 5); } } Corresponding Petri net of CONTROLLER: places TDATA and HDATA feed transitions tdata = READ(TDATA) and hdata = READ(HDATA); choices tdata > TFIRE?, tdata > TMAX? and hdata > HMAX? lead to WRITE(ALARM-on,10), WRITE(AC-on,tdata-TMAX) and WRITE(DRYER-on,5).

Complete Petri net of the system: the nets of TEMP-FILTER, HUMIDITY-FILTER and CONTROLLER composed through the places TDATA and HDATA, with input transitions TSENSOR and HSENSOR.

Petri nets for Kahn process networks: sequential processes (1 token per process), input/output ports (communication with the environment), channels (point-to-point communication between processes).

Petri nets for Kahn process networks: data-dependent choices (True/False outcomes). Conservative assumption: any outcome is possible.

Schedule Infinite state space Schedule properties: Finite (no infinite resources) Inputs served infinitely often All choice outcomes covered

Schedule Finding the optimal schedule is computationally expensive Heuristics are required token count minimization guidance by T-invariants (cycles)

Code generation (schedule of a system with inputs I1 and I2, annotated with Initialization, Await state and Choice nodes). Generated code: ISRs driven by input stimuli (I1 and I2). Each task contains threads from one await state to another await state.

Code generation. Generated code: ISRs driven by input stimuli (I1 and I2). Each task contains threads from one await state to another await state. The schedule is partitioned into code blocks C0 … C11 and await states S1, S2, S3.

Code generation C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 S1 S2 S3 C11 I1 I2 I1 I2 I1 I2 T F enum state {S1, S2, S3} S;

Code generation enum state {S1, S2, S3} S; Init () { C0(); S = S1; return; } C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 S1 S2 S3 C11 I1 I2 I1 I2 I1 I2 T F

C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 S1 S2 S3 C11 I1 I2 I1 I2 I1 I2 T F Code generation C1 C2 C3 C5 C6 C7 S1 S2 S3 C11 I1 enum state {S1, S2, S3} S; ISR1 () { switch(S) { case S1: C1(); C2(); S=S2; return; case S2: C3(); C2(); return; case S3: C6(); C7(); C11(); C5(); return; }

C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 S1 S2 S3 C11 I1 I2 I1 I2 T F Code generation enum state {S1, S2, S3} S; ISR2 () { switch(S) { case S1: C4(); C5(); S=S3; break; case S2: C10(); C11(); C5(); S=S3; return; case S3: if (C8()) { C7(); C11(); C5(); return; } else { C9(); S = S1; return; } } } C4 C5 C7 C8 C9 C10 S1 S2 S3 C11 I2

Code generation enum state {S1, S2, S3} S; Init () { C0(); S = S1; return; } ISR1 () { switch(S) { case S1: C1(); C2(); S=S2; return; case S2: C3(); C2(); return; case S3: C6(); C7(); C11(); C5(); return; } ISR2 () { switch(S) { case S1: C4(); C5(); S=S3; break; case S2: C10(); C11(); C5(); S=S3; return; case S3: if (C8()) { C7(); C11(); C5(); return; } else { C9(); S = S1; return; } } } C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 S1 S2 S3 C11 I1 I2 I1 I2 I1 I2 T F

Code generation enum state {S1, S2, S3} S; Init () { C0(); S = S1; return; } ISR1 () { switch(S) { case S1: C1(); C2(); S=S2; return; case S2: C3(); C2(); return; case S3: C6(); C7(); C11(); C5(); return; } ISR2 () { switch(S) { case S1: C4(); C5(); S=S3; break; case S2: C10(); C11(); C5(); S=S3; return; case S3: if (C8()) { C7(); C11(); C5(); return; } else { C9(); S = S1; return; } } } Init () ISR1 () ISR2 () Reset I1 I2 S

Environmental controller: sensors for Temperature and Humidity; actuators AC, Dehumidifier and Alarm. The ENVIRONMENTAL CONTROLLER is a network of three processes, TEMP FILTER, HUMIDITY FILTER and CONTROLLER, with inputs TSENSOR and HSENSOR, internal channels TDATA and HDATA, and outputs AC-on, DRYER-on and ALARM-on.

Complete Petri net of the environmental controller (repeated from above).

The same Petri net with labelled transitions (A, B, Ct, Cf, D, Et, Ef, F, G, Ht, Hf, I, Jt, Jf; inputs TSENSOR and HSENSOR) and places p0 … p10 plus the channel places TDATA and HDATA.

Schedule of the Petri net: a graph over markings (p0 p8 p9), (p1 p8 p9), (p1 p3 p8 p9), (p2 p8 p9), (p1 p8 p9 TDATA), (p1 p4 p8), (p1 p5 p8), (p1 p6 p8 p9), (p2 p7 p9), (p1 p8 p9 HDATA), (p1 p8 p10), connected by firings of the transitions A, B, Ct, Cf, D, Ef, Et, F, G, Ht, Hf, I, Jf, Jt, TSENSOR and HSENSOR; one marking is the await state.

The same schedule with each transition associated with its originating process: TEMP-FILTER, HUMIDITY-FILTER or CONTROLLER.

Code generation and optimization (channel elimination). Tsensor() { sample = READ(TSENSOR); if (|sample - last| > DIF) { last = sample; WRITE (TDATA,sample); tdata = READ (TDATA); if (tdata > TFIRE) WRITE(ALARM-on,10); else if (tdata > TMAX) WRITE(AC-on,tdata-TMAX); } }

Code generation and optimization (copy propagation: tdata = sample). Tsensor() { READ(TSENSOR,sample,1); if (|sample - last| > DIF) { last = sample; WRITE (TDATA,sample,1); READ (TDATA,tdata,1); if (tdata > TFIRE) WRITE(ALARM-on,10); else if (tdata > TMAX) WRITE(AC-on,tdata-TMAX); } }

Code generation and optimization. Tsensor() { READ(TSENSOR,sample,1); if (|sample - last| > DIF) { last = sample; WRITE (TDATA,sample); tdata = READ (TDATA); if (sample > TFIRE) WRITE(ALARM-on,10); else if (sample > TMAX) WRITE(AC-on,sample-TMAX); } }

Init() last = 0; Tsensor() sample = READ(TSENSOR); if (|sample - last| > DIF) { last = sample; if (sample > TFIRE) WRITE(ALARM-on,10); else if (sample > TMAX) WRITE(AC-on,sample-TMAX); } Hsensor() h = READ(HSENSOR); if (h > MAX) WRITE(DRYER-on,5); Event-driven threads Reset

Application example: ATM Switch Input cells: accept? Output cells: emit? No static schedule due to: –Inputs with independent rates (need Real-Time dynamic scheduling) –Data-dependent control (can use Quasi-Static Scheduling)

Functional Decomposition 4 Tasks (+ 1 arbiter) Accept/discard cell Clock divider Output time selector Output cell enabler

Minimal (QSS) Decomposition 2 Tasks Input cell processing Output cell processing

Real-time scheduling of tasks + RTOS Shared Processor Task 1 Task 2

ATM: experimental results. Software implementation, QSS vs. functional partitioning: number of tasks 2 vs. 5; lines of C code and clock cycles also compared (bar charts).

Producer-Filter-Consumer example: producer, filter, consumer and controller processes (with an init phase), connected through channels Req, Ack, Coeff and Pixels.

Experimental results: number of clock cycles vs. size of channels, for the 4-task implementation and the 1-task implementation.

Open problems Is a system schedulable ? (decidability) False paths in concurrent systems (data dependencies) Synthesis for multi-processors Abstraction / partitioning and many others...

Schedulability. A finite complete cycle is a finite sequence of transition firings that returns the net to its initial state ⇒ infinite execution in bounded memory. To find a finite complete cycle we must solve the balance (or characteristic) equation of the Petri net, f · D = 0. Example: for the first net (transitions t1, t2, t3), f = (4, 2, 1) is a solution. For the second net, f · D = 0 has no solution ⇒ no schedule.
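As a worked check of the first example, assume the net is a chain t1 → p1 → t2 → p2 → t3 where t1 and t2 each produce one token and the arcs into t2 and t3 have weight 2 (an assumed reconstruction consistent with f = (4, 2, 1), not taken from the slide). Its incidence matrix and the balance check are then:

D = | 1 0 ; -2 1 ; 0 -2 | (one row per transition, one column per place)

f · D = (4, 2, 1) · D = (4·1 + 2·(−2) + 1·0, 4·0 + 2·1 + 1·(−2)) = (0, 0)

so f = (4, 2, 1) satisfies the balance equation; if the firings can be ordered from the initial marking, a finite complete cycle with 4 firings of t1, 2 of t2 and 1 of t3 exists.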

Schedulability (Petri net with transitions t1 … t8; highlighted firing sequence: t1 t2 t3 t5 t6). Can the “adversary” ever force token overflow?

Schedulability (highlighted firing sequence: t1 t2 t3 t5 t7). Can the “adversary” ever force token overflow?

Schedulability (highlighted firing sequence: t1 t2 t4 t8). Can the “adversary” ever force token overflow?

Schedulability (Petri net with transitions t1 … t7). Can the “adversary” ever force token overflow?

Schedulability Schedulability of Free-choice PNs is decidable –Algorithm is exponential What if the resulting PN is non-free choice? (synchronization-dependent control) What if the PN is not schedulable for all choice resolutions? (correlation between choices)

(Quasi) Static Scheduling approaches Lee et al. ‘86: Static Data Flow: cannot specify data-dependent control Buck et al. ‘94: Boolean Data Flow: undecidable schedulability check, heuristic pattern-based algorithm Thoen et al. ‘99: Event graph: no schedulability check, no task minimization Lin ‘97: Safe Petri Net: no schedulability check, single-rate, reachability-based algorithm Thiele et al. ‘99: Bounded Petri Net: partial schedulability check, reachability-based algorithm Cortadella et al. ‘00: General Petri Net: maybe undecidable schedulability check, balance equation-based algorithm

False paths. Example: a producer process (i = 0; while i < 3: WRITE(D, data[i]); i = i + 1) and a consumer process (j = 0; s = 0; while j < 3: t = READ(D); s = s + t; j = j + 1), modelled as a Petri net with transitions ta … tf, places p1 … p7 and True/False choices on i < 3 and j < 3. The choices are correlated: #WRITES = #READS ⇒ i = j.

Multi-processor allocation enum state {S1, S2, S3} S; Init () { C0(); S = S1; return; } ISR1 () { switch(S) { case S1: C1(); C2(); S=S2; return; case S2: C3(); C2(); return; case S3: C6(); C7(); C11(); C5(); return; } ISR2 () { switch(S) { case S1: C4(); C5(); S=S3; break; case S2: C10(); C11(); C5(); S=S3; return; case S3: if (C8()) { C7(); C11(); C5(); return; } else { C9(); S = S1; return; } } } Init () ISR1 () ISR2 () Reset I1 I2 S Processor 1 Processor 2 State and data are shared: Mutual exclusion required
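If ISR1 and ISR2 run on different processors, the shared state S (and the data touched by the code blocks) must be protected. A minimal sketch of one possible realization, wrapping the generated ISRs in a lock; the pthread mutex and wrapper names are only illustrative (on bare metal this would typically be a spinlock or interrupt masking) and are not part of the generated code shown above:

#include <pthread.h>

/* Generated routines and shared state from the code shown above. */
extern void ISR1(void);
extern void ISR2(void);

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

void ISR1_on_processor1(void) {   /* entry used by processor 1 */
    pthread_mutex_lock(&state_lock);
    ISR1();
    pthread_mutex_unlock(&state_lock);
}

void ISR2_on_processor2(void) {   /* entry used by processor 2 */
    pthread_mutex_lock(&state_lock);
    ISR2();
    pthread_mutex_unlock(&state_lock);
}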

Conclusions Reactive systems –OS required to control concurrency –Processes are often reused in different environments Static and Quasi-Static Scheduling minimize run-time overhead by automatically partitioning the system functions into input-driven threads –No context switch required (OS overhead is reduced) –Compiler optimizations across processes Much more research is needed: –strategies to find schedules (decidability?) –false paths in concurrent systems –what about multiple processors? –...