Concurrent Models of Computation


Concurrent Models of Computation. Edward A. Lee, Robert S. Pepper Distinguished Professor, UC Berkeley. EECS 219D: Concurrent Models of Computation, Fall 2011. Copyright © 2009-2011, Edward A. Lee. All rights reserved. System-Level Design Languages: Orthogonalizing the Issues. Presenter: Edward A. Lee, UC Berkeley. Abstract: System-level design languages (SLDLs) need to be equally comfortable with hardware, software, and problem-level models. The requirements are too broad to be adequately addressed by any single solution that we would recognize as a “language.” But the prospect of needing (or continuing to need) to use multiple incompatible languages to achieve a single system design is grim. In this talk, I outline an approach to achieving interoperability without a single monolithic standard language. Instead, orthogonal properties of a language are identified and standardized separately. Specifically, an SLDL is divided into an abstract syntax, a system of syntactic transformations, a concrete syntax, an actor semantics, a semantics for the interaction of actors, and a type system. Each of these is a roughly independent dimension of a language, and two languages, tools, or frameworks may share identical designs in one or more dimensions. Interoperability can be achieved where practical, but does not become a straitjacket where impractical. Week 11: Time-Triggered Models

The Synchronous Abstraction Has a Drawback. “Model time” is discrete: countable ticks of a clock. With respect to model time, computation does not take time: all actors execute “simultaneously” and “instantaneously.” As a consequence, long-running tasks determine the maximum rate of the fastest clock, irrespective of how frequently those tasks must run.

Simple Example: Spectrum Analysis. The model has a time-critical path and a non-time-critical path. How do we keep the non-time-critical path from interfering with the time-critical path?

Dataflow Models. Concurrent components (actors) communicate through buffered channels: actor A sends data to actor B through a FIFO buffer. Static scheduling: assign to each thread a sequence of actor invocations (firings) and repeat forever. Dynamic scheduling: each time dispatch() is called, determine which actors can fire (or are firing) and choose one. May need to implement interlocks in the buffers.
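As a rough sketch (not from the slides), the static-scheduling idea can be rendered with two hypothetical actors sharing a FIFO buffer; the names fire_A and fire_B and the truncated “repeat forever” loop are illustrative assumptions:

```python
from collections import deque

buf = deque()                 # FIFO buffer connecting actor A to actor B
results = []

def fire_A():                 # hypothetical actor: produces one token per firing
    fire_A.count += 1
    buf.append(fire_A.count)

fire_A.count = 0

def fire_B():                 # hypothetical actor: consumes one token per firing
    results.append(buf.popleft() * 2)

# Static schedule: a fixed sequence of firings, repeated forever
# (truncated here to 3 iterations for illustration).
schedule = [fire_A, fire_B]
for _ in range(3):
    for fire in schedule:
        fire()
```

A dynamic scheduler would instead, on each dispatch, inspect the buffer to decide which actors have enough tokens to fire.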

Buffers for Dataflow. Unbounded buffers require memory allocation and deallocation schemes. Bounded-size buffers can be realized as circular buffers or ring buffers in a statically allocated array. A read pointer r is an index into the array referring to the next data item to read; increment it after each read. A fill count n is an unsigned number telling us how many data items are in the buffer. The next location to write to is (r + n) modulo the buffer length. The buffer is empty if n == 0 and full if n == buffer length. We can implement n as a semaphore, providing mutual exclusion for code that changes n or r.
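A minimal sketch of the bounded buffer described above, assuming a single-reader/single-writer Python rendering with the same r and n bookkeeping (the semaphore protection mentioned above is omitted):

```python
class RingBuffer:
    """Bounded FIFO realized as a circular buffer in a fixed-size array."""

    def __init__(self, length):
        self.array = [None] * length
        self.r = 0   # read pointer: index of the next data item to read
        self.n = 0   # fill count: number of data items in the buffer

    def is_empty(self):
        return self.n == 0

    def is_full(self):
        return self.n == len(self.array)

    def write(self, item):
        assert not self.is_full(), "buffer overflow"
        # The next location to write is (r + n) mod buffer length.
        self.array[(self.r + self.n) % len(self.array)] = item
        self.n += 1

    def read(self):
        assert not self.is_empty(), "buffer underflow"
        item = self.array[self.r]
        self.r = (self.r + 1) % len(self.array)  # increment after each read
        self.n -= 1
        return item
```

In a concurrent setting, n would be maintained by a semaphore so that a reader blocks on an empty buffer and a writer blocks on a full one.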

Abstracted Version of the Spectrum Example: Non-Preemptive Scheduling. Assume infinitely repeated invocations, triggered by availability of data at A. Suppose that C requires 8 data values from A to execute, and suppose further that C takes much longer to execute than A or B. Then a non-preemptive schedule interleaves eight firings of A (and B) with each long firing of C.

Uniformly Timed Schedule. A preferable schedule would space invocations of A and B uniformly in time, minimizing latency.

Non-Concurrent Uniformly Timed Schedule. Notice that in this schedule, the rate at which A and B can be invoked is limited by the execution time of C.

Concurrent Uniformly Timed Schedule: Preemptive Schedule. With preemption, A and B run in a high-priority thread (thread 1) that preempts a low-priority thread (thread 2) running C, so the rate at which A and B can be invoked is limited only by the total computation.

Ignoring Initial Transients, Abstract to Periodic Tasks. Let C have sampleTime 8 and let A and B each have sampleTime 1. In steady state, the execution follows a simple periodic pattern across the two threads. This follows the principles of rate-monotonic scheduling (RMS).
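The RMS principle can be sketched as follows; the helper names are hypothetical, and the utilization bound is the classical Liu-Layland result for n independent periodic tasks:

```python
def rms_priorities(periods):
    # Rate-monotonic: shorter period (higher rate) gets higher priority.
    # Returns task indices ordered from highest to lowest priority.
    return sorted(range(len(periods)), key=lambda i: periods[i])

def rms_utilization_bound(n):
    # Liu & Layland (1973): n periodic tasks are schedulable under RMS
    # if total CPU utilization does not exceed n * (2**(1/n) - 1).
    return n * (2 ** (1.0 / n) - 1)

# Example matching the slides: A and B with period 1, C with period 8
# (in units of the base sample time).
periods = [1, 1, 8]
priority_order = rms_priorities(periods)  # A and B above C
```

Under this assignment, A and B preempt C, which is exactly the two-thread structure of the preemptive schedule above.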

Requirement 1 for Determinacy: Periodicity. With a fixed-length circular buffer, if the execution of C runs longer than expected, data determinacy requires that thread 1 be delayed accordingly. This can be accomplished with semaphore synchronization (an interlock). But there are alternatives: throw an exception to indicate a timing failure, or use “anytime” computation, accepting the incomplete results of C.

Requirement 1 for Determinacy: Periodicity. If the execution of C runs shorter than expected, data determinacy requires that thread 2 be delayed accordingly; that is, it must not start the next execution of C before the data is available.

Semaphore Synchronization Is Required Exactly Twice Per Major Period. Note that semaphore synchronization is not required if actor B runs long, because its thread has higher priority; everything else is automatically delayed.
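One way to sketch the two interlocks per major period is with two semaphores, assuming a hypothetical Python rendering in which thread 1 fires A eight times per major period and thread 2 fires C once; the names full and free are illustrative:

```python
import threading

full = threading.Semaphore(0)   # interlock 1: 8 samples are ready for C
free = threading.Semaphore(1)   # interlock 2: C is done, buffer may be refilled
buffer, log = [], []

def thread1(num_periods):
    for p in range(num_periods):
        free.acquire()              # wait until C has released the buffer
        for i in range(8):          # 8 firings of A per major period
            buffer.append((p, i))
        full.release()              # signal: data ready for C

def thread2(num_periods):
    for p in range(num_periods):
        full.acquire()              # wait for the 8 samples
        log.append(list(buffer))    # C consumes them
        buffer.clear()
        free.release()              # allow the next refill

t1 = threading.Thread(target=thread1, args=(3,))
t2 = threading.Thread(target=thread2, args=(3,))
t1.start(); t2.start(); t1.join(); t2.join()
```

Exactly two semaphore operations per thread occur in each major period, matching the slide's claim; a real implementation would use the circular buffer rather than clearing a list.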

Simulink and Real-Time Workshop (The MathWorks). Typical usage pattern: model the continuous dynamics of the physical plant, model the discrete-time controller, and generate code for the discrete-time controller using RTW. Discrete signals are semantically piecewise constant, and discrete blocks have periodic execution with a specified “sample time.”

Explicit Buffering Is Required in Simulink. In Simulink, unlike dataflow, there is no buffering of data. To get the effect of presenting 8 successive samples at once to C (sampleTime 8) from a source with sampleTime 1, we have to explicitly include a buffering actor that outputs an array.

Requirement 2 for Determinacy: Data Integrity During Execution. It is essential that input data remain stable during one complete execution of C, something achieved in Simulink with a zero-order hold (ZOH) block.

Simulink Strategy for Preserving Determinacy. In “Multitasking Mode,” Simulink requires a zero-order hold (ZOH) block at any downsampling point. The ZOH runs at the slow rate, but at the priority of the fast rate, and it holds the input to C constant for an entire execution.

In Dataflow, Interlocks and Built-in Buffering Take Care of These Dependencies. No ZOH block is required! For dataflow, a one-time (first-time) interlock in the low-priority thread ensures sufficient data at the input of C, and the periodic interlocks then preserve determinacy.

Aside: Ptolemy Classic Code Generator Used Such Interlocks (since about 1990) SDF model, parallel schedule, and synthesized DSP assembly code It is an interesting (and rich) research problem to minimize interlocks in complex multirate applications.

Aside: Ptolemy Classic Development Platform (1990) An SDF model, a “Thor” model of a 2-DSP architecture, a “logic analyzer” trace of the execution of the architecture, and two DSP code debugger windows, one for each processor.

Aside: Application to ADPCM Speech Coding (1993) Note updated DSP debugger interface with host/DSP interaction.

Aside: Heterogeneous Architecture with DSP and Sun Sparc Workstation (1995) DSP card in a Sun Sparc Workstation runs a portion of a Ptolemy model; the other portion runs on the Sun.

Consider a Low-Rate Actor Sending Data to a High-Rate Actor. Suppose A has sampleTime 4 and C has sampleTime 1. Data precedences make it impossible to achieve uniform timing for A and C with a periodic non-concurrent (sequential) schedule.

Overlapped Iterations Can Solve This Problem. Overlapping iterations across two threads takes advantage of the intrinsic buffering provided by dataflow models. For dataflow, this requires the initial interlock as before, and the same periodic interlocks.

Simulink Strategy. Without buffering, a Delay block provides just one initial sample to C (there is no buffering in Simulink). The Delay and ZOH blocks run at the rate of the slow actor, but at the priority of the fast ones. Part of the objective seems to be to have no initial transient. Why?

Time-Triggered Models and Logical Execution Time (LET). In time-triggered models (e.g., Giotto, TDL, Simulink/RTW), each actor has a logical execution time (LET): its actual execution always appears to have taken exactly the LET. For example, a higher-frequency task invoked at times t, t+5 ms, t+10 ms, and so on appears to take exactly 5 ms per invocation, regardless of its actual execution time.

The LET (Logical Execution Time) Programming Model. A software task reads its sensor input at time t and writes its actuator output at time t+d, for a specified d. Examples: Giotto, TDL. (Slide from Tom Henzinger)

The LET (Logical Execution Time) Programming Model. The real execution on the CPU happens sometime between time t and time t+d; the output is buffered and released only at time t+d. (Slide from Tom Henzinger)
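A hedged sketch of this semantics in simulation form: whatever the actual computation time, the output becomes visible exactly at t + LET. The function name and interface are illustrative assumptions, not part of any LET language:

```python
def run_let_task(invocations, let, compute):
    """Simulate LET semantics for one task.

    invocations: list of (release_time, input_value) pairs; the input is
    read at the release time t, and the output is buffered so that it
    becomes visible exactly at t + let, regardless of when the actual
    computation finishes within [t, t + let].
    """
    visible = []
    for t, x in invocations:
        y = compute(x)                 # actual execution: anywhere in [t, t+let]
        visible.append((t + let, y))   # output released only at t + let
    return visible

# A task with a 5 ms LET, invoked every 5 ms.
out = run_let_task([(0, 10), (5, 20), (10, 30)],
                   let=5, compute=lambda x: x + 1)
```

This is what makes LET portable and composable: the visible timing depends only on t and d, not on CPU speed or on which other tasks are running.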

Portability. Because timing is defined logically, the observable behavior is unchanged even with a 50% CPU speedup. (Slide from Tom Henzinger)

Composability. Tasks (e.g., Task 1 and Task 2) can be composed without changing the observable timing of either. (Slide from Tom Henzinger)

Determinism. Timing predictability: minimal jitter. Function predictability: no race conditions. (Slide from Tom Henzinger)

Contrast LET with Standard Practice. Standard practice makes output available as soon as it is ready. (Slide from Tom Henzinger)

Contrast LET with Standard Practice. Making output available as soon as it is ready can create a data race. (Slide from Tom Henzinger)

Giotto Strategy for Preserving Determinacy. With A at frequency 8 and C at frequency 1, the first execution of C operates on the initial data in the delay; the second execution operates on the result of the 8th execution of A.

Giotto: A Delay on Every Arc. Since Giotto has a delay on every connection, there is no need to show it; it is implicit. Is a delay on every arc a good idea?

Giotto Strategy for the Pipeline Example. Giotto uses delays on all connections. The effect is the same as before, except that there is one additional sample delay from input to output.

Discussion Questions

What about more complicated rate conversions (e.g., a task with sampleTime 2 feeding one with sampleTime 3)?
What are the advantages and disadvantages of the Giotto delays?
Could concurrent execution be similarly achieved with synchronous languages?
How does concurrent execution of dataflow compare to Giotto and Simulink?
Which of these approaches is more attractive from the application designer’s perspective?
How can these ideas be extended to non-periodic execution? (Modal models, Timed Multitasking, xGiotto, Ptides.)