Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication Maarten Wiggers PhD student,

Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication
Maarten Wiggers, PhD student, University of Twente, NL
Co-author and supervisor: Marco Bekooij, NXP Semiconductors Research
Gerard Smit, University of Twente

Maarten Wiggers -- University of Twente
Outline
Context
  – Streaming applications
  – Programming multiprocessor architectures
Problem
  – Problem statement
  – Related work
Variable Rate Dataflow
  – Chain topology
  – Arbitrary graph topology
Experiment
Conclusion
[Wiggers – DATE 2008, Wiggers – RTAS 2008]

Multi-stream car-entertainment system

Application model
[Figure: use-cases containing an FRT video job and an FRT audio job; each job is built from tasks, reads an input data stream, and writes an output stream to the display or to the speakers]
Jobs process streams of data
Jobs are composed of tasks
Simultaneously running jobs together form use-cases
Jobs often have real-time requirements
  – Firm (FRT) if deadline misses are highly undesirable (steep quality degradation)

Task graphs
Jobs are implemented as task graphs
  – Tasks communicate fixed-sized containers over fixed-sized FIFO buffers
Container is a place-holder for data
  – Task has random access in container
Task only starts an execution on sufficient
  – Full containers in input buffers
  – Empty containers in output buffers (back-pressure)
Back-pressure robustly prevents buffer overflow
Required quanta of containers can be
  – Known at design-time
  – Dependent on the actual processed stream
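
The enabling condition on this slide can be sketched in a few lines of Python. This is only an illustration: the names `Fifo` and `can_start` are hypothetical and do not come from the presentation, and the quanta are passed in explicitly because they may differ per execution.

```python
from dataclasses import dataclass


@dataclass
class Fifo:
    """Fixed-capacity buffer, counted in containers (hypothetical helper)."""
    capacity: int   # total number of containers in the buffer
    full: int = 0   # containers that currently hold data

    @property
    def empty(self) -> int:
        # Containers that are free to be written by the producer.
        return self.capacity - self.full


def can_start(inputs, outputs):
    """Return True if a task may start an execution.

    inputs  -- list of (Fifo, quantum): full containers required per input
    outputs -- list of (Fifo, quantum): empty containers required per output
    """
    enough_data = all(f.full >= q for f, q in inputs)
    enough_space = all(f.empty >= q for f, q in outputs)  # back-pressure
    return enough_data and enough_space
```

Because a task never starts without enough empty containers on its outputs, a write can never overflow a buffer; the price is that a capacity that is too small can stall the task instead.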

Example job – MP3 playback
[Figure label: n=[0,960]]
MP3 decoding task consumes a variable number of bytes per frame
  – Every execution a different number of bytes consumed
  – BR task executes aperiodically
  – No static-order schedule for BR and MP3 → run-time arbitration
Throughput constraint: sink needs to execute strictly periodically
  – All tasks are pushing data towards the sink
  – For sufficiently large buffers, sink can execute strictly periodically

Example job – H.263 video decoder
[Figure labels: m=[0,6536], n=[0,2376]]
Variable length decoder (VLD) consumes a variable number of bytes per frame
VLD produces a variable number of blocks per frame
DQ and IDCT process blocks
Motion compensator assembles a frame from blocks
Throughput constraint: sink needs to execute strictly periodically

Application trend
Behaviour of applications is increasingly input-data dependent, e.g.
  – Entropy encoding
  – Adaptation to channel conditions by digital radios
Reflected in
  – Input-data dependent execution times
  – Conditional execution of code
  – Mode changes
  – Input-data dependent execution rates
Input-data dependent execution rates require run-time arbitration

Trend → challenge
Required properties
  – Functionally deterministic behaviour: output values completely determined by input values
  – Deadlock free
  – Throughput constraint satisfied
Research challenge is to define models
  – For which required properties are decidable
  – That can model applications with input-data dependent behaviour
  – That include effects of run-time arbitration
  – E.g. Variable-Rate Dataflow

Multi-processor architecture template
Multi-processor system required for performance and power reasons
[Figure: architecture template; labels: DSP, mem, Arb, NI, I/O, External SDRAM, CA, ctrl, PP, $, Network-on-Chip] [Hansson – TODAES 2008]

Compute settings
[Figure: dataflow synthesis flow; labels: (cyclic) task graph, WCET, multiprocessor instance, throughput and latency constraint, scheduler settings and buffer capacities]

Compute settings
Guarantees on end-to-end throughput require guarantees on deadlock-freedom
Models that provide end-to-end throughput guarantees are not Turing complete
  – Poses restrictions on
      Applications: e.g. inter-task synchronisation behaviour
      Architectures: e.g. applicable run-time arbitration schemes
Goal: define a model that can guarantee throughput for H.263

Example
Every execution, task B can choose to consume either 2 or 3
Required buffer capacity for deadlock freedom?

Example (cont.)
Attempt: assume maximum consumption quantum in every execution
Requires buffer capacity of 3 for deadlock freedom

Example (cont.)
However, when consuming the minimum quantum
Buffer capacity of 3 is insufficient!
Deadlock!
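
The deadlock in this example can be reproduced with a small simulation. The producer's quantum is not preserved in the transcript (it was in the lost figure), so the sketch below assumes, purely for illustration, a producer that writes 3 containers per execution; the function name `run` and its parameters are hypothetical.

```python
def run(capacity, produce, consume_seq):
    """Execute a producer/consumer pair over one FIFO until done or stuck.

    capacity    -- buffer capacity in containers
    produce     -- containers written per producer execution (assumed fixed)
    consume_seq -- containers read in each successive consumer execution

    Returns the number of consumer executions completed. A result smaller
    than len(consume_seq) means the pair deadlocked.
    """
    full = 0   # containers currently holding data
    done = 0   # consumer executions completed so far
    while done < len(consume_seq):
        need = consume_seq[done]
        if full >= need:                   # consumer has enough data
            full -= need
            done += 1
        elif capacity - full >= produce:   # producer has enough space
            full += produce
        else:                              # neither task can start
            return done
    return done
```

Under the assumed producer quantum of 3, `run(3, 3, [3] * 6)` completes all six executions, while `run(3, 3, [2] * 6)` stops after one: a single container is left, the consumer needs two, and the producer needs three empty containers but only two are free.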

Problem
Compute buffer capacities
  – Guarantee satisfaction of throughput constraint
  – Tasks can require data-dependent quantum of data and space per execution
Assumptions
  – Run-time arbitration on shared resources
  – Upper and lower bounds on transferred quanta
  – Upper bound on execution time
  – Throughput constraint: sink or source that executes strictly periodically

Related work
Quasi static-order scheduling
  – Transfer quanta change only after (sub)graph iterations
  – For every iteration a static-order schedule is computed
      Bounded memory is decidable
  – Models are amenable to code synthesis
  – Examples
      Heterochronous Dataflow [Girault – TCAD 1999]
      Parameterised Dataflow [Bhattacharya – TSP 2001]
  – Requiring that quanta change only after graph iterations is a global requirement
      Iteration is a graph property
      VLD parses stream and decides next quantum locally
  – Static-order scheduling excludes overlapped schedules of graphs with different transfer quanta

Requirements on quanta change
Quasi static-order scheduling: 2*A and 3*B before change
Variable-Rate Dataflow: can change every firing

Related work
Variable token sizes instead of variable number of transferred tokens
  – [Sen – ASSP 2005]
  – Experiment will show that this results in larger buffers
  – Variable consumption quantum by VLD depends on processed stream
      BR task is unaware of the semantics of the stream → cannot know quantum

Related work
Run-time arbitration
  – Not required to compute schedules at design-time
  – Only need to show that for all transfer quanta a schedule exists
  – State-of-the-art
      Real-time calculus (group of Thiele at ETH Zurich)
      SymTA/S (group of Ernst at TU Braunschweig)
  – These approaches have
      Difficulties with cyclic dependencies that influence the temporal behaviour
      No means to reason about bounded memory or deadlock properties
  – E.g. no concept similar to consistency

Phase 1 and 2
Next slides discuss buffer capacity computation in case of chain topology
Subsequent slides discuss extension to graphs

Variable Rate Dataflow (by example)
Implementation = Task graph
Model = Dataflow graph

Variable Rate Dataflow
Task graph
  – Tasks
  – Buffers
Tasks
  – Have a bounded response time
  – Consume and produce data between start and finish
Buffers have a finite and fixed capacity
Dataflow graph
  – Actors
  – Queues
Actors
  – Have a fixed response time
  – Consume tokens atomically at the start
  – Produce tokens atomically at the finish
Queues have infinite depth

Execution time → response time
[Figure: time-slice within a period]
Explained in detail in [Wiggers – RTAS 2007]
Generalisation that includes all starvation-free schedulers in [Wiggers – SCOPES 2007]
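
The transcript does not give the formula that turns an execution time into a response time. The "period" and "time-slice" labels suggest a time-division-multiplex style budget scheduler, and under that assumption one conservative bound charges the task, for every slice it needs, the part of the period it does not own. The sketch below shows that bound only; the function and parameter names are hypothetical, and the cited papers remain the authoritative source.

```python
import math


def tdm_response_time_bound(wcet, time_slice, period):
    """Conservative response-time bound under an assumed TDM scheduler.

    wcet       -- upper bound on the task's execution time
    time_slice -- time the task may run in every period
    period     -- length of the TDM period
    """
    if not 0 < time_slice <= period:
        raise ValueError("need 0 < time_slice <= period")
    # Number of slices needed to complete the worst-case execution.
    slices_needed = math.ceil(wcet / time_slice)
    # Before each slice the task may wait for the rest of the period.
    return wcet + slices_needed * (period - time_slice)
```

For example, an execution time of 3 with a slice of 2 in a period of 5 needs two slices and gives a bound of 3 + 2 * 3 = 9. When the slice equals the period, the bound reduces to the execution time itself.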

Variable Rate Dataflow
Task graph: input specification
Dataflow graph: analysis vehicle

Approach
Model task graph on architecture by Variable-Rate Dataflow graph
Let actor v_τ model the throughput constraining task
Compute sufficient number of tokens to enable actor v_τ to execute strictly periodically
Computed number of tokens equals required buffer capacity
  – One-to-one correspondence
      Containers in task graph – tokens in dataflow graph
      Enabling condition task – firing rule actor
      Containers consumed and produced – tokens consumed and produced
  – Execution times of actors are upper bounds on execution times of tasks
  – Self-timed execution of Variable-Rate Dataflow is temporally monotonic

Monotonic temporal behaviour
VRDF actors have sequential firing rules [Lee – 1995]
  – The number of tokens that is required to be present on inputs is completely determined by already consumed tokens
VRDF actors are functional
  – The produced tokens are a function of the consumed tokens
Given self-timed execution, if a token arrives earlier on an input, then
  – This can only lead to an earlier satisfaction of the firing rule, and
  – This can only lead to an earlier production of the same tokens
E.g. a smaller response time of a VRDF actor cannot lead to any later token arrival time
Because of scheduling anomalies this is not true for the task graph!
  – A smaller response time can lead to later container arrival times
Token arrival times conservatively bound container arrival times

Approach – computation of sufficient tokens
Find valuation of token transfer parameters that leads to maximum required token transfer rates
On each edge, take maximum required rate as the slope of
  – A linear upper bound on token production times, and
  – A linear lower bound on token consumption times
Derive offset of linear bounds such that for all sequences of transfer quanta there exists a schedule for which bounds are conservative
  – Offset is relative to start of first firing of actor
Use linear bounds to compute sufficient number of initial tokens
This number of tokens is also sufficient for smaller transfer rates

Approach – step 1
Determine on each edge the maximum required transfer and firing rates
Sink has to fire strictly periodically
Maximum required transfer rate on edge for
  – Maximum consumption quantum
Maximum required firing rate of A for
  – Minimum production quantum
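
The arithmetic behind step 1 can be sketched for a single edge. This is an illustration under stated assumptions, not the presenter's algorithm: the function name and parameters are hypothetical, the consumer's maximum firing rate is taken as given, and the sketch simply rejects a minimum production quantum of zero because the rate would then be unbounded.

```python
from fractions import Fraction


def max_required_rates(consumer_firing_rate, max_consumption, min_production):
    """Worst-case rates on one edge from producer A to a consumer.

    consumer_firing_rate -- maximum firings of the consumer per time unit
    max_consumption      -- largest consumption quantum of the consumer
    min_production       -- smallest production quantum of producer A

    Returns (token transfer rate on the edge, firing rate required of A).
    """
    if min_production <= 0:
        raise ValueError("minimum production quantum must be positive")
    # The edge is busiest when the consumer always takes its largest quantum.
    transfer_rate = Fraction(consumer_firing_rate) * max_consumption
    # A must fire most often when it always delivers its smallest quantum.
    producer_firing_rate = transfer_rate / min_production
    return transfer_rate, producer_firing_rate
```

With a consumer that fires once every 4 time units and takes at most 3 tokens, the edge must carry 3/4 token per time unit; a producer that delivers at least 2 tokens per firing must then be able to fire at a rate of 3/8.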

Approach – step 2
Given linear bounds on production and consumption times
Find difference between bounds that allows existence of schedule for all sequences of quanta
Actor starts at t=0
Consumes tokens at start
Produces tokens at finish
Finish – start = response time
Larger quantum → larger difference between bounds
Larger quantum → larger delay of next start time
If the largest quantum fits between the bounds, then every sequence fits between the bounds

Approach – step 3
Buffer capacity is maximum difference between tokens consumed and produced
Difference between linear bounds is buffer capacity

Approach – step 4
Buffer capacities are sufficient for smaller rates
Smaller rate by A → delay in schedule of A
VRDF graphs have linear temporal behaviour
  – A delay Δ in production time cannot lead to a production that is delayed by more than Δ

Chains of buffers
Find the maximum firing rates for all actors
Compute buffer capacities for these rates
If MP3 consumes less, then starts of BR are postponed
By linearity data will still arrive on time at MP3
Computed buffer capacities verified in our dataflow simulator

Relaxing constraints on topology
Graph definition
  – Consistency of task graph
  – Consistency is not sufficient for bounded memory
Computation of buffer capacities is now a global problem

Parameter communication
Communication of parameter values
Enables modelling of conditional execution of tasks
Sequential firing rules

if-then-else
Buffer capacities computed for all combinations of sequences of t and f
t=!f (mutual exclusivity) is just a subset
Model abstracts from actual relations between parameters

Consistency
Transfer quanta on edges determine relative firing rates [Lee – TC 1987] [Lee – TPDS 1991]

Consistency
Transfer quanta on edges determine relative firing rates
Multiple paths between two actors
  – Requires check whether there exist firing rates with bounded memory
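
For fixed transfer quanta this check is the classical balance-equation test for synchronous dataflow. The sketch below shows that fixed-quanta case only, not the variable-rate extension discussed next; the function name and the edge encoding are hypothetical, the graph is assumed connected, and all quanta are assumed positive. It needs Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Solve rate[p] * prod == rate[c] * cons for every edge.

    actors -- list of actor names (connected graph assumed)
    edges  -- list of (producer, prod_quantum, consumer, cons_quantum)

    Returns {actor: firings per iteration} in smallest positive integers,
    or None if the balance equations only have the all-zero solution.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates along the edges
        changed = False
        for p, prod, c, cons in edges:
            if p in rate and c not in rate:
                rate[c] = rate[p] * prod / cons
                changed = True
            elif c in rate and p not in rate:
                rate[p] = rate[c] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # With multiple paths, every edge must agree with the propagated rates.
    if any(rate[p] * prod != rate[c] * cons for p, prod, c, cons in edges):
        return None
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}
```

A single edge on which A produces 2 and B consumes 3 yields three firings of A for every two of B. Two parallel edges from A to B with quanta (1, 1) and (2, 1) disagree on the relative rates, so the function returns `None`.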

Consistency
Fixed transfer quanta cannot model data-dependent behaviour
Allowing for different transfer quantum in every firing
Specification of intervals is insufficient

Consistency
Specification of intervals is insufficient
Therefore introduce transfer parameters
Variable-Rate Dataflow graph is (strongly) consistent if there exists a non-trivial symbolic solution to the symbolic balance equations
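
A small worked example may help to see what a symbolic solution looks like. The two-actor graph below is hypothetical and not taken from the slides: A produces p tokens per firing, where p is a transfer parameter, and B consumes one token per firing; q_A and q_B denote the numbers of firings per iteration.

```latex
% Hypothetical chain A -> B: production quantum p (a parameter), consumption quantum 1.
\begin{align*}
  q_A \cdot p &= q_B \cdot 1 \\
  \Rightarrow\quad (q_A, q_B) &= (1, p)
\end{align*}
% The solution (1, p) is non-trivial and is expressed in the parameter itself.
%
% Adding a second, parallel edge A -> B with fixed quanta 1 and 1 gives
\begin{align*}
  q_A \cdot p &= q_B, & q_A \cdot 1 &= q_B \cdot 1
\end{align*}
% Both equations hold together only if p = 1 or q_A = q_B = 0, so no
% non-trivial symbolic solution exists and this second graph is not consistent.
```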

Consistency is insufficient
Boolean dataflow graph
Bounded memory depends on control values
Bounded memory can be undecidable [Buck – 1993]

Chosen restriction
Every parameter value should correspond with an iteration of this sub-graph
In the VRDF graph we require that repetition rate of actors in this sub-graph is one

Chosen restriction
This restriction implies that (strong) consistency is sufficient for bounded memory

Buffer capacities
Requirement
  – Sink determines throughput for all transfer quanta
Tasks are pushing data to sink
  – Different quanta imply different task execution rates
  – Tasks always need to be able to follow
Buffer capacity
  – Should enable tasks to follow maximum required rate
  – Variation in quanta requires larger buffers

Buffer capacity (I) to (IV)
[Figure sequence; captions: "Minimise difference in start times" (III), "Required buffer capacity" (IV)]

General topology
[Figure: actors A, C and B; β=1; start times s=0, s=2, s=1]
Minimum difference between start times of actors
  – Not a property of an edge
  – Determined by all paths

Buffer capacity with β=1
[Figure: required buffer capacity]

Buffer capacity with β=2
[Figure: required buffer capacity]

General topology
Minimum difference between start times of actors
  – Not a property of an edge
  – Determined by all paths
Network flow problem
  – Constraints: minimum differences per edge
  – Objective: start times as close as possible together
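
To illustrate why the minimum difference is determined by all paths rather than by one edge, the sketch below computes start times that satisfy per-edge minimum differences with a longest-path relaxation (Bellman-Ford style). This is a simpler stand-in, not the network-flow formulation named on the slide: it yields the earliest feasible start times relative to a chosen source, which need not minimise the overall buffer objective. All names and the example graph are hypothetical.

```python
def earliest_start_times(actors, min_diff, source):
    """Earliest start times with start[v] - start[u] >= d per edge.

    actors   -- list of actor names
    min_diff -- {(u, v): d}, minimum difference between start times
    source   -- actor whose start time is fixed at 0
    """
    start = {a: None for a in actors}
    start[source] = 0
    for _ in range(len(actors)):
        changed = False
        for (u, v), d in min_diff.items():
            if start[u] is None:
                continue  # u not reached from the source yet
            if start[v] is None or start[v] < start[u] + d:
                start[v] = start[u] + d  # a longer path dominates
                changed = True
        if not changed:
            return start
    raise ValueError("positive cycle: constraints cannot be satisfied")
```

With constraints A to B of 1, B to C of 1 and A to C of 1, the direct edge alone would allow C to start at 1, but the path through B forces C to start at 2.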

H.263 decoder
m is number of bytes read per picture
n is number of blocks per picture
Motion compensation needs to know how many blocks to read to assemble a picture

Alternative implementation

Buffer capacity
Our implementation
  – Buffer capacity is in blocks
Alternative implementation
  – Buffer capacity is in frames

Conclusion
Trend: streaming applications are increasingly dynamic
  – Include tasks that have data-dependent execution rates
  – Implies run-time arbitration
Variable Rate Dataflow
  – Production and consumption quanta can change in every execution
  – Can include effects of run-time arbitration
  – Efficient checks on execution in bounded memory
Compute buffer capacities that guarantee satisfaction of a throughput constraint
  – Temporal monotonicity: token arrival times are conservative container arrival times
  – Temporal linearity: Δ later token arrival time cannot result in any token arrival time that is delayed by more than Δ

Questions?

References
[Bhattacharya – TSP 2001] B. Bhattacharya and S.S. Bhattacharyya. Parameterized Dataflow Modeling for DSP Systems. IEEE Transactions on Signal Processing. October 2001.
[Buck – 1993] J. Buck. Scheduling Dynamic Dataflow Graphs with Bounded Memory using the Token Flow Model. PhD thesis, University of California, Berkeley. 1993.
[Girault – TCAD 1999] A. Girault, B. Lee and E.A. Lee. Hierarchical Finite State Machines with Multiple Concurrency Models. IEEE Transactions on CAD. June 1999.
[Hansson – TODAES 2008] A. Hansson, K.G.W. Goossens, M.J.G. Bekooij and J. Huisken. CoMPSoC: A Composable and Predictable Multi-Processor System on Chip Template. ACM Transactions on Design Automation of Electronic Systems. To appear.
[Lee – TC 1987] E.A. Lee and D. Messerschmitt. Static Scheduling of Synchronous Dataflow Programs for Digital Signal Processing. IEEE Transactions on Computers. January 1987.
[Lee – TPDS 1991] E.A. Lee. Consistency in Dataflow Graphs. IEEE Transactions on Parallel and Distributed Systems. 1991.
[Lee – 1995] E.A. Lee and T. Parks. Dataflow Process Networks. Proc. of the IEEE. May 1995.
[Sen – ASSP 2005] M. Sen, S.S. Bhattacharyya, T. Lv and W. Wolf. Modeling Image Processing Systems with Homogeneous Parameterized Dataflow Graphs. In Proc. ASSP. March 2005.
[Wiggers – RTAS 2007] M.H. Wiggers, M.J.G. Bekooij, P.G. Jansen and G.J.M. Smit. Efficient Computation of Buffer Capacities for Cyclo-Static Real-Time Systems with Back-Pressure. In Proc. RTAS. April 2007.
[Wiggers – SCOPES 2007] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Modelling Run-Time Arbitration by Latency-Rate Servers in Dataflow Graphs. In Proc. SCOPES. April 2007.
[Wiggers – DATE 2008] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Computation of Buffer Capacities for Throughput Constrained and Data-Dependent Inter-Task Communication. In Proc. DATE. April 2008.
[Wiggers – RTAS 2008] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication. In Proc. RTAS. April 2008.