1
Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication
Maarten Wiggers, PhD student, University of Twente, NL
Co-author and supervisor: Marco Bekooij, NXP Semiconductors Research
Gerard Smit, University of Twente
2
Maarten Wiggers -- University of Twente
Outline
Context
–Streaming applications
–Programming multiprocessor architectures
Problem
–Problem statement
–Related work
Variable Rate Dataflow
–Chain topology
–Arbitrary graph topology
Experiment
Conclusion
[Wiggers – DATE 2008, Wiggers – RTAS 2008]
4
Multi-stream car-entertainment system
5
Application model
[Figure: a use-case containing an FRT video job and an FRT audio job, each composed of tasks; input data streams enter the jobs, with an output stream to the display and an output stream to the speakers]
Jobs process streams of data
Jobs are composed of tasks
Simultaneously running jobs together form use-cases
Jobs often have real-time requirements
–Firm (FRT) if deadline misses are highly undesirable (steep quality degradation)
6
Task graphs
Jobs are implemented as task graphs
–Tasks communicate fixed-sized containers over fixed-sized FIFO buffers
Container is a place-holder for data
–Task has random access in container
Task only starts an execution on sufficient
–Full containers in input buffers
–Empty containers in output buffers (back-pressure)
Back-pressure robustly prevents buffer overflow
Required quanta of containers can be
–Known at design-time
–Dependent on the actual processed stream
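The enabling condition with back-pressure can be sketched as follows. This is a minimal illustration only: the class `Fifo`, the function `can_start` and the representation of quanta as plain integers are assumptions made for the example, not part of the presented tooling.

```python
class Fifo:
    """Fixed-capacity FIFO buffer, counted in containers (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.full = 0  # number of full containers currently in the buffer

    def empty(self):
        # Empty containers are the remaining space in the buffer.
        return self.capacity - self.full


def can_start(inputs, outputs):
    """Enabling condition of a task.

    inputs  : list of (fifo, quantum) pairs, quantum = full containers needed
    outputs : list of (fifo, quantum) pairs, quantum = empty containers needed
    The task may start only if every input holds enough full containers and
    every output holds enough empty containers (back-pressure).
    """
    enough_data = all(f.full >= q for f, q in inputs)
    enough_space = all(f.empty() >= q for f, q in outputs)
    return enough_data and enough_space
```

Because a producer is held back until space is available, a buffer can never hold more containers than its capacity.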
7
Example job – MP3 playback
[Figure label: n=[0,960]]
MP3 decoding task consumes a variable number of bytes per frame
–Every execution a different number of bytes consumed
–BR task executes aperiodically
–No static-order schedule for BR and MP3, so run-time arbitration is needed
Throughput constraint: sink needs to execute strictly periodically
–All tasks are pushing data towards the sink
–For sufficiently large buffers, sink can execute strictly periodically
8
Example job – H.263 video decoder
[Figure labels: m=[0,6536], n=[0,2376]]
Variable length decoder (VLD) consumes a variable number of bytes per frame
VLD produces a variable number of blocks per frame
DQ and IDCT process blocks
Motion compensator assembles a frame from blocks
Throughput constraint: sink needs to execute strictly periodically
9
Application trend
Behaviour of applications is increasingly input-data dependent, e.g.
–Entropy encoding
–Adaptation to channel conditions by digital radios
Reflected in
–Input-data dependent execution times
–Conditional execution of code
–Mode changes
–Input-data dependent execution rates
Input-data dependent execution rates require run-time arbitration
10
Trend challenge
Required properties
–Functionally deterministic behaviour: output values completely determined by input values
–Deadlock free
–Throughput constraint satisfied
Research challenge is to define models
–For which the required properties are decidable
–That can model applications with input-data dependent behaviour
–That include effects of run-time arbitration
–E.g. Variable-Rate Dataflow
11
Multi-processor architecture template
Multi-processor system required for performance and power reasons
[Figure labels: DSP, mem, Arb, NI, I/O, External SDRAM, CA, ctrl, PP, Network-on-Chip, NI, $]
[Hansson – TODAES 2008]
12
Compute settings
[Figure: dataflow synthesis. Labels: (cyclic) task graph, WCET, multiprocessor instance, throughput and latency constraint, scheduler settings and buffer capacities]
13
Compute settings
Guarantees on end-to-end throughput require guarantees on deadlock-freedom
Models that provide end-to-end throughput guarantees are not Turing complete
–Poses restrictions on
  Applications: e.g. inter-task synchronisation behaviour
  Architectures: e.g. applicable run-time arbitration schemes
Goal: define a model that can guarantee throughput for H.263
14
Example
Every execution, task B can choose to consume either 2 or 3
Required buffer capacity for deadlock freedom?
15
Example (cont.)
Attempt: assume maximum consumption quantum in every execution
Requires buffer capacity of 3 for deadlock freedom
16
Example (cont.)
However, when consuming the minimum quantum
Buffer capacity of 3 is insufficient!
19
Example (cont.)
Deadlock!
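The example can be replayed with a small simulation. The sketch below assumes that the producing task A writes 3 containers per execution; the transcript only states the choices of task B, so this value and all names (`deadlocks`, `produce`, `consume_choices`) are illustrative assumptions.

```python
def deadlocks(capacity, produce, consume_choices):
    """Replay a producer A and a consumer B that share one FIFO buffer.

    capacity        : buffer capacity in containers
    produce         : containers written by A per execution (assumed fixed)
    consume_choices : the quantum B needs in each of its executions
    Returns True if a state is reached in which neither task can start.
    """
    full = 0
    pending = list(consume_choices)
    while pending:
        need = pending[0]
        if full >= need:                   # B has enough full containers
            full -= need
            pending.pop(0)
        elif capacity - full >= produce:   # A has enough empty containers
            full += produce
        else:                              # neither task is enabled
            return True
    return False


# Maximum quantum in every execution: capacity 3 is enough.
print(deadlocks(3, 3, [3, 3, 3]))   # False
# Minimum quantum: one container is left over, A lacks space, B lacks data.
print(deadlocks(3, 3, [2, 2]))      # True
```

After B consumes 2 of the 3 containers, one full container and two empty containers remain, so A cannot write 3 and B cannot read 2 or 3.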
22
Problem
Compute buffer capacities
–Guarantee satisfaction of throughput constraint
–Tasks can require data-dependent quantum of data and space per execution
Assumptions
–Run-time arbitration on shared resources
–Upper and lower bounds on transferred quanta
–Upper bound on execution time
–Throughput constraint: sink or source that executes strictly periodically
23
Related work
Quasi static-order scheduling
–Transfer quanta change only after (sub)graph iterations
–For every iteration a static-order schedule is computed
  Bounded memory is decidable
–Models are amenable to code synthesis
–Examples
  Heterochronous Dataflow [Girault – TCAD 1999]
  Parameterised Dataflow [Bhattacharya – TSP 2001]
–Requirement on changes only after graph iterations is a global requirement
  Iteration is a graph property
  VLD parses stream and decides next quantum locally
–Static-order scheduling excludes overlapped schedules of graphs with different transfer quanta
26
Requirements on quanta change
Quasi static-order scheduling: 2*A and 3*B before change
27
Requirements on quanta change
Variable-Rate Dataflow: can change every firing
28
Related work
Variable token sizes instead of variable number of transferred tokens
–[Sen – ASSP 2005]
–Experiment will show that this results in larger buffers
–Variable consumption quantum by VLD depends on processed stream
  BR task is unaware of the semantics of the stream, so it cannot know the quantum
30
Related work
Run-time arbitration
–Not required to compute schedules at design-time
–Only need to show that for all transfer quanta a schedule exists
–State-of-the-art
  Real-time calculus (group of Thiele at ETH Zurich)
  SymTA/S (group of Ernst at TU Braunschweig)
–These approaches have
  Difficulties with cyclic dependencies that influence the temporal behaviour
  No means to reason about bounded memory or deadlock properties
–E.g. no concept similar to consistency
33
Phase 1 and 2
Next slides discuss buffer capacity computation in case of chain topology
Subsequent slides discuss extension to graphs
34
Variable Rate Dataflow (by example)
Implementation = Task graph
Model = Dataflow graph
37
[Figure labels: period, time-slice, execution time, response time]
Explained in detail in [Wiggers – RTAS 2007]
Generalisation that includes all starvation-free schedulers in [Wiggers – SCOPES 2007]
38
Variable Rate Dataflow
Task graph (input specification)
–Tasks
–Buffers
Tasks
–Have a bounded response time
–Consume and produce data between start and finish
Buffers have a finite and fixed capacity
Dataflow graph (analysis vehicle)
–Actors
–Queues
Actors
–Have a fixed response time
–Consume tokens atomically at the start
–Produce tokens atomically at the finish
Queues have infinite depth
39
Approach
Model task graph on architecture by Variable-Rate Dataflow graph
Let actor v_τ model the throughput-constraining task
Compute sufficient number of tokens to enable actor v_τ to execute strictly periodically
Computed number of tokens equals required buffer capacity
–One-to-one correspondence
  Containers in task graph – tokens in dataflow graph
  Enabling condition task – firing rule actor
  Containers consumed and produced – tokens consumed and produced
–Execution times of actors are upper bound on execution times of tasks
–Self-timed execution of Variable-Rate Dataflow is temporally monotonic
40
Monotonic temporal behaviour
VRDF actors have sequential firing rules [Lee – 1995]
–The number of tokens that is required to be present on inputs is completely determined by already consumed tokens
VRDF actors are functional
–The produced tokens are a function of the consumed tokens
Given self-timed execution, if a token arrives earlier on an input, then
–This can only lead to an earlier satisfaction of the firing rule, and
–This can only lead to an earlier production of the same tokens
E.g. a smaller response time of a VRDF actor cannot lead to any later token arrival time
Because of scheduling anomalies this is not true for the task graph!
–A smaller response time can lead to later container arrival times
Token arrival times conservatively bound container arrival times
41
Approach – computation of sufficient tokens
Find valuation of token transfer parameters that leads to maximum required token transfer rates
On each edge, take maximum required rate as the slope of
–A linear upper bound on token production times, and
–A linear lower bound on token consumption times
Derive offset of linear bounds such that for all sequences of transfer quanta there exists a schedule for which bounds are conservative
–Offset is relative to start of first firing of actor
Use linear bounds to compute sufficient number of initial tokens
This number of tokens is also sufficient for smaller transfer rates
43
Approach – step 1
Determine on each edge the maximum required transfer and firing rates
Sink has to fire strictly periodically
Maximum required transfer rate on edge for
–Maximum consumption quantum
Maximum required firing rate of A for
–Minimum production quantum
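As a rough illustration of step 1, consider a single edge from an actor A to the sink. The sketch below assumes that the sink fires once per `sink_period` time units and that the production quantum of A has a strictly positive minimum; the function and parameter names are hypothetical.

```python
from fractions import Fraction


def max_required_rates(sink_period, consume_max, produce_min):
    """Maximum required rates on one edge A -> sink (illustrative only).

    sink_period : time between two firings of the strictly periodic sink
    consume_max : largest number of tokens the sink consumes per firing
    produce_min : smallest number of tokens A produces per firing
    Returns (tokens per time unit on the edge, firings of A per time unit).
    """
    if produce_min <= 0:
        raise ValueError("sketch assumes a strictly positive minimum production quantum")
    # The edge is busiest when the sink always consumes its maximum quantum.
    transfer_rate = Fraction(consume_max, sink_period)
    # A must fire most often when it always produces its minimum quantum.
    firing_rate = transfer_rate / produce_min
    return transfer_rate, firing_rate


# Sink fires every 10 time units, consumes at most 3, A produces at least 2.
print(max_required_rates(10, 3, 2))   # (Fraction(3, 10), Fraction(3, 20))
```

Exact fractions are used so that rates such as 3/20 firings per time unit are not rounded.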
46
Approach – step 2
Given linear bounds on production and consumption times
Find difference between bounds that allows existence of schedule for all sequences of quanta
Actor starts at t=0
Consumes tokens at start
Produces tokens at finish
Finish – start = response time
A larger quantum gives a larger difference between the bounds
A larger quantum gives a larger delay of the next start time
If the largest quantum stays between the bounds, then every sequence stays between the bounds
50
Approach – step 3
Buffer capacity is maximum difference between tokens consumed and produced
Difference between linear bounds is buffer capacity
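One way to read "difference between linear bounds" is sketched below. It is a simplification under stated assumptions, not the formula from the cited papers: both bounds are taken to have the same slope `rate` (tokens per time unit), token x is produced no later than `production_offset + x / rate` and consumed no earlier than `consumption_offset + x / rate`. With d initial tokens, the x-th consumed token is the (x - d)-th produced token, so d must satisfy d >= rate * (production_offset - consumption_offset).

```python
import math
from fractions import Fraction


def sufficient_tokens(rate, production_offset, consumption_offset):
    """Smallest integer d with d >= rate * (production_offset - consumption_offset).

    All three arguments are assumptions of this sketch: a common slope for
    both linear bounds and the time offsets of the two bounds.
    """
    gap = production_offset - consumption_offset
    return max(0, math.ceil(rate * gap))


# Rate 3/10 tokens per time unit, bounds 10 time units apart: 3 tokens.
print(sufficient_tokens(Fraction(3, 10), 12, 2))   # 3
# Bounds 11 time units apart: 3.3 rounds up to 4 tokens.
print(sufficient_tokens(Fraction(3, 10), 13, 2))   # 4
```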
52
Approach – step 4
Buffer capacities are sufficient for smaller rates
A smaller rate of A corresponds to a delay in the schedule of A
VRDF graphs have linear temporal behaviour
–A delay Δ in production time cannot lead to a production that is delayed by more than Δ
56
Chains of buffers
Find the maximum firing rates for all actors
Compute buffer capacities for these rates
If MP3 consumes less, then starts of BR are postponed
By linearity data will still arrive on time at MP3
Computed buffer capacities verified in our dataflow simulator
58
Relaxing constraints on topology
Graph definition
–Consistency of task graph
–Consistency is not sufficient for bounded memory
Computation of buffer capacities is now a global problem
60
Parameter communication
Communication of parameter values
Enables modelling of conditional execution of tasks
Sequential firing rules
61
if-then-else
Buffer capacities computed for all combinations of sequences of t and f
t=!f (mutual exclusivity) is just a subset
Model abstracts from actual relations between parameters
62
Consistency
Transfer quanta on edges determine relative firing rates [Lee – TC 1987] [Lee – TPDS 1991]
Multiple paths between two actors
–Requires check whether there exist firing rates with bounded memory
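For fixed transfer quanta, the check amounts to solving the balance equations: for every edge, firings of the producer times its production quantum must equal firings of the consumer times its consumption quantum. The sketch below is a minimal version for a connected graph with fixed integer quanta; the function name and the edge format are assumptions made for this example, and it does not cover the parameterised quanta of Variable-Rate Dataflow.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9


def repetition_vector(actors, edges):
    """Solve the balance equations of a connected fixed-rate dataflow graph.

    actors : list of actor names
    edges  : list of (src, dst, produce, consume) with positive integer quanta
    Returns integer firing counts per actor, or None if the graph is
    inconsistent (two paths demand different relative firing rates).
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
            elif src in rate and dst in rate and rate[src] * p != rate[dst] * c:
                return None  # balance equation violated on this edge
    # Scale the rational rates to integers.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


# One path: A produces 2, B consumes 3, so A fires 3 times per 2 firings of B.
print(repetition_vector(["A", "B"], [("A", "B", 2, 3)]))   # {'A': 3, 'B': 2}
# Two paths that disagree on the relative rates: inconsistent.
print(repetition_vector(["A", "B"], [("A", "B", 2, 3), ("A", "B", 1, 1)]))   # None
```

An inconsistent graph has no firing rates that keep all buffers bounded, which is why multiple paths between two actors need this check.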
64
Consistency
Fixed transfer quanta cannot model data-dependent behaviour
Allowing for different transfer quantum in every firing
Specification of intervals is insufficient
67
Consistency
Specification of intervals is insufficient
Therefore introduce transfer parameters
Variable-Rate Dataflow graph is (strongly) consistent if there exists a non-trivial symbolic solution to the symbolic balance equations
68
Consistency is insufficient
Boolean dataflow graph
Bounded memory depends on control values
Bounded memory can be undecidable [Buck – 1993]
72
Chosen restriction
Every parameter value should correspond with an iteration of this sub-graph
In the VRDF graph we require that repetition rate of actors in this sub-graph is one
77
Chosen restriction
This restriction implies that (strong) consistency is sufficient for bounded memory
78
Buffer capacities
Requirement
–Sink determines throughput for all transfer quanta
Tasks are pushing data to sink
–Different quanta imply different task execution rates
–Tasks always need to be able to follow
Buffer capacity
–Should enable tasks to follow maximum required rate
–Variation in quanta requires larger buffers
79
Buffer capacity (I)
80
Buffer capacity (II)
81
Buffer capacity (III)
Minimise difference in start times
82
Buffer capacity (IV)
Required buffer capacity
83
General topology
[Figure labels: actors A, C, B; β=1; start times s=0, s=2, s=1; further labels 2, 2]
Minimum difference between start times of actors
–Not a property of an edge
–Determined by all paths
86
Buffer capacity with β=1
Required buffer capacity
87
Buffer capacity with β=2
Required buffer capacity
88
General topology
Minimum difference between start times of actors
–Not a property of an edge
–Determined by all paths
Network flow problem
–Constraints: minimum differences per edge
–Objective: start times as close as possible together
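The dependence on all paths can be illustrated with difference constraints of the form start[v] - start[u] >= d. The sketch below computes the earliest start times that satisfy every constraint by longest-path relaxation. This yields one feasible assignment only; it is not the network flow formulation from the slide, whose objective would need an LP or min-cost-flow solver. The graph and the numbers in the usage example are made up.

```python
def earliest_start_times(actors, min_diff):
    """Earliest start times under constraints start[v] - start[u] >= d.

    actors   : list of actor names
    min_diff : dict {(u, v): d} with the minimum difference per edge
    Raises ValueError if the constraints contain a positive cycle.
    """
    start = {a: 0 for a in actors}
    # Bellman-Ford style relaxation for longest paths.
    for _ in range(len(actors)):
        changed = False
        for (u, v), d in min_diff.items():
            if start[u] + d > start[v]:
                start[v] = start[u] + d
                changed = True
        if not changed:
            return start
    raise ValueError("constraints are infeasible (positive cycle)")


# Hypothetical graph: the direct edge A -> C needs a difference of 3,
# although the path A -> B -> C alone would allow C to start at 2.
times = earliest_start_times(
    ["A", "B", "C"],
    {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 3},
)
print(times)   # {'A': 0, 'B': 1, 'C': 3}
```

In this made-up case the difference between the start times of B and C is 2, larger than the minimum of 1 on the edge B -> C, because the other path between A and C dictates it.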
90
H.263 decoder
m is number of bytes read per picture
n is number of blocks per picture
Motion compensation needs to know how many blocks to read to assemble a picture
91
Alternative implementation
92
Buffer capacity
Our implementation
–Buffer capacity is in blocks
Alternative implementation
–Buffer capacity is in frames
93
Conclusion
Trend: streaming applications are increasingly dynamic
–Include tasks that have data-dependent execution rates
–Implies run-time arbitration
Variable Rate Dataflow
–Production and consumption quanta can change in every execution
–Can include effects of run-time arbitration
–Efficient checks on execution in bounded memory
Compute buffer capacities that guarantee satisfaction of a throughput constraint
–Temporal monotonicity: token arrival times are conservative bounds on container arrival times
–Temporal linearity: a token arrival that is Δ later cannot result in any token arrival time that is delayed by more than Δ
94
Questions?
m.h.wiggers@utwente.nl
95
References
[Bhattacharya – TSP 2001] B. Bhattacharya and S.S. Bhattacharyya. Parameterized Dataflow Modeling for DSP Systems. IEEE Transactions on Signal Processing. October 2001
[Buck – 1993] J. Buck. Scheduling Dynamic Dataflow Graphs with Bounded Memory using the Token Flow Model. PhD thesis, University of California, Berkeley. 1993
[Girault – TCAD 1999] A. Girault, B. Lee and E.A. Lee. Hierarchical Finite State Machines with Multiple Concurrency Models. IEEE Transactions on CAD. June 1999
[Hansson – TODAES 2008] A. Hansson, K.G.W. Goossens, M.J.G. Bekooij and J. Huisken. CoMPSoC: A Composable and Predictable Multi-Processor System on Chip Template. ACM Transactions on Design Automation of Electronic Systems. To appear
[Lee – TC 1987] E.A. Lee and D. Messerschmitt. Static Scheduling of Synchronous Dataflow Programs for Digital Signal Processing. IEEE Transactions on Computers. January 1987
[Lee – TPDS 1991] E.A. Lee. Consistency in Dataflow Graphs. IEEE Transactions on Parallel and Distributed Systems. 1991
[Lee – 1995] E.A. Lee and T. Parks. Dataflow Process Networks. Proc. of the IEEE. May 1995
[Sen – ASSP 2005] M. Sen, S.S. Bhattacharyya, T. Lv, and W. Wolf. Modeling Image Processing Systems with Homogeneous Parameterized Dataflow Graphs. In Proc. ASSP. March 2005
96
References
[Wiggers – RTAS 2007] M.H. Wiggers, M.J.G. Bekooij, P.G. Jansen and G.J.M. Smit. Efficient Computation of Buffer Capacities for Cyclo-Static Real-Time Systems with Back-Pressure. In Proc. RTAS. April 2007
[Wiggers – SCOPES 2007] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Modelling Run-Time Arbitration by Latency-Rate Servers in Dataflow Graphs. In Proc. SCOPES. April 2007
[Wiggers – DATE 2008] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Computation of Buffer Capacities for Throughput Constrained and Data-Dependent Inter-Task Communication. In Proc. DATE. April 2008
[Wiggers – RTAS 2008] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication. In Proc. RTAS. April 2008