1 Lic Presentation Memory Aware Task Assignment and Scheduling for Multiprocessor Embedded Systems Radoslaw Szymanek / Embedded System Design
2 Outline: Introduction; Problem Formulation and Motivational Example; CLP Introduction; CLP Modeling; Optimization Heuristic and Experimental Results; Conclusions
3 System Level Synthesis (SLS): Multiprocessor embedded systems are built from CPUs, ASICs, buses, and interconnection links. Application areas range from signal and image processing to multimedia and telecommunication. The application is represented as a task graph. The main design activities are task assignment and scheduling for a given architecture, subject to memory constraints (code and data memory).
4 SLS with memory constraints [Figure: an annotated task graph and a target architecture with processors P1, P2, and P3 (each with ROM and RAM) and the components A1 (with RAM), B1, L1, and L2]
5 Problem Assumptions and Formulation: A data-dominated application is represented as a directed bipartite acyclic task graph. Each task is annotated with its execution time and its code and data memory requirements. The architecture is heterogeneous. Both tasks and communications are atomic and must be performed in one step. Goals: find a good CLP model, and find a good heuristic for memory-constrained task assignment and scheduling that minimizes time while satisfying all constraints.
6 Motivation: SoC multiprocessor architectures. The co-design methodology needs tool support. Memory is taken into consideration to decrease cost and power consumption. System-level design allows fast evaluation.
7 Motivating example (memory) [Figure: a task graph, an architecture with processors P1 and P2 and link L1, and alternative schedules with their data memory usage over time] Task: 1 kB code memory, 4 kB data memory. Communication: 2 kB data memory.
8 CLP Introduction: “Constraint programming represents one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it.” Eugene C. Freuder, CONSTRAINTS, April 1997
9 CLP Introduction: A relatively young and attractive approach for modeling many types of optimization problems. Many heterogeneous applications of constraint programming exist today. State the decision variables that constitute the solution. State the constraints that the solution must satisfy. Search for solutions using the knowledge that can be derived from the constraints.
10 Constraint properties: Constraints may specify partial information (a constraint need not uniquely specify the values of its variables); are non-directional (typically one can infer a constraint on each variable present); are declarative (they specify a relationship, not a procedure to enforce it); are additive (the order of imposing constraints does not matter); and are rarely independent (typically they share variables).
11 A simple constraint problem 1. Specify all decision variables and their initial domains. Natural language description: There are three tasks, namely T1, T2, and T3. Each of these tasks can execute on either of the two available processors, P1 and P2. Tasks T1 and T2 send data to task T3. The tasks should be assigned and scheduled in such a way that the schedule length does not exceed 10 seconds. CLP description: TP1, TP2, TP3 :: 1..2, TS1, TS2, TS3 :: 0..10, Cost :: 0..10,
12 A simple constraint problem 2. Specify all constraints and additional variables. Natural language description: The execution time of task T1 is four seconds on processor P1 and two seconds on processor P2. Task T2 requires three and five seconds to complete execution on processors P1 and P2, respectively. Task T3 always needs three seconds for execution. CLP description: If TP1 = 1 then TD1 = 4, If TP1 = 2 then TD1 = 2, If TP2 = 1 then TD2 = 3, If TP2 = 2 then TD2 = 5, TD3 = 3,
13 A simple constraint problem. Natural language description: Tasks T1 and T2 must execute on different processors. Tasks T1 and T2 send data to task T3. If two communicating tasks are executed on different processors, there must be a delay of at least one second between them so that the data can be transferred. The tasks should be assigned and scheduled in such a way that the schedule length does not exceed 10 seconds. CLP description: TP1 != TP2, If TP1 != TP3 then D1 = 1 else D1 = 0, TS1 + TD1 + D1 <= TS3, […], Cost >= TS1 + TD1, Cost >= TS2 + TD2, Cost >= TS3 + TD3.
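The three-task problem is small enough to check by exhaustive enumeration. The following Python sketch is not the CHIP model from the slides and performs no constraint propagation; it simply tries every value in the stated domains. It assumes that the elided constraint for T2 mirrors the one shown for T1 (a delay D2 with TS2 + TD2 + D2 <= TS3), as the natural language description suggests, and that Cost is minimized. The names `DURATION`, `HORIZON`, and `solve` are illustrative only.

```python
from itertools import product

# Execution times from the slides: DURATION[task][processor]
DURATION = {1: {1: 4, 2: 2}, 2: {1: 3, 2: 5}, 3: {1: 3, 2: 3}}
HORIZON = 10  # upper bound of the TS and Cost domains (0..10)


def solve():
    """Return (cost, (TP1, TP2, TP3), (TS1, TS2, TS3)) with minimal cost, or None."""
    best = None
    for tp1, tp2, tp3 in product((1, 2), repeat=3):
        if tp1 == tp2:  # TP1 != TP2
            continue
        td1, td2, td3 = DURATION[1][tp1], DURATION[2][tp2], DURATION[3][tp3]
        d1 = 1 if tp1 != tp3 else 0
        d2 = 1 if tp2 != tp3 else 0  # assumption: analogue of D1 for T2 -> T3
        for ts1, ts2, ts3 in product(range(HORIZON + 1), repeat=3):
            if ts1 + td1 + d1 > ts3:  # T1 (plus transfer delay) precedes T3
                continue
            if ts2 + td2 + d2 > ts3:  # assumed counterpart for T2
                continue
            # Smallest Cost satisfying Cost >= TSi + TDi for every task
            cost = max(ts1 + td1, ts2 + td2, ts3 + td3)
            if cost > HORIZON:
                continue
            candidate = (cost, (tp1, tp2, tp3), (ts1, ts2, ts3))
            if best is None or candidate < best:
                best = candidate
    return best


if __name__ == "__main__":
    print(solve())
```

Under these assumptions the enumeration visits about ten thousand combinations. A finite-domain solver would instead prune most of them by propagating the constraints before and during search.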
14 Search Tree
15 Modeling: Constraint Logic Programming (finite domain, CHIP solver). Global constraints (cumulative, diffn, sequence, etc.) reduce the model complexity of the synthesis problem and exploit specific features of the problem. Global constraints are useful for modeling placement problems and graph problems. A problem-specific search heuristic is used for the NP-hard problem.
16 CLP Model. Decision variables for a task: TS, the start time of the task execution; TP, the resource on which the task is executed; TDP, the exact placement of the task's local data in memory. Additional variables for a task: TD, the task duration; TCM and TDM, the amounts of code and data memory needed for task execution.
17 CLP Model. Decision variables for data: DS, the start time of the data communication; DB, the resource on which the data is communicated; DCP and DPP, the exact placement of the data in the memory of the consumer and the producer processor, respectively. Additional variables for data: DD, the data communication duration.
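18 CLP Model – Task Requirements [Figure: each task drawn as a rectangle in three diagrams: a) execution time, in the time × processing unit (PU) plane, starting at TS on resource TP with length TD and height 1; b) code memory, in the PU × code memory (CM) plane, with width 1 at TP and height TCM; c) data memory, in the time × data memory (DM) plane, starting at TS at address TDP with length TD and height TDM]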
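19 CLP Model – Data Requirements [Figure: each data item drawn as a rectangle in three diagrams: communication time, in the time × communication unit (CU) plane, starting at DS on resource DB with length DD and height 1; data memory of the producer, in the time × DM plane, at address DPP with height DA, spanning from TS_p to DS+DD; data memory of the consumer, in the time × DM plane, at address DCP with height DA, spanning from DS to TS_c + TD_c]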
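20 Simple Example [Figure: tasks T1, T2, T3 and data D1, D2, D3 placed as rectangles on the resources P1, P2, B1, and C1, with rectangles labeled D1_e, D1_p, D1_c, D2_e, D2_p, D2_c, and D3_e, illustrating the diffn constraint]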
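The diffn global constraint requires that a set of rectangles does not overlap. For data memory this means that two memory blocks may share addresses only if their lifetimes do not intersect. The following Python sketch only checks that property for fixed rectangles; a solver such as CHIP additionally propagates it over variable domains. The `Rect` type and the function names are illustrative assumptions, not part of the original model.

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    start: int   # origin on the time axis (e.g. TS)
    addr: int    # origin on the memory axis (e.g. TDP)
    length: int  # extent in time (e.g. TD)
    size: int    # extent in memory (e.g. TDM)


def disjoint(a: Rect, b: Rect) -> bool:
    """True if the rectangles are separated in at least one dimension."""
    return (
        a.start + a.length <= b.start
        or b.start + b.length <= a.start
        or a.addr + a.size <= b.addr
        or b.addr + b.size <= a.addr
    )


def diffn_holds(rects) -> bool:
    """Check the diffn condition: no two rectangles overlap."""
    return all(disjoint(a, b) for a, b in combinations(rects, 2))
```

For example, two 4-unit blocks placed at the same address satisfy the condition when the second starts exactly where the first ends in time, and violate it when their lifetimes and address ranges both intersect.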
21 Code Memory Constraint [Figure: tasks T1 to T8 stacked against the processor code memory limit]
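Unlike data memory, code memory in this model has no time dimension, so the constraint reduces to a capacity check per processor. The sketch below assumes that the code of every task assigned to a processor stays resident for the whole schedule, so the TCM values of those tasks are simply summed. The function name, the dictionary layout, and the example values are illustrative assumptions.

```python
from collections import defaultdict


def code_memory_ok(assignment, code_mem, limit):
    """Check that the summed code memory on each processor stays within its limit.

    assignment: task -> processor (the TP values)
    code_mem:   task -> code memory requirement (the TCM values)
    limit:      processor -> code memory capacity
    """
    used = defaultdict(int)
    for task, proc in assignment.items():
        used[proc] += code_mem[task]
    return all(used[proc] <= limit[proc] for proc in used)
```

With three tasks of 1 kB each, two on P1 and one on P2, the check passes for limits of 2 kB and 1 kB and fails if the limit of P1 drops to 1 kB.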
22 Constraint types: precedence constraints; processing resource constraints; communication resource constraints; pipelining constraints; code memory constraints; data memory constraints
23 Task Assignment and Scheduling Heuristic [Flow chart] Steps: choose a task from the ready task set with min(max(T_i)) to minimize the schedule length; assign the task to the processor with the minimal implementation cost c_i; schedule the communications so that T_i is minimal; assign data memory. Decision points (Y/N): "data memory estimate no. 1 holds?" and "data memory estimate no. 2 holds?". On failure: undo all decisions and choose the task which consumes the most data.
24 Execution Cost: Ind = LowTS/PTS – LowCM/PCM; ATS = available time slots, ACM = available code memory; indices: i-th task, n-th processor
25 Data and Communication Cost i-th task, n-th processor
26 Estimates. Estimate no. 1 [equation], where S (S_n) is the set of tasks already scheduled on a processor (on processor P_n), tasks t_j are direct successors of task t_i, and d_ij is the amount of data communicated between t_i and t_j. Estimate no. 2 uses the global constraint diffn and takes time into account.
27 MATAS System
28 Synthesis Results - H.261 example [Figure: task graph of the H.261 video coding algorithm with the blocks IN, BMA, FIR, PRAE, DCT, Q, IQ, IDCT, REK, FB1, FB2, and C]
29 Experimental results H.261 example
30 Experimental results (random task graphs)
31 Main Contributions: Definition of the extended task assignment and scheduling problem. Inclusion of memory constraints to decrease the cost for data-dominated applications. A specialized search heuristic to solve resource-constrained task assignment and scheduling. A CLP modeling framework to facilitate an efficient, clean, and readable problem definition.
32 Conclusions and Future Work: The synthesis problem is modeled as a constraint satisfaction problem and solved by the proposed heuristic. Good coupling between the model and the search method gives efficient search space pruning. Memory constraints and pipelined designs are taken into account. Heterogeneous constraints can be modeled in CLP, an important advantage over other approaches. Future work: our own constraint engine implementation, approximate solutions, and a mixture of techniques; better lower bounds, problem-specific global constraints, and designer interaction during search.
34 Related Work: J. Madsen, P. Bjorn-Jorgensen, “Embedded System Synthesis under Memory Constraints”, CODES ’99 (GA, only RAM). S. Prakash and A. Parker, “Synthesis of Application-Specific Heterogeneous Multiprocessor Systems”, VLSI Signal Processing, ’94 (MILP, no ASICs, optimal).
35 A simple constraint problem (full description): There are three tasks, namely T1, T2, and T3. Each of these tasks can execute on either of the two available processors, P1 and P2. Tasks T1 and T2 send data to task T3. Tasks T1 and T2 must execute on different processors for fault-tolerance reasons. The execution time of task T1 is four seconds on processor P1 and two seconds on processor P2. Task T2 requires three and five seconds to complete execution on processors P1 and P2, respectively. Task T3 always needs three seconds for execution. If two communicating tasks are executed on different processors, there must be a one-second delay between them so that the data can be transferred. The tasks should be assigned and scheduled in such a way that the schedule length does not exceed 10 seconds.