Multi-core Real-Time Scheduling for Generalized Parallel Task Models
Abusayeed Saifullah, Kunal Agrawal, Chenyang Lu, Christopher Gill

Real-Time Systems on Multi-core
- Multi-core processors provide an opportunity to schedule computation-intensive tasks in real time
- Most of these tasks exhibit intra-task parallelism
- Real-time systems need to be developed to exploit intra-task parallelism
- Traditional multiprocessor scheduling
  - Focuses on inter-task parallelism
  - Mostly restricted to sequential task models
- Computation-intensive complex real-time tasks are growing
  - Video surveillance
  - Radar tracking
  - Hybrid real-time structural testing

Parallel Task Model
- Lakshmanan et al. (RTSS ’10) have addressed a restricted synchronous model where
  - Each horizontal bar indicates a thread of execution (a sequence of instructions)
  - Parallel threads form a segment
  - Threads of each segment synchronize at the end of the segment
  - A task is an alternating sequence of parallel and sequential segments
  - The total number of threads in each segment ≤ number of cores
  - All parallel segments have an equal number of threads
[Figure: synchronous task model with Segments 1 to 5; the threads of Segment 1 synchronize at the end of the segment]

Our Contributions
- We address a general synchronous parallel task model
  - Different segments may have different numbers of threads
  - Each segment can have an arbitrary number of threads
- Example: such tasks are generated by
  - Parallel for loops in OpenMP, CilkPlus
  - Barrier primitives in thread libraries
- This model is more portable
  - The same program can execute on machines with different numbers of cores

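A Task Example
[Figure: the task drawn as parallel threads running from start to end]

    /* A task with two parallel segments, written with OpenMP parallel for loops. */
    void parallel_task(float *a, float *b, float *c, float *d)
    {
        int n = 7;
        /* Segment 1: 7 parallel iterations */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];

        n = 4;
        /* Segment 2: 4 parallel iterations */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            d[i] = a[i] - b[i];
    }
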
Our Contributions (contd.)
- We propose a task decomposition for the general synchronous parallel task model
  - Decomposes each parallel task into a set of sequential subtasks
  - Subtasks are scheduled like traditional tasks
- Why decomposition?
  - We can exploit the rich literature of multiprocessor scheduling
- The proposed decomposition ensures that if the decomposed tasks are schedulable, the original task set is also schedulable

Our Contributions (contd.)
- We analyze schedulability in terms of a processor speed augmentation bound
  - Speed augmentation bound ν for an algorithm A: if an optimal algorithm can schedule a synchronous parallel task set on unit-speed processor cores, then A can schedule the decomposed tasks on ν-speed processor cores
- We prove that the proposed decomposition requires a speed augmentation of at most
  - 4 for Global Earliest Deadline First (G-EDF) scheduling
  - 5 for Partitioned Deadline Monotonic (P-DM) scheduling

Overview of a Task Decomposition
- Each thread of the task becomes an individual task with
  - An intermediate subdeadline
  - A release offset to retain precedence relations in the original task
- Deadlines are assigned by distributing slack among segments
  - Deadline of a thread = execution requirement + assigned slack

Slack Distribution
- How much slack a segment demands depends on
  - Available slack of the task
  - Execution requirement of the segment
- Execution requirement of a segment is the product of
  - Total number of parallel threads in the segment and
  - Execution requirement of each thread in the segment
- Larger execution requirement implies more demand for slack
  - In the figure, Segment 1 requires more slack than Segment 2

Slack Distribution (contd.)
- We use the following principle to distribute slack
  - All segments that receive slack will achieve an equal density
- Reasons to equalize the density among segments
  - Fairness: the deadline of each segment becomes proportional to its execution requirement
  - We can bound the density of the decomposed tasks
  - We can exploit existing density-based analyses for multiprocessor scheduling

Slack Distribution (contd.)
- Slack of each segment is determined by solving the equalities
  - Sum of subdeadlines = task deadline (total assigned slack = task slack)
  - Density of Segment 1 = density of Segment 2 = and so on
- All threads in a segment have the same deadline and offset
  - Deadline = execution requirement of the thread + segment slack
  - Release offset = sum of the deadlines of the preceding segments

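The two equalities can be solved in closed form in the simplest case. The sketch below assumes that every segment receives slack. The symbols m_i, e_i, d_i, D and δ are introduced here for illustration and do not appear on the slides.

```latex
% Assumed notation (not from the slides):
%   m_i = number of threads in segment i
%   e_i = execution requirement of each thread in segment i
%   d_i = subdeadline of segment i,  D = task deadline,  \delta = common density
\begin{align*}
\frac{m_i e_i}{d_i} &= \delta \quad \text{for every segment } i,
  & \sum_i d_i &= D \\
\Rightarrow\quad \delta &= \frac{\sum_j m_j e_j}{D},
  & d_i &= D \cdot \frac{m_i e_i}{\sum_j m_j e_j}
\end{align*}
```

Under this assumption the slack of segment i is d_i − e_i, which must be non-negative for the assignment to be valid.
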
An Example of Task Decomposition
- Segment 1: deadline = 20, density = (5*4)/20 = 1
- Segment 2: deadline = 4, density = (2*2)/4 = 1
- Segment 3: deadline = 9, density = (3*3)/9 = 1
- Segment 4: deadline = 16, density = (4*4)/16 = 1
- Segment 5: deadline = 3, density = (1*3)/3 = 1
- All segments have an equal density!

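The numbers in this example can be reproduced with a short C sketch. It rests on three assumptions: every segment receives slack, each product on the slide is read as threads × per-thread execution requirement, and the task deadline is 52 (the sum of the listed subdeadlines). With equal densities, each segment then gets a share of the task deadline proportional to its execution requirement. The type and function names are illustrative, not from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* One segment of a synchronous parallel task (illustrative type). */
typedef struct {
    int threads;     /* number of parallel threads in the segment */
    double exec;     /* execution requirement of each thread */
} segment_t;

/* Timing parameters shared by all threads of one decomposed segment. */
typedef struct {
    double offset;   /* release offset: sum of preceding subdeadlines */
    double deadline; /* intermediate subdeadline of the segment */
    double density;  /* (threads * exec) / deadline */
} subtask_t;

/*
 * Equal-density decomposition, ASSUMING every segment receives slack:
 * each subdeadline is the task deadline scaled by the segment's share
 * of the total execution requirement.
 */
void decompose(const segment_t *seg, size_t n, double task_deadline,
               subtask_t *out)
{
    double total_work = 0.0;
    for (size_t i = 0; i < n; i++)
        total_work += seg[i].threads * seg[i].exec;
    assert(total_work > 0.0 && task_deadline > 0.0);

    double offset = 0.0;
    for (size_t i = 0; i < n; i++) {
        double work = seg[i].threads * seg[i].exec;
        out[i].deadline = task_deadline * work / total_work;
        out[i].offset = offset;              /* retains precedence order */
        out[i].density = work / out[i].deadline;
        offset += out[i].deadline;
    }
}
```

Calling `decompose` on the five segments (5×4, 2×2, 3×3, 4×4, 1×3) with a task deadline of 52 gives subdeadlines 20, 4, 9, 16 and 3, release offsets 0, 20, 24, 33 and 49, and density 1 for every segment.
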
Global EDF (G-EDF) Schedulability
- A sufficient condition for G-EDF scheduling on m unit-speed cores [Baruah RTSS ’07]
- A necessary condition for any task set for any scheduler
[Equations; annotations: total density, max density]
- Using the density bounds for decomposed tasks: if the original task set is schedulable by any algorithm on m unit-speed cores, the decomposed tasks are schedulable under G-EDF on 4-speed cores

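A density-based sufficient test is easy to sketch in C. The formula used below, total density ≤ m − (m − 1) · max density, is the commonly cited density test for G-EDF with constrained deadlines. That the slide means exactly this form is an assumption suggested by its "total density" and "max density" labels. The function name is hypothetical, and the input is the density of each sequential subtask (execution requirement divided by subdeadline).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * ASSUMED form of the density-based G-EDF test on m unit-speed cores:
 *     total density <= m - (m - 1) * max density
 * Returns true if the test passes. A false result is inconclusive,
 * because the condition is sufficient, not necessary.
 */
bool gedf_density_test(const double *density, size_t n, unsigned m)
{
    assert(m >= 1);

    double total = 0.0;
    double max = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(density[i] >= 0.0);
        total += density[i];
        if (density[i] > max)
            max = density[i];
    }
    return total <= (double)m - ((double)m - 1.0) * max;
}
```

For example, three subtasks of density 0.5 pass the test on 2 cores (1.5 ≤ 2 − 0.5), while four such subtasks do not (2.0 > 1.5).
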
Partitioned DM (P-DM) Schedulability
- A sufficient condition for FBB-FFD scheduling on m unit-speed cores
  - FBB-FFD (Fisher Baruah Baker – First-Fit Decreasing) is a well-known P-DM scheduler [ECRTS ’06]
- A necessary condition for any scheduler
[Equations; annotation: max cumulative execution requirement of tasks divided by time length]
- Using load and density bounds for decomposed tasks: if the original task set is schedulable by any algorithm on m unit-speed cores, the decomposed tasks are FBB-FFD schedulable on 5-speed cores

Conclusion
- Multi-core processors provide opportunities to schedule computation-intensive tasks in real time
- Real-time systems need to exploit intra-task parallelism
- We have addressed real-time scheduling for a generalized synchronous parallel task model
  - Different segments may have different numbers of threads
  - Each segment can have an arbitrary number of threads
- We have proposed a task decomposition that achieves
  - A processor-speed augmentation bound of 4 for Global EDF
  - A processor-speed augmentation bound of 5 for Partitioned DM