Multi-core Real-Time Scheduling for Generalized Parallel Task Models Abusayeed Saifullah, Kunal Agrawal, Chenyang Lu, Christopher Gill.


Real-Time Systems on Multi-core
- Multi-core processors provide an opportunity to schedule computation-intensive tasks in real time
- Most of these tasks exhibit intra-task parallelism
- Real-time systems need to be developed to exploit intra-task parallelism
- Traditional multiprocessor scheduling
  - Focuses on inter-task parallelism
  - Mostly restricted to sequential task models
- Computation-intensive complex real-time tasks are growing
  - Video surveillance
  - Radar tracking
  - Hybrid real-time structural testing

Parallel Task Model
- Synchronous task model
  - Each horizontal bar indicates a thread of execution (a sequence of instructions)
  - Parallel threads form a segment
  - Threads of each segment synchronize at the end of the segment
- Lakshmanan et al. (RTSS '10) addressed a restricted synchronous model where
  - A task is an alternating sequence of parallel and sequential segments
  - The total number of threads in each segment ≤ number of cores
  - All parallel segments have an equal number of threads
[Figure: a synchronous parallel task with Segments 1 to 5; the threads of Segment 1 synchronize at the end of that segment.]

Our Contributions
- We address a general synchronous parallel task model
  - Different segments may have different numbers of threads
  - Each segment can have an arbitrary number of threads
- Example: such tasks are generated by
  - Parallel for loops in OpenMP, CilkPlus
  - Barrier primitives in thread libraries
- This model is more portable
  - The same program can execute on machines with different numbers of cores

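A Task Example

    void parallel_task(float *a, float *b, float *c, float *d)
    {
        int n = 7;
        int i = 0;
        /* First parallel segment: 7 iterations */
        parallel_for (; i < n; i++)
            c[i] = a[i] + b[i];

        n = 4;
        i = 0;
        /* Second parallel segment: 4 iterations */
        parallel_for (; i < n; i++)
            d[i] = a[i] - b[i];
    }

[Figure: the threads of the task drawn from start to end.]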
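The pseudocode above uses a generic parallel_for. As a minimal sketch, assuming parallel_for corresponds to an OpenMP worksharing loop (one of the sources of such tasks named earlier in the deck), the same task could be written as compilable C. The pragma placement and the assert are illustrative additions, not taken from the slides. Without OpenMP enabled at compile time the pragmas are ignored and the loops run sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the slide's task, assuming parallel_for maps to
   "#pragma omp parallel for". a and b need at least 7 elements,
   c at least 7, and d at least 4. */
void parallel_task(const float *a, const float *b, float *c, float *d)
{
    assert(a != NULL && b != NULL && c != NULL && d != NULL);

    int n = 7;
    /* First segment: up to 7 iterations may run in parallel.
       The loop ends with an implicit barrier, so all threads
       synchronize before the next segment starts. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    n = 4;
    /* Second segment: up to 4 iterations may run in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        d[i] = a[i] - b[i];
}
```

The implicit barrier at the end of each loop plays the role of the synchronization point at the end of a segment in the synchronous task model.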

Our Contributions (contd.)
- We propose a task decomposition for the general synchronous parallel task model
  - Decomposes each parallel task into a set of sequential subtasks
  - Subtasks are scheduled like traditional tasks
- Why decomposition?
  - We can exploit the rich literature of multiprocessor scheduling
  - The proposed decomposition ensures that if the decomposed tasks are schedulable, the original task set is also schedulable

Our Contributions (contd..)  We analyze schedulability in terms of processor speed augmentation bound  Speed augmentation bound ν for an Algorithm A: if an optimal algorithm can schedule a synchronous parallel task set on unit- speed processor cores, then A can schedule the decomposed tasks on ν-speed processor cores.  We prove that the proposed decomposition requires a speed augmentation of at most  4 for Global Earliest Deadline First (G-EDF) scheduling  5 for Partitioned Deadline Monotonic (P-DM) scheduling 7

Overview of a Task Decomposition
- Each thread of the task becomes an individual task with
  - An intermediate subdeadline
  - A release offset to retain precedence relations in the original task
- Deadlines are assigned by distributing slack among segments
  - Deadline of a thread = execution requirement + assigned slack

Slack Distribution
- How much slack a segment demands depends on
  - Available slack of the task
  - Execution requirement of the segment
- Execution requirement of a segment is the product of
  - Total number of parallel threads in the segment and
  - Execution requirement of each thread in the segment
- Larger execution requirement implies more demand for slack
  - In the figure, Segment 1 requires more slack than Segment 2

Slack Distribution (contd.)
- We use the following principle to distribute slack
  - All segments that receive slack will achieve an equal density
- Reasons to equalize the density among segments
  - Fairness: the deadline of each segment becomes proportional to its execution requirement
  - We can bound the density of the decomposed tasks
  - We can exploit existing density-based analyses for multiprocessors

Slack Distribution (contd.)
- Slack of each segment is determined by solving the equalities
  - Sum of subdeadlines = task deadline (total assigned slack = task slack)
  - Density of Segment 1 = density of Segment 2 = and so on
- All threads in a segment have the same deadline and offset
  - Deadline = execution requirement of the thread + segment slack
  - Release offset = sum of deadlines of the preceding segments
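The equalities above can be solved in closed form under one simplifying assumption: every segment receives slack. The symbols are introduced here for illustration and do not appear in the slides: segment i has m_i threads, each with execution requirement e_i, and gets subdeadline d_i and release offset \phi_i. The task has s segments and deadline D, and C is the total execution requirement.

```latex
\begin{align*}
\delta &= \frac{m_i e_i}{d_i} \quad \text{for every segment } i, &
\sum_{i=1}^{s} d_i &= D \\
d_i &= \frac{m_i e_i}{\delta}
  \;\Rightarrow\; \sum_{i=1}^{s} d_i = \frac{C}{\delta} = D, &
C &= \sum_{i=1}^{s} m_i e_i \\
\delta &= \frac{C}{D}, &
d_i &= \frac{m_i e_i \, D}{C} \\
\text{slack}_i &= d_i - e_i, &
\phi_i &= \sum_{j < i} d_j
\end{align*}
```

The solution is meaningful only when d_i ≥ e_i for every segment. A segment for which this fails cannot be among the segments that receive slack, which is why the principle is stated only for "segments that receive slack".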

An Example of Task Decomposition
- Segment 1: deadline = 20, density = (5*4)/20 = 1
- Segment 2: deadline = 4, density = (2*2)/4 = 1
- Segment 3: deadline = 9, density = (3*3)/9 = 1
- Segment 4: deadline = 16, density = (4*4)/16 = 1
- Segment 5: deadline = 3, density = (1*3)/3 = 1
All segments have an equal density!
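The numbers in this example can be reproduced with a short sketch of the equal-density rule. It rests on several assumptions that are not stated in the slides: every segment receives slack, each product such as (5*4) is read as 5 threads of 4 time units each, and the task deadline is 52 (the sum of the listed subdeadlines). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One segment of a synchronous parallel task (illustrative field names). */
typedef struct {
    int threads;      /* number of parallel threads in the segment */
    double exec;      /* execution requirement of each thread */
    double deadline;  /* output: subdeadline shared by all threads */
    double offset;    /* output: release offset shared by all threads */
} segment_t;

/* Assigns subdeadlines so that every segment has the same density
   (threads * exec / deadline) and the subdeadlines sum to task_deadline.
   Simplifying assumption: every segment receives slack.
   Returns the common density, or -1.0 if some segment would get a
   subdeadline shorter than the execution requirement of one thread. */
double decompose_equal_density(segment_t *seg, size_t count,
                               double task_deadline)
{
    assert(seg != NULL && count > 0 && task_deadline > 0.0);

    /* Total execution requirement: sum of threads * exec over segments. */
    double total_work = 0.0;
    for (size_t i = 0; i < count; i++)
        total_work += seg[i].threads * seg[i].exec;
    assert(total_work > 0.0);

    double density = total_work / task_deadline;
    double offset = 0.0;
    for (size_t i = 0; i < count; i++) {
        seg[i].deadline = seg[i].threads * seg[i].exec / density;
        if (seg[i].deadline < seg[i].exec)
            return -1.0;  /* this segment cannot receive slack */
        /* Offset = sum of the subdeadlines of the preceding segments. */
        seg[i].offset = offset;
        offset += seg[i].deadline;
    }
    return density;
}
```

Called on the five segments (5, 4), (2, 2), (3, 3), (4, 4), (1, 3) with a task deadline of 52, the function yields a common density of 1 and the subdeadlines 20, 4, 9, 16, 3 listed above, with release offsets 0, 20, 24, 33, 49.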

Global EDF (G-EDF) Schedulability
- A sufficient condition for G-EDF scheduling on m unit-speed cores, stated in terms of the total density and the max density of the task set [Baruah RTSS '07]
- A necessary condition for any task set for any scheduler
- Using the density bounds for decomposed tasks: if the original task set is schedulable by any algorithm on m unit-speed cores, the decomposed tasks are schedulable under G-EDF on 4-speed cores
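The slide names the total density and the max density but the formula itself is not in the text. As a hedged sketch, assume the sufficient condition is the commonly cited density-based G-EDF test, total density ≤ m − (m − 1) × max density. The function name and the speed parameter are illustrative: on cores of speed ν every execution requirement shrinks by a factor of ν, so both densities are divided by ν before the test.

```c
#include <assert.h>
#include <stdbool.h>

/* Density-based sufficient test for G-EDF, assuming the form
   total_density <= cores - (cores - 1) * max_density.
   Densities are given for unit-speed cores and are rescaled for
   cores running at the given speed. A false result means only that
   this test is inconclusive, not that the task set is unschedulable. */
bool gedf_density_test(double total_density, double max_density,
                       int cores, double speed)
{
    assert(cores > 0 && speed > 0.0);
    assert(total_density >= max_density && max_density >= 0.0);

    double scaled_total = total_density / speed;
    double scaled_max = max_density / speed;
    return scaled_total <= cores - (cores - 1) * scaled_max;
}
```

For instance, a task set with total density 4 and max density 1 fails the test on 4 unit-speed cores (4 > 4 − 3 × 1) but passes on 4 cores of speed 4 (1 ≤ 4 − 3 × 0.25).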

Partitioned DM (P-DM) Schedulability
- A sufficient condition for FBB-FFD scheduling on m unit-speed cores
  - FBB-FFD (Fisher Baruah Baker, First-Fit Decreasing) is a well-known P-DM scheduler [ECRTS '06]
- A necessary condition for any scheduler
  - Annotated term: max cumulative execution requirement of tasks divided by time length
- Using load and density bounds for decomposed tasks: if the original task set is schedulable by any algorithm on m unit-speed cores, the decomposed tasks are FBB-FFD schedulable on 5-speed cores
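To make the idea of a first-fit partitioned DM scheduler concrete, here is a minimal sketch of that style of algorithm. It is an illustration, not the published FBB-FFD algorithm, and should be checked against the cited ECRTS '06 paper. The admission test is an assumption: a task fits on a core if its deadline, minus a linear bound (exec + utilization × deadline) on the demand of each task already placed there, still covers its own execution requirement. Release offsets of decomposed subtasks are ignored, and all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double exec;      /* worst-case execution requirement */
    double deadline;  /* relative deadline */
    double period;    /* minimum inter-arrival time */
    int core;         /* output: assigned core index, or -1 */
} task_t;

/* Orders tasks by non-decreasing relative deadline (DM priority order). */
static int by_deadline(const void *x, const void *y)
{
    const task_t *a = x, *b = y;
    return (a->deadline > b->deadline) - (a->deadline < b->deadline);
}

/* First-fit partitioning in deadline-monotonic order.
   Sorts the array in place. Returns 1 if every task was placed,
   0 as soon as some task fits on no core. */
int first_fit_dm_partition(task_t *tasks, size_t count, int cores)
{
    assert(tasks != NULL && cores > 0);
    qsort(tasks, count, sizeof tasks[0], by_deadline);

    for (size_t i = 0; i < count; i++) {
        tasks[i].core = -1;
        for (int k = 0; k < cores && tasks[i].core < 0; k++) {
            /* Assumed linear demand bound of tasks already on core k,
               evaluated at the deadline of the task being placed. */
            double demand = 0.0;
            for (size_t j = 0; j < i; j++) {
                if (tasks[j].core == k) {
                    double util = tasks[j].exec / tasks[j].period;
                    demand += tasks[j].exec + util * tasks[i].deadline;
                }
            }
            if (tasks[i].deadline - demand >= tasks[i].exec)
                tasks[i].core = k;
        }
        if (tasks[i].core < 0)
            return 0;
    }
    return 1;
}
```

Because tasks are considered in deadline order, every task already on a core has higher or equal DM priority than the one being placed, so only those tasks enter the demand sum.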

Conclusion  Multi-core processors provide opportunities to schedule computation-intensive tasks in real-time  Real-time systems need to exploit intra-task parallelism  We have addressed real-time scheduling for generalized synchronous parallel task model  Different segments may have different number of threads  Each segment can have an arbitrary number of threads  We have proposed a task decomposition that achieves  A processor-speed augmentation bound of 4 for Global EDF  A processor-speed augmentation bound of 5 for Partitioned DM 15