Real-Time Scheduling Analysis for Multiprocessor Platforms Marko Bertogna PhD dissertation Scuola Superiore S.Anna, Pisa, Italy.


1 Real-Time Scheduling Analysis for Multiprocessor Platforms Marko Bertogna PhD dissertation Scuola Superiore S.Anna, Pisa, Italy

2 19/05/2008, Marko Bertogna, PhD dissertation. Overview: The Multicore Revolution; Real-Time Multiprocessor Systems: existing results; Schedulability Analysis for global schedulers; Experimental evaluation; Conclusions; Other research activities.

3 Main Contributions. Systematization of existing results for RT scheduling and schedulability analysis on multiprocessors (MP). Polynomial and pseudo-polynomial schedulability tests for work-conserving schedulers, FP, EDF and EDZL. Experimental comparison of existing techniques.

4 Real-Time Systems. Solid theory for single-processor systems: optimal schedulers, tight schedulability tests, shared resource protocols, bandwidth reservation schemes, hierarchical schedulers, OS, etc. Far fewer results for multiprocessors: many NP-hard problems, few optimal results, heuristic approaches, simplified task models, only sufficient schedulability tests, etc. Do we really need to investigate multiprocessor real-time systems?

5 As Moore’s law goes on… The number of transistors per chip doubles every 18 to 24 months.

6 … heating becomes a problem. [Figure: power (W) versus year for Intel processors (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P1, P2, P3, P4), with hot-plate and nuclear-reactor reference levels and the note "Pentium Tejas cancelled!".] $P \propto V \cdot f$: clock speed limited to less than 4 GHz.

7 Solution: denser chips with transistors operating at lower frequencies, i.e. MULTICORE SYSTEMS. Use a higher number of slower logic gates.

8 The Multicore invasion. Intel’s Core2, Itanium, Xeon: 2, 4 cores. AMD’s Opteron, Athlon 64 X2, Phenom: 2, 4 cores. IBM-Toshiba-Sony Cell processor: 8 cores (PSX3). Microsoft’s Xenon: 3 cores (Xbox 360). ARM’s MPCore: 4 cores. Sun’s Niagara UltraSPARC: 8 cores. Tilera’s TILE64: 64 cores. Nios II: x soft cores. TI, Freescale, Atmel, Broadcom, Picochip (picoArray up to 300 DSP cores), ...

9 Identical vs heterogeneous cores. ARM’s MPCore: 4 identical ARMv6 cores. STI’s Cell Processor: one Power Processor Element (PPE) and 8 Synergistic Processing Elements (SPE).

10 System model. Platform with $m$ identical processors. Task set $\tau$ with $n$ periodic or sporadic tasks $\tau_i$: period or minimum inter-arrival time $T_i$; worst-case execution time $C_i$; deadline $D_i$; utilization $U_i = C_i/T_i$; density $\lambda_i = C_i/\min(D_i, T_i)$.
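
The task model above maps directly onto a small data structure. The following is a minimal sketch; the class and field names (`Task`, `wcet`, `deadline`, `period`) are illustrative assumptions, not names used in the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Sporadic or periodic task (names are hypothetical, for illustration)."""
    wcet: float      # C_i, worst-case execution time
    deadline: float  # D_i, relative deadline
    period: float    # T_i, period or minimum inter-arrival time

    @property
    def utilization(self) -> float:
        # U_i = C_i / T_i
        return self.wcet / self.period

    @property
    def density(self) -> float:
        # lambda_i = C_i / min(D_i, T_i)
        return self.wcet / min(self.deadline, self.period)


def total_utilization(tasks) -> float:
    return sum(t.utilization for t in tasks)


def total_density(tasks) -> float:
    return sum(t.density for t in tasks)
```

For a task with $C_i = 1$, $D_i = 2$, $T_i = 4$ the utilization is 0.25 and the density is 0.5; the two coincide whenever $D_i \ge T_i$.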

11 Problems addressed: run-time scheduling problem; schedulability problem. [Figure: tasks $\tau_1$ to $\tau_5$ to be assigned to CPU1, CPU2 and CPU3.]

12 Assumptions. Independent tasks. Job-level parallelism prohibited: the same job cannot execute on more than one processor at the same time. Preemption and migration supported: a preempted task can resume its execution on a different processor. Cost of preemption/migration integrated into task WCET.

13 Global vs partitioned scheduling: single system-wide queue or multiple per-processor queues. [Figure: a global scheduler with one queue feeding CPU1, CPU2 and CPU3, next to a partitioned scheduler with one queue per CPU.]

14 Partitioned Scheduling. The scheduling problem reduces to a bin-packing problem plus a uniprocessor scheduling problem. Bin-packing: NP-hard in the strong sense; various heuristics used: FF, NF, BF, FFDU, BFDD, etc. Uniprocessor scheduling: well known; EDF ($U_{tot} \le 1$), RM (RTA), ... Global (work-conserving) and partitioned approaches are incomparable.
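
As an illustration of the partitioned approach, here is a minimal first-fit (FF) sketch for implicit-deadline tasks, using the EDF condition $U_{tot} \le 1$ as the per-processor acceptance test. The function name and the small floating-point tolerance are assumptions made for this example.

```python
def first_fit_partition(utilizations, m):
    """Assign tasks (given by their utilizations) to m processors with first-fit.

    Each processor accepts a task only if its total utilization stays <= 1,
    i.e. the exact uniprocessor EDF test for implicit deadlines.
    Returns a list of m lists of task indices, or None if some task does not fit.
    """
    loads = [0.0] * m
    bins = [[] for _ in range(m)]
    for idx, u in enumerate(utilizations):
        for p in range(m):
            if loads[p] + u <= 1.0 + 1e-12:  # tolerance for float rounding
                loads[p] += u
                bins[p].append(idx)
                break
        else:
            return None  # no processor can host this task
    return bins
```

Three tasks of utilization 0.6 cannot be partitioned on two processors even though $U_{tot} = 1.8 \le 2$, which is one reason partitioned and global approaches are incomparable.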

15 Global scheduling. The $m$ highest-priority ready jobs are always the ones executing. Work-conserving scheduler: no processor is ever idle when a task is ready to execute. [Figure: a single queue of tasks feeding CPU1, CPU2 and CPU3.]

16 Global scheduling: advantages. Load automatically balanced. Easier re-scheduling (dynamic loads, selective shutdown, etc.). Lower average response time (see queueing theory). More efficient reclaiming and overload management. Number of preemptions. Migration cost: can be mitigated by proper HW (e.g., MPCore’s Direct Data Intervention). Few schedulability tests → further research needed.

17 Uniprocessor scheduling. EDF optimal for arbitrary job collections. Exact schedulability conditions: linear test for implicit deadlines ($U_{tot} \le 1$); pseudo-polynomial test for constrained and arbitrary deadlines [Baruah et al. 90]. Optimal priority assignments for sporadic and synchronous periodic task systems: RM for implicit deadlines, DM for constrained deadlines. Exact pseudo-polynomial schedulability test for FP: Response Time Analysis (RTA).

18 Global Scheduling. No optimal scheduler known for general task models. Pfair optimal for implicit deadlines ($U_{tot} \le m$), but with preemption and synchronization issues. Classic schedulers are not optimal (Dhall’s effect): $m$ light tasks and 1 heavy task, with $U_{tot} \to 1$. Hybrid schedulers: EDF-US, RM-US, DM-DS, AdaptiveTkC, fpEDF, EDF(k), EDZL, …
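
Dhall’s effect can be made concrete with the classical construction sketched below. The parameter $\varepsilon$ and the specific task values are assumptions chosen for illustration; the slide only states "m light tasks, 1 heavy task".

```latex
% m light tasks and one heavy task, all released at time 0 on m processors
\[
\text{light: } C = 2\varepsilon,\ D = T = 1
\qquad
\text{heavy: } C = 1,\ D = T = 1 + \varepsilon
\]
% Global EDF (or RM) favours the light tasks, which occupy all m processors
% during [0, 2\varepsilon). The heavy task therefore completes at
\[
2\varepsilon + 1 \;>\; 1 + \varepsilon ,
\]
% missing its deadline, although the total utilization is only
\[
U_{tot} = 2m\varepsilon + \frac{1}{1+\varepsilon} \;\longrightarrow\; 1
\quad \text{as } \varepsilon \to 0 .
\]
```

The miss occurs however many processors are available, which motivates the hybrid schedulers that treat heavy tasks separately.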

19 Global scheduling: main results. Only sufficient schedulability tests. Utilization-based tests (implicit deadlines): EDF (Goossens et al.): $U_{tot} \le m(1 - U_{max}) + U_{max}$; fpEDF (Baruah): $U_{tot} \le (m+1)/2$; RM-US (Andersson et al.): $U_{tot} \le m^2/(3m-2)$. Polynomial tests: EDF, FP (Baker): $O(n^2)$ and $O(n^3)$ tests; EDZL (Cirinei, Baker): $O(n^2)$ test. Pseudo-polynomial tests: EDF, FP (Fisher, Baruah): load-based tests.
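
The three utilization bounds listed above are simple enough to evaluate directly. The sketch below assumes implicit deadlines and a task set given as a list of utilizations; the function name and dictionary keys are illustrative. Each bound applies only to the scheduler it is named after.

```python
def utilization_tests(utilizations, m):
    """Evaluate the sufficient utilization bounds quoted on the slide.

    Returns, for each scheduler, whether its bound is satisfied.
    A False result is inconclusive: the tests are only sufficient.
    """
    u_tot = sum(utilizations)
    u_max = max(utilizations)
    return {
        "EDF (Goossens et al.)": u_tot <= m * (1 - u_max) + u_max,
        "fpEDF (Baruah)": u_tot <= (m + 1) / 2,
        "RM-US (Andersson et al.)": u_tot <= m * m / (3 * m - 2),
    }
```

For three tasks of utilization 0.5 on $m = 2$ processors, the EDF and fpEDF bounds both equal 1.5 and are met with equality, while the RM-US bound is $4/4 = 1$ and is not met.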

20 Density-based tests [ECRTS’05] [OPODIS’05]. EDF: $\lambda_{tot} \le m(1 - \lambda_{max}) + \lambda_{max}$. EDF-DS[1/2]: $\lambda_{tot} \le (m+1)/2$. DM: $\lambda_{tot} \le m(1 - \lambda_{max})/2 + \lambda_{max}$. DM-DS[1/3]: $\lambda_{tot} \le (m+1)/3$. EDF-DS[1/2] gives highest priority to (at most $m-1$) tasks having $\lambda_t \ge 1/2$, and schedules the remaining ones with EDF. DM-DS[1/3] gives highest priority to (at most $m-1$) tasks having $\lambda_t \ge 1/3$, and schedules the remaining ones with DM (only constrained deadlines).

21 Critical instant: a particular configuration of releases that leads to the largest possible response time of a task. It is possible to derive exact schedulability tests by analyzing just the critical instant situation. Uniprocessor FP and EDF: a critical instant is when all tasks arrive synchronously and all jobs are released as soon as permitted. Response Time Analysis for uniprocessor FP: the response time of task $k$ is given by the fixed point of $R_k$ in the iteration $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k/T_i \rceil C_i$, where the sum ranges over the tasks with higher priority than $\tau_k$.
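
The uniprocessor fixed-priority response time analysis mentioned above is the standard iteration $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k/T_i \rceil C_i$. A minimal sketch follows, assuming integer parameters, constrained deadlines ($D_i \le T_i$) and tasks given as `(C, D, T)` tuples sorted by decreasing priority; the function name is illustrative.

```python
def uniprocessor_rta(tasks):
    """Response times under uniprocessor fixed priority.

    tasks: list of (C, D, T) integer tuples, highest priority first.
    Returns one entry per task: the worst-case response time, or None
    if the iteration exceeds the deadline (task not schedulable).
    """
    results = []
    for k, (c_k, d_k, _) in enumerate(tasks):
        r = c_k
        while True:
            # Interference from higher-priority tasks released in [0, r);
            # -(-r // t) is integer ceil(r / t).
            nxt = c_k + sum(-(-r // t_i) * c_i for (c_i, _, t_i) in tasks[:k])
            if nxt > d_k:
                r = None
                break
            if nxt == r:
                break  # fixed point reached
            r = nxt
        results.append(r)
    return results
```

For the task set `[(1, 4, 4), (2, 6, 6), (3, 12, 12)]` the iteration converges to response times 1, 3 and 10.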

22 Multiprocessor anomaly. Synchronous periodic arrival of jobs is not a critical instant for multiprocessors: $\tau_1 = (1,1,2)$, $\tau_2 = (1,1,3)$, $\tau_3 = (5,6,6)$. [Figure: schedule in the synchronous periodic situation, next to the schedule with the second job of $\tau_2$ delayed by one unit; from [Bar07].] Need to find pessimistic situations to derive sufficient schedulability tests.
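
The anomaly can be reproduced with a small discrete-time simulation. The sketch below makes two assumptions not stated on the slide: the tuples are read as $(C_i, D_i, T_i)$ and the platform has $m = 2$ processors. The simulator and helper names are hypothetical; jobs are scheduled by global EDF in unit time slots and stop executing at their deadline.

```python
def global_edf_misses(jobs, m):
    """Simulate global EDF in unit slots on m processors.

    jobs: list of (release, wcet, absolute_deadline) integer tuples.
    Returns the indices of jobs that do not complete by their deadline.
    """
    remaining = [c for (_, c, _) in jobs]
    horizon = max(d for (_, _, d) in jobs)
    for t in range(horizon):
        # Jobs that are released, unfinished and not yet past their deadline.
        ready = [j for j, (r, _, d) in enumerate(jobs)
                 if r <= t < d and remaining[j] > 0]
        ready.sort(key=lambda j: jobs[j][2])  # earliest deadline first
        for j in ready[:m]:
            remaining[j] -= 1
    return [j for j, left in enumerate(remaining) if left > 0]


def jobs_of(c, d, releases):
    """Build the jobs of one task from its release times."""
    return [(r, c, r + d) for r in releases]


# Synchronous periodic releases over [0, 6).
sync = jobs_of(1, 1, [0, 2, 4]) + jobs_of(1, 1, [0, 3]) + jobs_of(5, 6, [0])
# Same, but the second job of tau_2 arrives one unit later (legal for sporadic tasks).
delayed = jobs_of(1, 1, [0, 2, 4]) + jobs_of(1, 1, [0, 4]) + jobs_of(5, 6, [0])
```

Under these assumptions `global_edf_misses(sync, 2)` reports no miss, while `global_edf_misses(delayed, 2)` reports the job of $\tau_3$ (index 5): at time 4 both short jobs arrive together and block $\tau_3$ for one slot it cannot recover.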

23 Introducing the interference. $I_k$ = total interference suffered by task $\tau_k$; $I_k^i$ = interference of task $\tau_i$ on task $\tau_k$. [Figure: schedule on CPU1, CPU2 and CPU3 over the interval $[r_k, r_k + R_k]$, showing the execution of $\tau_k$ and the interfering contributions $I_k^1, \dots, I_k^8$.]

24 Limiting the interference. [Figure: the same schedule on CPU1, CPU2 and CPU3 over $[r_k, r_k + R_k]$ with the contributions $I_k^1, \dots, I_k^8$.] It is sufficient to consider at most the portion $(R_k - C_k + 1)$ of each term $I_k^i$ in the sum. It can be proved that the WCRT of $\tau_k$ is given by the fixed point of:

25 Bounding the interference. Exactly computing the interference is complex. Pessimistic assumptions: 1. Bound the interference of a task with the workload: 2. Use an upper bound on the workload.

26 Bounding the workload. Consider a situation in which the first job executes as close as possible to its deadline and successive jobs execute as soon as possible. [Figure: window of length $L$ containing jobs of $\tau_i$, each of length $C_i$, with period $T_i$, deadline $D_i$ and a last contribution $\varepsilon_i$; the bound is split into (# jobs excluding the last one) and (last job).] where:
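
The scenario described above leads to a closed-form workload bound. The sketch below uses the form $W_i(L) = N_i(L)\,C_i + \min(C_i,\ L + D_i - C_i - N_i(L)\,T_i)$ with $N_i(L) = \lfloor (L + D_i - C_i)/T_i \rfloor$, as in the published analysis by Bertogna and Cirinei; since the slide’s own formula did not survive in the transcript, treat this as a reconstruction under that assumption. Integer parameters are assumed and the function name is illustrative.

```python
def workload_bound(c, d, t, length):
    """Upper bound on the execution of a task (C=c, D=d, T=t) in any
    window of the given length.

    Worst case considered: the first job finishes exactly at its deadline
    and every later job runs as soon as it is released.
    """
    # Jobs that fit entirely in the window (all but the last one).
    n_jobs = (length + d - c) // t
    # Contribution of the last job, at most one full execution.
    last = min(c, length + d - c - n_jobs * t)
    return n_jobs * c + last
```

For $C_i = 2$, $D_i = T_i = 5$ the bound is 4 for a window of length 4 and 6 for a window of length 10. For very short windows the bound can exceed the window length; this is harmless because the interference term is later capped.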

27 RTA for generic global schedulers. An upper bound on the WCRT of task $k$ is given by the fixed point of $R_k$ in the iteration: The slack of task $k$ is at least: [Figure: timeline showing $R_k$ and $S_k$.]

28 Improvement using slack values. Consider a situation in which the first job executes as close as possible to its deadline and successive jobs execute as soon as possible. [Figure: window of length $L$ containing jobs of $\tau_i$, each of length $C_i$, with period $T_i$, deadline $D_i$ and a last contribution $\varepsilon_i$; the bound is split into (# jobs excluding the last one) and (last job).] where:

29 Improvement using slack values. Consider a situation in which the first job executes as close as possible to its deadline and successive jobs execute as soon as possible. [Figure: window of length $L$ containing jobs of $\tau_i$, each of length $C_i$, with period $T_i$, deadline $D_i$ and slack $S_i$.] where:

30 RTA for generic global schedulers. An upper bound on the WCRT of task $k$ is given by the fixed point of $R_k$ in the iteration: If a fixed point $R_k \le D_k$ is reached for every task $k$ in the system, the task set is schedulable with any work-conserving global scheduler.

31 Iterative schedulability test. 1. All slacks are initialized to zero. 2. Compute the slack lower bound for tasks $1, \dots, n$: if higher than the old value, update the slack bound; if lower, do nothing. 3. If all tasks have a positive slack lower bound, return success. 4. If no slack has been updated for tasks $1, \dots, n$, return fail. 5. Otherwise, return to point 2.
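
The five steps above can be sketched as follows for a generic work-conserving scheduler. This is a reconstruction under stated assumptions, not the dissertation’s code: the response-time iteration is taken as $R_k \leftarrow C_k + \lfloor \frac{1}{m} \sum_{i \ne k} \min(W_i(R_k),\ R_k - C_k + 1) \rfloor$, the slack-aware workload bound shifts the first job earlier by $S_i$, parameters are integers, and "success" is read as "a fixed point $R_k \le D_k$ exists for every task". All function names are illustrative.

```python
def workload_bound(task, length, slack=0):
    """Workload bound of task (C, D, T) in a window, given a slack lower bound."""
    c, d, t = task
    n_jobs = (length + d - c - slack) // t
    last = min(c, length + d - c - slack - n_jobs * t)
    return n_jobs * c + last


def response_time_bound(tasks, k, m, slacks):
    """Fixed point of the RTA iteration for task k, or None if it exceeds D_k."""
    c_k, d_k, _ = tasks[k]
    r = c_k
    while r <= d_k:
        interference = sum(
            min(workload_bound(tasks[i], r, slacks[i]), r - c_k + 1)
            for i in range(len(tasks)) if i != k)
        nxt = c_k + interference // m
        if nxt == r:
            return r
        r = nxt
    return None


def rta_schedulable(tasks, m):
    """Iterative slack-based test; True means schedulable (sufficient only)."""
    slacks = [0] * len(tasks)                       # step 1
    while True:
        updated, all_ok = False, True
        for k, (_, d_k, _) in enumerate(tasks):     # step 2
            r = response_time_bound(tasks, k, m, slacks)
            if r is None:
                all_ok = False
            elif d_k - r > slacks[k]:
                slacks[k] = d_k - r
                updated = True
        if all_ok:
            return True                             # step 3
        if not updated:
            return False                            # step 4
        # step 5: repeat with the improved slack values
```

With `tasks = [(1, 4, 4), (1, 4, 4), (2, 4, 4)]` and `m = 2` the test succeeds in the first round. Three tasks `(3, 4, 4)` on two processors have $U_{tot} > 2$, so they are infeasible and the test correctly fails.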

32 RTA refinement for Fixed Priority. The interference on higher-priority tasks is always null: An upper bound on the WCRT of task $k$ can be given by the fixed point of $R_k$ in the iteration:
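
Since lower-priority tasks cannot interfere with $\tau_k$ under fixed priority, the sum in the generic iteration can be restricted to higher-priority tasks. A plausible form of the resulting iteration, assuming tasks are indexed by decreasing priority and $W_i$ denotes the workload bound (both assumptions, as the slide’s formula is not preserved in the transcript):

```latex
\[
I_k^i = 0 \quad \text{for all } i > k
\]
\[
R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i < k}
  \min\bigl( W_i(R_k),\; R_k - C_k + 1 \bigr) \right\rfloor
\]
```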

33 RTA refinement for EDF. A different bound can be derived by analyzing the worst-case workload in a situation in which the interfering and interfered tasks have a common deadline and all jobs execute as late as possible. An upper bound on the WCRT of task $k$ is given by the fixed point of $R_k$ in the iteration:
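
For the common-deadline scenario, one way to write the resulting interference bound, assuming constrained deadlines ($D_i \le T_i$) and a slack lower bound $S_i$ for each interfering task, is sketched below. The symbol $\mathfrak{I}_k^i$ and the exact combination with the workload bound $W_i$ are assumptions, since the slide’s formulas are not preserved in the transcript.

```latex
% Jobs of tau_i with deadline inside a window of length D_k that ends at a
% common deadline: floor(D_k / T_i) whole jobs, plus a first job that
% finishes at least S_i before its own deadline.
\[
\mathfrak{I}_k^i = \left\lfloor \frac{D_k}{T_i} \right\rfloor C_i
  + \min\bigl( C_i,\; \max(0,\; D_k \bmod T_i - S_i) \bigr)
\]
% Both bounds remain valid, so the smaller one can be used in the iteration:
\[
R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \ne k}
  \min\bigl( W_i(R_k),\; \mathfrak{I}_k^i,\; R_k - C_k + 1 \bigr) \right\rfloor
\]
```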

34 Complexity. Pseudo-polynomial complexity. Fast average behavior: we verified the schedulability of millions of task sets in a few minutes on a normal device. Lower complexity for Fixed Priority systems: at most one slack update per task, if slacks are updated in decreasing priority order. It is possible to reduce complexity by limiting the number of rounds.

35 Polynomial complexity test. A simpler test can be derived by avoiding the iterations on the response times. A lower bound on the slack of $\tau_k$ is given by: The iteration on the slack values is the same. Performance comparable to the RTA-based test. Complexity down to $O(n^2)$.
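
Avoiding the inner iteration amounts to evaluating the interference once, over the whole window $D_k$, instead of iterating on $R_k$. A plausible form of the slack lower bound for a generic work-conserving scheduler, assuming $W_i$ is the workload bound computed with the current slack values (the slide’s own expression is not preserved in the transcript):

```latex
\[
S_k \;\ge\; D_k - C_k - \left\lfloor \frac{1}{m} \sum_{i \ne k}
  \min\bigl( W_i(D_k),\; D_k - C_k + 1 \bigr) \right\rfloor
\]
% For fixed priority the sum can be restricted to higher-priority tasks (i < k).
```

Each evaluation costs $O(n)$ per task, so one round over all tasks costs $O(n^2)$.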

36 Experimental results for EDF. 2 processors, constrained deadlines, 1,000,000 task sets generated. Our test is consistently superior at all utilizations. [Figure: number of task sets versus task set utilization; curves: total task sets, I-BCL EDF, Bertogna et al.’05, Baker et al.’07, Goossens et al.’03; annotations: generated task sets, our test, improvement over existing solutions.]

37 Experimental results for FP. 2 processors, constrained deadlines, 1,000,000 task sets generated. Our test is consistently superior at all utilizations. [Figure: number of task sets versus task set utilization; curves: total task sets, I-BCL FP, Bertogna et al.’05, Baker et al.’07, density bound; annotations: generated task sets, our test.]

38 FP vs EDF. 4 processors, constrained deadlines, 1,000,000 task sets generated. Our FP test is consistently superior to all tests at every utilization. [Figure: number of task sets versus task set utilization; curves: total task sets, I-BCL FP, I-BCL EDF, Baker et al.’07, Goossens et al.’03; annotations: generated task sets, our FP test, our EDF test.]

39 Conclusions. Multiprocessor real-time systems are a promising field to explore. Existing results are still few and far from tight conditions. We contributed to filling this gap. Future work: find tighter schedulability tests; use our techniques to analyze the efficiency of other scheduling algorithms (EDZL, EDF-US, FP-DS, etc.); take into account exclusive access to resources; integrate into a Resource Reservation framework.

40 The end

41 Other research activities: limited-preemption EDF; reducing resource holding times; shared resources and open environments.

42 ARM’s MPCore

43 Frequency and power. $f$ = operating frequency. $V$ = supply voltage ($V \approx 0.3 + 0.7 f$): reducing the voltage causes a higher frequency reduction. $I_{leak}$ = leakage current (becomes non-negligible). $P = P_{dynamic} + P_{static}$ = power consumed. $P_{dynamic} \propto A C V^2 f$ (main contributor until hundreds of nm). $P_{static} \propto V I_{leak}$ (always present, due to subthreshold and gate-oxide leakage). Reducing $V$ allows a quadratic reduction of $P_{dynamic}$.
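
A short numerical sketch shows why several slower cores can be preferable to one fast core. It assumes the relations above with frequency and voltage normalized to their nominal values, constant activity factor and capacitance, and no static power; the function name is illustrative.

```python
def relative_dynamic_power(f):
    """Dynamic power relative to nominal, for a normalized frequency f in (0, 1].

    Uses V ~= 0.3 + 0.7 f and P_dynamic proportional to V^2 * f,
    with the constants A and C folded into the normalization.
    """
    v = 0.3 + 0.7 * f
    return v * v * f


one_fast_core = relative_dynamic_power(1.0)          # nominal: about 1.0
two_slow_cores = 2 * relative_dynamic_power(0.5)     # about 0.42
```

Under these assumptions, two cores at half frequency offer the same aggregate cycle rate for roughly 42% of the dynamic power, provided the workload can actually be parallelized.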

44 Power density. [Figure: power density (W/cm2, logarithmic scale from 1 to 10000) versus year (1970 to 2010) for Intel processors (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P6), with hot plate, nuclear reactor and rocket nozzle reference levels.]

45 How many cores in the future? Intel’s 80-core prototype already available. Able to transfer a TB of data/s (Core 2 Duo reaches 1.66 GB data/s). To be released in 5 years.

46 Beyond 2 billion transistors/chip. Intel’s Tukwila: Itanium based; 2.046 B FET; quad-core; 65 nm technology; 2 GHz on 170 W; 30 MB cache; 2 SMT → 8 threads/ck.

47 Intel’s timeline

48 From 4004 (1971) to Pentium D (2005). Tech: 10 µm → 65 nm: 150×. $f$: 100 kHz → 3 GHz: 25000×. # MOS: 2,300 → 291,000,000: 125,000×. P: 0.2 W → 100 W: 500×. Vdd reduced (from 5 V to ~1 V). Not all MOS change state. Great part of the chip occupied by cache. $f \propto V_{dd} - V_{tt}$; $I_{leak} \propto V_{dd}, 1/V_{tt}$.

49 [Figure: Intel 4004 (1971) and Intel Pentium IV (2000).]

50 Itanium temperature plot

51 Problems addressed: run-time scheduling problem; schedulability problem. [Figure: tasks $\tau_1$ to $\tau_5$ to be assigned to CPU1, CPU2 and CPU3.]

52 Incandescent light bulb: 25-100 W. Compact fluorescent lights: 5-30 W. Typical car: 25 kW. Human climbing stairs: 200 W. 1 kWh = 1 kW constantly supplied for 1 h. ENEL: 0.13-0.18 €/kWh.

53 Density and utilization bounds

54 Uniprocessor feasibility (rows: task model; columns: deadline model). Sporadic or synchronous periodic: implicit: linear test $U_{tot} \le 1$; constrained or arbitrary: unknown complexity, pseudo-polynomial test if $U_{tot} < 1$: EDF until $U_{tot}/(1 - U_{tot}) \cdot \max(T_i - D_i)$. Asynchronous periodic: implicit: linear test $U_{tot} \le 1$; constrained or arbitrary: strong NP-hard, exponential test: EDF until $2H + D_{max} + r_{max}$.

55 Uniprocessor static priority run-time scheduling (rows: task model; columns: deadline model). Sporadic or synchronous periodic: implicit: RM optimality; constrained: DM optimality; arbitrary: unknown complexity, Audsley’s bottom-up algorithm (exponential complexity). Asynchronous periodic: unknown complexity, Audsley’s bottom-up algorithm (exponential complexity).

56 Uniprocessor static priority feasibility (rows: task model; columns: deadline model). Sporadic or synchronous periodic: implicit: pseudo-polynomial test, RM until $T_{max}$ or RTA; constrained: pseudo-polynomial test, DM until $D_{max}$ or RTA; arbitrary: unknown complexity, Audsley’s bottom-up algorithm (exponential). Asynchronous periodic: unknown complexity; strong NP-hard; Audsley’s bottom-up algorithm (exponential).

57 Uniprocessor static priority schedulability (rows: task model; columns: deadline model). Sporadic or synchronous periodic: implicit: pseudo-polynomial simulation until $T_{max}$ or RTA; constrained: pseudo-polynomial simulation until $D_{max}$ or RTA; arbitrary: unknown complexity, Lehoczky’s test (exponential). Asynchronous periodic: strong NP-hard; simulation until $2H + r_{max}$ or other exponential tests.

58 Multiprocessor feasibility (rows: task model; columns: deadline model: implicit, constrained, arbitrary). Sporadic: linear test $U_{tot} \le m$; unknown complexity, synchronous periodic not a critical instant. Synchronous periodic: unknown complexity, Horn’s algorithm in $(0, H]$; unknown complexity. Asynchronous periodic: strong NP-hard.

59 Multiprocessor run-time scheduling (rows: task model; columns: deadline model: implicit, constrained, arbitrary). Sporadic: P-fair, GPS; requires clairvoyance. Synchronous periodic: P-fair, GPS, LLREF, EKG, BF; unknown complexity, clairvoyance not needed, Horn’s algorithm in $(0, H]$; unknown complexity, clairvoyance not needed. Asynchronous periodic: unknown complexity, clairvoyance not needed.

60 Feasibility conditions. [Figure: regions of the task-set space. Feasible: $\sum_i C_i/\min(D_i, T_i) \le m$, and the sufficient feasibility and schedulability tests. Not feasible: $load > m$, $load^* > m$, $U_{tot} > m$. Region marked "???" in between.]

61 Multiprocessor static job priority feasibility (rows: task model; columns: deadline model: implicit, constrained, arbitrary). Sporadic: unknown complexity; unknown complexity, synchronous periodic not a critical instant. Synchronous periodic: unknown complexity, simulation until hyperperiod for all N! job priority assignments; unknown complexity. Asynchronous periodic: unknown complexity; strong NP-hard.

62 Multiprocessor static job priority schedulability (rows: task model; columns: deadline model: implicit, constrained, arbitrary). Sporadic: unknown complexity; unknown complexity, synchronous periodic not a critical instant. Synchronous periodic: unknown complexity, simulation until hyperperiod; unknown complexity. Asynchronous periodic: strong NP-hard.

63 Multiprocessor static priority run-time scheduling (rows: task model; columns: deadline model: implicit, constrained, arbitrary). Periodic (synchronous or asynchronous): unknown complexity; Cucu’s optimal priority assignment. Sporadic: unknown complexity.

64 Multiprocessor static priority feasibility (rows: task model; columns: deadline model: implicit, constrained, arbitrary). Sporadic: unknown complexity; synchronous periodic not a critical instant. Synchronous periodic: strong NP-hard; simulation until hyperperiod for all n! priority assignments. Asynchronous periodic: strong NP-hard; simulation on exponential feasibility interval for all n! priority assignments.

65 Multiprocessor static priority schedulability (rows: task model; columns: deadline model: implicit, constrained, arbitrary). Sporadic: unknown complexity; synchronous periodic not a critical instant. Synchronous periodic: unknown complexity; simulation until hyperperiod. Asynchronous periodic: strong NP-hard; simulation on exponential feasibility interval.

66 RTA for Uniprocessors. For FP, the worst-case response time of a task is given by the first instance released at a critical instant. For EDF, it is given by an instance in a busy interval starting with a critical instant. With these observations it is possible to compute the WCRT of all tasks. Example: for FP, the WCRT of a task $k$ is given by the fixed point of $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k/T_i \rceil C_i$, where the sum ranges over the tasks with higher priority than $\tau_k$.

67 RTA refinement for EDF. Still valid the bound: A different bound can be derived by analyzing the worst-case workload in a situation in which the interfering and interfered tasks have a common deadline and all jobs execute as late as possible. [Figure: window of length $D_k$ ending at the common deadline, containing jobs of $\tau_i$, each of length $C_i$, with period $T_i$, deadline $D_i$ and slack $S_i$.] with: and:

68 RTA refinement for EDF. A different bound can be derived by analyzing the worst-case workload in a situation in which the interfering and interfered tasks have a common deadline and all jobs execute as late as possible. [Figure: window of length $D_k$ ending at the common deadline, containing jobs of $\tau_i$, each of length $C_i$, with period $T_i$, deadline $D_i$ and slack $S_i$.] with: and:

69 Polynomial complexity test. A lower bound on the slack of $\tau_k$ is given by: For EDF: For FP:

70 Limiting the number of iterations

