EMERALDS Landon Cox March 22, 2017
Real-time systems
Typical hardware constraints: slow, low-power processors; small memories; little to no persistent storage
Typical workload: periodic sensing and actuation; tasks with periods and deadlines
EMERALDS: OS for embedded systems
Due to hardware constraints, want to minimize overhead everywhere: use resources on real work, not management
Focus on three system services: task scheduling, synchronization, communication
Rate-monotonic scheduling
Each task has a period (e.g., must run every 10 ms), a worst-case execution time (e.g., 5 ms), and a static priority (i.e., does not change over time)
Basic idea: assign task priorities based on period; tasks with smaller periods get higher priority
Can use pre-emption to interrupt a task
Rate-monotonic scheduling
T1 (period 100 ms, execution time 50 ms); T2 (period 200 ms, execution time 80 ms)
Pre-empt and schedule every 50 ms: T1 runs 0-50, T2 runs 50-100, T1 runs 100-150, T2 finishes 150-180
Did we meet all of our deadlines? What was our utilization?
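The two-task example above can be checked with a tiny simulation. This is an illustrative sketch, not EMERALDS code; the 1 ms tick granularity and dictionary layout are assumptions:

```python
# Simulate the two-task RM example at 1 ms granularity and check
# deadlines and utilization. Shorter period = higher RM priority.
tasks = [
    {"name": "T1", "period": 100, "exec": 50},
    {"name": "T2", "period": 200, "exec": 80},
]
for t in tasks:
    t["remaining"] = t["exec"]  # work left in the current release

missed = []
for now in range(200):  # one hyperperiod (LCM of 100 and 200)
    for t in tasks:
        if now > 0 and now % t["period"] == 0:  # new release
            if t["remaining"] > 0:              # old release unfinished -> miss
                missed.append((t["name"], now))
            t["remaining"] = t["exec"]
    # RM rule: run the ready task with the smallest period
    ready = [t for t in tasks if t["remaining"] > 0]
    if ready:
        min(ready, key=lambda t: t["period"])["remaining"] -= 1

utilization = sum(t["exec"] / t["period"] for t in tasks)
print(missed)                  # [] -> all deadlines met
print(round(utilization, 2))   # 0.9
```

Both tasks meet their deadlines even though utilization is 0.9, which matters for the schedulability discussion that follows.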
Rate-monotonic scheduling
Rate-monotonic is optimal for fixed-priority scheduling: it maximizes task-set "schedulability", ensuring that the max number of tasks meet their deadlines
Schedulability test for RM (Liu and Layland): for m tasks with completion times ci and periods pi, a feasible schedule exists if utilization U = sum(ci/pi) is below m(2^(1/m) - 1)
For m = 2, the bound is about 0.83
As m goes to infinity, the bound falls to ln(2) ≈ 0.69: as long as utilization stays below 0.69, RM will meet all deadlines, leaving roughly 31% of the CPU for the scheduler and other stalls
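The Liu and Layland bound is easy to evaluate directly. A minimal sketch (function names are mine, not from EMERALDS):

```python
def rm_bound(m):
    """Liu-Layland utilization bound for m tasks under RM."""
    return m * (2 ** (1 / m) - 1)

def rm_schedulable(tasks):
    """tasks: list of (exec_time, period) pairs. Sufficient, not necessary."""
    return sum(c / p for c, p in tasks) <= rm_bound(len(tasks))

print(round(rm_bound(2), 2))      # 0.83
print(round(rm_bound(10000), 2))  # approaches ln(2): 0.69
```

Note the test is sufficient but not necessary: the earlier T1/T2 example has utilization 0.9, above the 0.83 bound, yet still meets every deadline under RM.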
Rate-monotonic scheduling
Same example: T1 (period 100 ms, execution time 50 ms), T2 (period 200 ms, execution time 80 ms)
How is RM different than earliest-deadline first (EDF)? RM priorities are static; EDF prioritizes the task closest to its deadline
Scheduling overheads
Runtime overhead: time to decide what to run next; want fast access to TCB queues
Schedulability overhead: for a given task set, can all tasks in the set be processed in time?
Runtime overhead
EDF (dynamic priorities): tasks are kept in an unsorted list; walk the whole list to find the one with the earliest deadline. O(1) to block/unblock a task (to update the TCB); O(n) to schedule the next task (to find the earliest deadline)
RM (fixed priorities): tasks are kept in a sorted list; keep a pointer to the highest-priority ready task. O(n) to block (to update the TCB and reset the highestP pointer); O(1) to unblock (update the TCB); O(1) to schedule the next task (use the highestP pointer)
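The cost asymmetry above can be sketched with two toy queue classes (my structures, not EMERALDS code); comments mark where each O(1)/O(n) cost comes from:

```python
class EDFQueue:
    """Unsorted ready list: cheap block/unblock, linear-time pick."""
    def __init__(self):
        self.ready = []
    def unblock(self, task):            # O(1): append anywhere
        self.ready.append(task)
    def block(self, task):              # O(1) with a linked TCB
        self.ready.remove(task)         # (list.remove is O(n); a real TCB
                                        # would be unlinked in place)
    def pick(self):                     # O(n): scan for earliest deadline
        return min(self.ready, key=lambda t: t["deadline"])

class RMQueue:
    """All tasks in one priority-sorted list plus a highest-ready pointer."""
    def __init__(self, tasks):
        # sorted by period: index 0 is the highest RM priority
        self.tasks = sorted(tasks, key=lambda t: t["period"])
        self.highest = 0                # index of highest-priority ready task
    def block(self, i):                 # O(n): may scan for the next ready task
        self.tasks[i]["ready"] = False
        if i == self.highest:
            self.highest = next(
                (j for j, t in enumerate(self.tasks) if t["ready"]), None)
    def unblock(self, i):               # O(1): compare against the pointer
        self.tasks[i]["ready"] = True
        if self.highest is None or i < self.highest:
            self.highest = i
    def pick(self):                     # O(1): follow the pointer
        return None if self.highest is None else self.tasks[self.highest]
```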
Schedulability overhead
EDF: can schedule all workloads where utilization U = sum(ci/pi) ≤ 1, so zero schedulability overhead
RM: observed utilizations of 0.88 on average
Rate-monotonic scheduling
Task set (i = 1..10) with periods Pi (20, 30, 50, 100, 130, ...) and execution times ci (1.0, 0.5, ...)
Under RM the timeline runs T1, T2, T3, T4, then T1, T2, T3, T4 again
What's wrong? When would T5 run under EDF?
Combined static/dynamic (CSD)
Hybrid approach, useful for many kinds of problems: use one approach where it does well, and the other where it does well
Main idea: two queues, a dynamic queue (unsorted, used by EDF) and a fixed queue (sorted, used by RM)
Key is figuring out which queue to put tasks on
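Under those assumptions, CSD's dispatch rule can be sketched as follows (a hypothetical helper, not the EMERALDS implementation):

```python
def csd_pick(dpq, fpq):
    """dpq: unsorted list of tasks (EDF); fpq: list sorted by RM priority."""
    ready = [t for t in dpq if t["ready"]]
    if ready:  # DPQ tasks always win: EDF within the dynamic queue
        return min(ready, key=lambda t: t["deadline"])
    # DPQ empty: first ready FPQ task is the highest-priority one
    return next((t for t in fpq if t["ready"]), None)
```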
CSD scheduling
Need to identify the longest-period task that fails under RM
DPQ: T1, T2, T3, T4, T5; FPQ: T6, T7, T8, T9, T10
Which tasks have higher priority?
CSD scheduling
DPQ: T1, T2, T3, T4, T5; FPQ: T6, T7, T8, T9, T10
When do FPQ tasks run?
CSD scheduling
DPQ timeline: T1 at 1, T2 at 2, T3 at 3, T4 at 4, T5 at 4.5, T1 at 5.5, T2 at 6.5, T3 at 7.5
When would FPQ tasks run? Need a DPQ task to block
CSD scheduling
What if we have a lot of tasks? Time to walk the DPQ could grow
CSD scheduling
Why might this be a problem? May start missing deadlines
CSD scheduling
What's the solution? Split the DPQ into two queues
CSD3 scheduling
Which tasks go on which queue? More frequent tasks in DPQ1
DPQ1: T1, T2, T3; DPQ2: T4, T5; FPQ: T6, T7, T8, T9, T10
CSD3 scheduling
Why is DPQ1 helpful? Lower search time for frequent tasks; longer searches occur less frequently
CSD3 scheduling
When do DPQ2 tasks run? When DPQ1 is empty
CSD3 scheduling
When do FPQ tasks run? When DPQ1 and DPQ2 are empty
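The CSD3 ordering just described (DPQ1, then DPQ2, then FPQ) generalizes the two-queue rule; a sketch with a helper name of my own:

```python
def csd3_pick(dpq1, dpq2, fpq):
    """Pick the next task: DPQ1 first, then DPQ2, then the fixed queue."""
    for q in (dpq1, dpq2):              # dynamic queues, in priority order
        ready = [t for t in q if t["ready"]]
        if ready:                       # EDF within each dynamic queue
            return min(ready, key=lambda t: t["deadline"])
    # both dynamic queues empty: fall back to the RM-sorted fixed queue
    return next((t for t in fpq if t["ready"]), None)
```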
CSD3 scheduling
What is the downside of DPQ2? Schedulability suffers
CSD3 scheduling
What if DPQ2 only has T5? We're back to missing its deadline
CSD3 scheduling
What's the solution? Exhaustive offline search!
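The search can be sketched as a brute-force scan over split points, with tasks sorted by period. The schedulability check below is just the simple utilization tests from earlier, which ignore cross-queue interference; EMERALDS uses a more precise offline analysis:

```python
def rm_ok(ts):
    """Liu-Layland sufficient test for the fixed-priority portion."""
    m = len(ts)
    return m == 0 or sum(c / p for c, p in ts) <= m * (2 ** (1 / m) - 1)

def edf_ok(ts):
    """EDF portion is feasible (in isolation) when utilization <= 1."""
    return sum(c / p for c, p in ts) <= 1

def best_split(tasks):
    """tasks: (exec, period) pairs sorted by period.
    Returns how many of the shortest-period tasks go to the dynamic queue."""
    for r in range(len(tasks) + 1):     # try every split point
        if edf_ok(tasks[:r]) and rm_ok(tasks[r:]):
            return r
    return None                         # no split passes the check
```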
Synchronization
General approach: integrate synchronization with scheduling; same approach as most thread libraries
Main primitive: semaphores. Initialize with value one to use as a lock; initialize with value zero to use for ordering (e.g., CV)
Synchronization
Semaphores with priority inheritance for locking:

if (sem locked) {
    do priority inheritance;
    add calling thread to wait queue;
    block;
}
lock sem;
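The pseudocode above can be fleshed out as a uniprocessor sketch. Class and field names are mine; a real kernel would block and context-switch via the scheduler rather than set a flag:

```python
class PISemaphore:
    """Binary semaphore with priority inheritance (illustrative only)."""
    def __init__(self):
        self.holder = None
        self.waiters = []

    def lock(self, thread):
        if self.holder is not None:                    # sem locked
            if thread["prio"] > self.holder["prio"]:   # do priority inheritance
                self.holder["prio"] = thread["prio"]
            self.waiters.append(thread)                # add caller to wait queue
            thread["blocked"] = True                   # block
        else:
            self.holder = thread                       # lock sem

    def unlock(self):
        self.holder["prio"] = self.holder["base_prio"] # drop inherited priority
        self.holder = None
        if self.waiters:                               # wake highest-prio waiter
            nxt = max(self.waiters, key=lambda t: t["prio"])
            self.waiters.remove(nxt)
            nxt["blocked"] = False
            self.lock(nxt)
```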
Synchronization
T1 (lock holder) unlocks; T2 locks
How do we get rid of this context switch? Schedule T1 before T2
Synchronization
What extra info must be passed to the blocking call? The ID of the semaphore to be acquired
Synchronization
What does the scheduler do before it tries to run T2? Checks if the semaphore is held; if so, transfers T2's priority to T1 and blocks T2 on the semaphore wait list
Synchronization
When is T2 unblocked? When T1 releases the lock; this transfers T2's priority back to T2
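The interaction above can be sketched as a dispatch-time check. These are hypothetical structures; `wants_sem` stands in for the semaphore ID passed to the blocking call:

```python
def dispatch(task, sem_table):
    """Before running `task`, check the semaphore it is about to acquire."""
    sem_id = task.get("wants_sem")
    if sem_id is not None:
        sem = sem_table[sem_id]
        holder = sem["holder"]
        if holder is not None:
            # held: transfer priority and block without ever running `task`,
            # avoiding the run-then-block context switch
            holder["prio"] = max(holder["prio"], task["prio"])
            sem["waiters"].append(task)
            return holder            # run the lock holder instead
    return task                      # semaphore free (or none pending): run task
```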
Communication
Typical access pattern: one task reading a sensor, many tasks processing sensor values (single writer, multiple readers)
Could use a producer-consumer queue: writer acquires a lock on the shared queue and adds the new value; readers take turns acquiring the lock and reading the value
Too slow for an embedded system
Communication
State message: a circular buffer of slots R0-R5 with a shared index (initially 0), one writer, and multiple readers
Note this is for a uniprocessor: the system is concurrent, but not parallel
Communication
What is the difference between concurrency and parallelism? Parallelism is when threads run at the same time; concurrency is when threads' execution overlaps in time
Communication
What operations are atomic? Loads and stores that access at most B bytes
Communication
Do I need this circular buffer if my data is ≤ B bytes? No, can just update with a single load or store; the value won't be partially written. Need buffers when data is > B bytes
Communication
To update SM: (1) read new sensor value, (2) read index into i, (3) write value to (i+1)%6, (4) update index to (i+1)%6
To read SM: (1) read index into i, (2) read value at buffer[i]
What happens if a reader runs between update steps 1 and 2? It sees the old value
Communication
What happens if a reader runs between update steps 3 and 4? It sees the old value: the new value has been written, but the index still points at the old slot
Communication
What happens if the writer runs between read steps 1 and 2? The reader sees the old value
Communication
What happens if the writer runs 6 times between the start and end of read step 2? The reader will see a garbled value: the slot it is reading gets overwritten mid-read
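The update and read steps above, as a minimal single-process sketch (no real concurrency here; the interleavings discussed on these slides would come from pre-emption between the numbered steps):

```python
N = 6                    # slots R0..R5
buffer = [None] * N
index = 0                # points at the most recently completed write

def write_sm(value):     # step 1: new sensor value arrives as `value`
    global index
    i = index                    # step 2: read index into i
    buffer[(i + 1) % N] = value  # step 3: write value to (i+1)%6
    index = (i + 1) % N          # step 4: update index to (i+1)%6

def read_sm():
    i = index            # step 1: read index into i
    return buffer[i]     # step 2: read value at buffer[i]
```

Because the writer fills a slot before publishing it via the index, readers never see a partially written slot as long as fewer than 6 writes happen during one read.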
Communication
Could make the buffer longer, but making the buffer too long will waste memory
Idea: we know how long read and write tasks take, so set the buffer length based on the longest a read could take
maxReadTime = d - (c - cr), where d is the reader's period (deadline), c is the reader's compute time, and cr is the time to read a value
Communication
xmax is the max number of writes that could garble values, where Pw is the writer's period and dw is the writer's deadline:
maxReadTime = d - (c - cr)
xmax > FLOOR((maxReadTime - (Pw - dw)) / Pw)
Communication
Timeline: writes W occur once per period Pw, each completing within its deadline dw; maxReadTime can span several writer periods
maxReadTime = d - (c - cr)
xmax > FLOOR((maxReadTime - (Pw - dw)) / Pw)
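Plugging illustrative numbers into the formulas above (the values are made up, not from the paper); the smallest integer satisfying the strict inequality is floor(...) + 1:

```python
import math

def max_garbling_writes(d, c, cr, Pw, dw):
    """Smallest xmax with xmax > floor((maxReadTime - (Pw - dw)) / Pw)."""
    max_read_time = d - (c - cr)   # longest a single read can be stretched
    return math.floor((max_read_time - (Pw - dw)) / Pw) + 1

# e.g., reader: period d = 100, compute time c = 20, read time cr = 2;
#       writer: period Pw = 30, deadline dw = 10
print(max_garbling_writes(100, 20, 2, 30, 10))  # 3
```

The buffer must be sized so that xmax intervening writes cannot wrap around to the slot a reader is still reading.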
Course administration Research projects Study use of native code in Android apps Build a FUSE file system for Linux