1
Scheduling Operating Systems CS 550 Spring 2017 Kenneth Chiu
2
Scheduling: low-level mechanisms of running processes (limited direct execution, context switching) vs. high-level policies (scheduling policies).
Scheduling appears in many disciplines (e.g., assembly lines, for efficiency). How do we develop a framework for studying scheduling policy? Key assumptions, key metrics, basic approaches.
3
Workload assumptions. Workload: the processes/jobs running in the system.
Workload assumptions are critical to building policies: the more you know, the more fine-tuned your policy can be. The following assumptions are unrealistic initially, but will be relaxed later:
1. Each job runs for the same amount of time.
2. All jobs arrive at the same time.
3. All jobs only use the CPU (i.e., they perform no I/O).
4. The run-time of each job is known – the scheduler knows everything!
4
Scheduling metrics To compare different scheduling policies
Metrics are used to measure something. Turnaround time: the time at which the job completes minus the time at which the job arrived in the system (a performance metric). Fairness is another metric, and there is a tradeoff between fairness and performance. Can anyone give a simple example of this tradeoff? (Boosting the performance of one process by assigning it more CPU time is not fair to the other processes.)
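In OSTEP's notation, this metric is simply:

T_turnaround = T_completion - T_arrival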
5
First In First Out (FIFO)
A.k.a. first come, first served (FCFS). Imagine three jobs, A, B, and C, arriving in the system at roughly the same time (T_arrival = 0). Because FIFO has to put some job first, let's assume that while they all arrived simultaneously, A arrived just a hair before B, which arrived just a hair before C. Each job runs for 10 sec. Average turnaround time = (10 + 20 + 30)/3 = 20.
6
First In First Out (FIFO)
What kind of workload could you construct to make FIFO perform poorly? Relax assumption 1 and let jobs take different amounts of time to complete: say A runs for 100 sec while B and C run for 10 sec each. Average turnaround time = (100 + 110 + 120)/3 = 110. This is the convoy effect (think of a supermarket checkout line): a number of relatively short potential consumers of a resource get queued behind a heavyweight resource consumer. How can you solve this problem?
7
Shortest Job First (SJF)
With the same workload (A = 100 sec, B = C = 10 sec), SJF runs the short jobs first: average turnaround time = (10 + 20 + 120)/3 = 50. Assuming all jobs arrive at the same time, SJF is optimal.
8
Shortest Job First (SJF)
Let's relax assumption 2 and assume that jobs can arrive at any time instead of all at once. What problems does this lead to? Consider SJF with A arriving at 0 and B, C arriving at 10: A (100 sec) must run to completion before B and C ever get the CPU. How would you do better? Preemption. To solve the problem, we also need to relax assumption 3 (that jobs must run to completion).
9
Shortest Time-to-Completion First (STCF)
Shortest Time-to-Completion First (STCF), or Preemptive Shortest Job First (PSJF): any time a new job enters the system, the scheduler determines which of the remaining jobs (including the new one) has the least time left, and schedules that one. Average turnaround time = ((120-0) + (20-10) + (30-10))/3 = 50, the same as SJF achieved when all jobs arrived simultaneously.
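A minimal sketch of the STCF decision in C, assuming the scheduler keeps a singly linked list of ready jobs with their remaining run times (all names here are illustrative, not taken from any real kernel):

struct job {
    int remaining_ms;              /* CPU time left until completion */
    struct job *next;
};

/* Called whenever a new job arrives or the running job finishes:
 * scan the ready list and return the job with the least time left,
 * preempting the current job if a different one is chosen. */
struct job *pick_stcf(struct job *ready_list)
{
    struct job *best = ready_list;
    for (struct job *j = ready_list; j != NULL; j = j->next)
        if (j->remaining_ms < best->remaining_ms)
            best = j;
    return best;                   /* NULL if the list is empty */
}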
10
A new metric: response time
Interactive applications motivate a new metric. Response time = the time from when the job arrives in the system to the first time it is scheduled. If we knew the job lengths, jobs only used the CPU, and our only metric were turnaround time, STCF would be a great policy; in fact, for a number of early batch computing systems, these types of scheduling algorithms made some sense. The introduction of time-shared machines changed all that. For the STCF example above, average response time = (0 + 0 + 10)/3 ≈ 3.3 sec.
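Analogously to turnaround time:

T_response = T_firstrun - T_arrival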
11
A new metric: response time
STCF and related disciplines are not particularly good for response time. If three jobs arrive at the same time, for example, the third has to wait for the previous two to run in their entirety before being scheduled even once. While great for turnaround time, STCF is quite bad for response time and interactivity. How can we build a scheduler that is sensitive to response time?
12
Round Robin (RR): instead of running jobs to completion, RR runs a job for a time slice (scheduling quantum) and then switches to the next job in the ready queue, repeating until all jobs are finished. Time slicing: the length of the time slice must be a multiple of the timer-interrupt period. Example (three jobs of 5 sec each, arriving together): SJF average response time = (0 + 5 + 10)/3 = 5; RR with a 1-sec slice = (0 + 1 + 2)/3 = 1.
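A minimal sketch of the RR loop in C, assuming a FIFO ready queue; queue_empty, dequeue, enqueue, run_for, and job_done are hypothetical helpers used only for illustration:

/* Run each ready job for one time slice, then rotate to the next;
 * repeat until every job has finished. */
void rr_schedule(struct queue *ready, int slice_ms)
{
    while (!queue_empty(ready)) {
        struct job *j = dequeue(ready);
        run_for(j, slice_ms);          /* a timer interrupt ends the slice */
        if (!job_done(j))
            enqueue(ready, j);         /* back of the queue for another turn */
    }
}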
13
Round Robin (RR): how does the length of the time slice affect response time? Amortization.
The shorter the time slice, the better RR performs under the response-time metric. So can we make the time slice as small as possible? No: where do the overheads come from? Context switching (saving/restoring registers), flushed caches/TLBs, and so on. Design trade-off: make the time slice long enough to amortize the cost of switching, but not so long that the system is no longer responsive. Amortization is used in systems where some operation has a fixed cost: by incurring that cost less often, the total cost to the system is reduced.
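A quick worked example of this trade-off (the numbers follow OSTEP's): with a 10-ms time slice and a 1-ms context-switch cost, roughly 1/10 = 10% of CPU time is lost to switching; raising the slice to 100 ms drops the overhead to about 1%, amortizing the switch cost, at the price of coarser responsiveness.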
14
Round Robin (RR) is one of the worst policies if turnaround time is the metric. More generally, any policy (such as RR) that is fair, i.e., that evenly divides the CPU among active processes on a small time scale, will perform poorly on metrics such as turnaround time. This is a common tradeoff: performance (turnaround time) vs. fairness (response time).
15
Incorporating I/O Process is blocked waiting for I/O completion
When the I/O completes, an interrupt is raised, and the OS moves the process that issued the I/O from the blocked state back to the ready state. Now we relax assumption 4: of course all programs perform I/O. To understand this issue better, assume two jobs, A and B, each needing 50 ms of CPU time, where A runs for 10 ms and then issues a 10-ms I/O request (repeating until done) while B simply uses the CPU for 50 ms with no I/O. Treat each 10-ms sub-job of A as an independent job: under STCF it is then natural to run A's short sub-job first and overlap B's execution with A's I/O, preempting B when the I/O completes. Keyword: this works if the scheduler can *predict* …
16
What we discussed so far
We covered the basic ideas behind scheduling and developed two families of approaches: the first runs the shortest remaining job and thus optimizes turnaround time; the second alternates between all jobs and thus optimizes response time. We also incorporated I/O. One fundamental problem remains: the inability of the OS to see into the future.
17
Multi-level feedback queue (MLFQ)
18
Problems to be addressed:
Optimize turnaround time (by running shorter jobs first). Minimize response time for interactive users. Do we know anything (e.g., running times) about the jobs? How do we schedule without perfect knowledge? Can the scheduler learn? How? By learning from the past to predict the future.
19
Basic Setup List of queues, each assigned a different priority level.
Jobs that are ready sit on one of the queues. The first two basic rules of MLFQ: Rule 1: higher-priority queues have strict priority; if Priority(A) > Priority(B), A runs (B doesn't). Rule 2: for jobs in the same queue, RR is used; if Priority(A) = Priority(B), A & B run in RR.
20
Feedback MLFQ varies the priority of a job based on its observed behavior: if a job relinquishes the CPU while waiting for input from the keyboard, keep its priority high; if a job uses the CPU intensively for a long time, reduce its priority. In other words, use the history of a job to predict its future behavior. Target workload: a mix of interactive jobs that are short-running (and may frequently relinquish the CPU) and longer-running "CPU-bound" jobs that need a lot of CPU time but for which response time isn't important. The key to MLFQ scheduling lies in how the scheduler sets priorities.
21
Changing Priority: rules for changing job priority
Rule 3: When a job enters the system, it is placed at the highest priority (the topmost queue). Rule 4a: If a job uses up an entire time slice while running, its priority is reduced (i.e., it moves down one queue). Rule 4b: If a job gives up the CPU before the time slice is up, it stays at the same priority level.
22
One long-running job
23
One long-running job and one short job:
The scheduler first assumes it is a short job. Approximate SJF? If it actually is a short job, it will run quickly and complete; if it is not, it will slowly move down the queues, and thus soon prove itself to be a long-running job. In this manner, MLFQ approximates SJF.
24
How about I/O? Rule 4b: if a job gives up the CPU before the time slice is up, it stays at the same priority level. This is what keeps interactive, I/O-intensive jobs at high priority.
25
So far… the MLFQ rules thus far:
Rule 1: If Priority(A) > Priority(B), A runs (B doesn't). Rule 2: If Priority(A) = Priority(B), A & B run in RR. Rule 3: When a job enters the system, it is placed at the highest priority (the topmost queue). Rule 4: If a job uses up an entire time slice while running, its priority is reduced (i.e., it moves down one queue); if a job gives up the CPU before the time slice is up, it stays at the same priority level. Any potential problems?
26
Problems with the current version of MLFQ: Starvation.
With too many interactive jobs, long-running jobs will never receive any CPU time. Gaming the scheduler: doing something sneaky to trick the scheduler into giving you more than your fair share of the resource (e.g., yielding just before the time slice ends so as to stay at high priority). Also, what if a CPU-bound job turns I/O-bound? There is no mechanism to move a job up to higher-priority queues!
27
Boosting priority. Rule 5: After some time period S, move all the jobs in the system to the topmost queue. What problems does this rule solve? Starvation, and the case where a CPU-bound job turns I/O-bound.
28
CPU time accounting: the scheduler keeps track of how much of its time allotment a job has used at a given level; once a job has used up its allotment, it is demoted to the next priority queue. Old Rule 4: if a job uses up an entire time slice while running, its priority is reduced (it moves down one queue); if it gives up the CPU before the time slice is up, it stays at the same priority level. New Rule 4: once a job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is reduced (i.e., it moves down one queue).
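A minimal sketch of the accounting behind the new Rule 4, assuming each job records its queue level and the CPU time consumed at that level (field and constant names are illustrative):

/* Charge a job for the CPU it just consumed; once its allotment at
 * this level is spent, no matter across how many time slices, demote
 * it one queue (higher level number = higher priority). */
void charge_cpu(struct job *j, int used_ms)
{
    j->used_at_level += used_ms;
    if (j->used_at_level >= ALLOTMENT_MS) {
        if (j->level > 0)
            j->level--;                /* move down one queue */
        j->used_at_level = 0;          /* fresh allotment at the new level */
    }
}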
29
CPU time accounting
30
Tuning MLFQ: how do we parameterize MLFQ?
How many queues? How big should the time slice be in each queue? How often should priority be boosted? There are no easy answers; you need experience with workloads. Time-slice length typically varies across queues (high-priority queues are usually given short time slices; low-priority queues, long ones). Some schedulers reserve the highest priority levels for operating-system work. Some systems also allow user advice to help set priorities (e.g., with the command-line utility nice you can increase or decrease the priority of a job somewhat, and thus increase or decrease its chances of running at any given time).
31
Summary Rule 1: If Priority(A) > Priority(B), A runs (B doesn’t)
Rule 2: If Priority(A) = Priority(B), A & B run in RR. Rule 3: When a job enters the system, it is placed at the highest priority (the topmost queue). Rule 4: Once a job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is reduced (i.e., it moves down one queue). Rule 5: After some time period S, move all the jobs in the system to the topmost queue. Instead of demanding a priori knowledge of a job, MLFQ observes the execution of a job and prioritizes it accordingly. It manages to achieve the best of both worlds: it delivers excellent overall performance (similar to SJF/STCF) for interactive jobs, and it is fair and makes progress for long-running CPU-intensive workloads.
32
Proportional share
33
Proportional-share scheduler
MLFQ pursued two goals: optimizing turnaround time and minimizing response time. A different type of scheduler, the proportional-share (a.k.a. fair-share) scheduler, instead tries to guarantee that each job obtains a certain percentage of CPU time. Example: lottery scheduling. Basic idea: the scheduler holds a lottery to determine which process gets to run next; processes that should run more often are given more chances to win.
34
Lottery scheduling: tickets
The fundamental concept is the ticket, used to represent the share of a resource that a process should receive: the percentage of tickets a process holds represents its share of the resource in question. Example: 2 processes and 100 tickets in the system; process A has 75 tickets and process B has 25, so A should receive 75% of the CPU and B should receive 25%. Assume A holds tickets 0-74 and B holds tickets 75-99.
35
Principle behind lottery scheduling: randomness
What are the advantages of using randomness? It avoids the strange corner-case behaviors a more traditional algorithm may exhibit (e.g., LRU replacement performs poorly for some cyclic-sequential workloads). It is lightweight: a traditional fair-share algorithm needs per-process accounting to track how much CPU each process has received. And it is fast, as fast as generating a random number is (though the faster the generator, the more weakly pseudo-random it tends to be).
36
Ticket currency. Lottery scheduling provides a number of mechanisms to manipulate tickets. Ticket currency allows a user with a set of tickets to allocate tickets among her own jobs in her own "currency".
37
Ticket currency: why is the ticket currency mechanism desirable?
Consider a multi-user system in which a user manages multiple processes. We want to let her favor some of her jobs over others without impacting the jobs of other users. Currency lets her create new tickets, but doing so debases the individual value of all the tickets she owns: her tickets are expressed in a new currency that has a variable exchange rate with the base (or global) currency.
38
Ticket currency Example 1:
User A currently manages three processes and holds 10 tickets total: 5 tickets for the first process, 3 for the second, and 2 for the third. User A then creates 5 extra tickets and assigns them to a new process D, so A now holds 15 tickets. These 15 tickets represent 15 units of a new currency whose exchange rate with the base currency is 10/15; the total value of A's tickets expressed in the base currency is still 10.
39
Ticket currency Example 2:
Users A and B have each been given 100 (base-currency) tickets. User A runs two jobs, A1 and A2, and gives each of them 500 tickets (out of 1000 total) in User A's own currency. User B runs only one job and gives it 10 tickets (out of 10 total).
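Converting to the base currency makes the point concrete: A1's and A2's 500 tickets are each worth 500/1000 × 100 = 50 base tickets, while B's job's 10 tickets are worth 10/10 × 100 = 100 base tickets; the lottery is then held over these 200 base tickets.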
40
Ticket transfer: with transfers, a process can temporarily hand off its tickets to another process. In what scenario is this useful? In a client/server setting, a client process sends a message to a server asking it to do some work on the client's behalf. To speed up the work, the client can pass its tickets to the server, maximizing the server's performance while it handles the client's request. When finished, the server transfers the tickets back to the client.
41
Ticket inflation lets processes create new tickets, like printing their own money (its counterpart is ticket deflation).
It is normally disallowed, except among mutually trusting processes, where it lets them adjust their priorities dynamically without explicit communication.
42
Implementation: the most significant advantage of lottery scheduling is the simplicity of its implementation. All it needs is a good random number generator to pick the winning ticket and a list to track the processes of the system. Example: three processes A, B, and C holding 100, 50, and 250 tickets respectively; suppose the winning lottery number is 300. Walking the list while accumulating tickets, the counter reaches 100 after A and 150 after B, then 400 at C, exceeding 300, so C is scheduled.
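A minimal sketch of that decision loop in C, following the counter-based walk described above (the list layout is an assumption for illustration):

#include <stdlib.h>

struct node {
    const char *name;
    int tickets;
    struct node *next;
};

/* Walk the job list, accumulating tickets, until the running counter
 * exceeds the winning ticket number; that node holds the winner. */
struct node *pick_winner(struct node *head, int total_tickets)
{
    int winner = rand() % total_tickets;   /* e.g., 300 out of 400 */
    int counter = 0;
    for (struct node *cur = head; cur != NULL; cur = cur->next) {
        counter += cur->tickets;
        if (counter > winner)
            return cur;                    /* this job wins the lottery */
    }
    return NULL;                           /* unreachable if totals agree */
}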
43
Lottery scheduling dynamics
A brief study of the completion times of two jobs, A and B, competing against one another, each with the same number of tickets (100) and the same run time R. Unfairness metric: U = A's completion time / B's completion time. If R = 10 ms, A completes at 10 ms, and B at 20 ms, then U = 10 ms / 20 ms = 0.5. A perfectly fair scheduler would achieve U = 1.
44
Lottery scheduling dynamics
Varying R from 1 to 1000 over thirty trials, we see that the longer the jobs run, the more fair the outcome is. Why? (With more scheduling decisions, each job's observed share converges to its ticket proportion; over short runs, a few unlucky draws can dominate.)
45
Stride scheduling. Lottery scheduling relies on randomness to achieve fairness; it is not deterministic! It occasionally will not deliver the exact right proportions, especially over short time scales. Stride scheduling is a deterministic fair-share scheduler.
46
Stride scheduling: each job in the system has a stride, which is inversely proportional to the number of tickets it has. Three jobs A, B, and C with tickets 100, 50, and 250 have strides 100, 200, and 40 (computed as 10,000 / tickets). Every time a process runs, the scheduler increments a counter for it (called its pass value) by its stride to track its global progress. The pass values determine which process to schedule next: the one with the lowest pass.
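In C-like pseudocode, the stride decision (mirroring the pseudocode in OSTEP; the priority-queue helpers remove_min and insert are assumed):

curr = remove_min(queue);          /* the job with the lowest pass value */
schedule(curr);                    /* run curr for a time slice */
curr->pass += curr->stride;        /* stride = 10,000 / tickets */
insert(queue, curr);               /* back into the queue, ordered by pass */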
47
Stride scheduling with three jobs A, B, and C holding tickets 100, 50, and 250 (strides 100, 200, and 40):
Within a fixed period, A runs twice, B runs once, and C runs five times, exactly the 2:1:5 ratio of their tickets.
48
Given the precision of stride scheduling, why use lottery scheduling at all? Lottery scheduling maintains no global state, which makes it easy for the scheduler to cope with processes coming and going. Think about what you would do under stride scheduling when a new process arrives: what pass value should it start with? Set it to 0 and it will monopolize the CPU.
49
Multiprocessor scheduling
50
Multiprocessor scheduling
So far we have focused on single-processor scheduling. How can we extend those ideas to work on multiple CPUs? What new problems must we overcome?
51
Background: multiprocessor architecture
The fundamental difference between single-CPU hardware and multi-CPU hardware is … the use of hardware caches, and how data is shared across multiple processors.
52
An example: a program running on CPU 1 reads a data item (with value D) at address A; the item is not in CPU 1's cache, so it is read from memory. The program then modifies the value at address A, but only updates its cache with the new value D' (depending on the cache write-back policy, the new value may not be written back to memory immediately). The OS then decides to stop running the program and move it to CPU 2. The program re-reads the value at address A; there is no such data in CPU 2's cache, so the system fetches the value from main memory and gets the old value D instead of the correct value D'.
53
Cache coherence: the previous problem is known as the cache coherence problem, and there is a vast research literature on it. The basic solution is provided by hardware: by monitoring memory accesses (e.g., by snooping on the bus), hardware can ensure that the "right thing" happens and that the view of a single shared memory is preserved.
54
Cache affinity: it is often advantageous to run a process on the same CPU it ran on last time. Why? The process has built up state in that CPU's caches and TLB, so it runs faster there. A multiprocessor scheduler should therefore consider cache affinity when making its scheduling decisions, preferring to keep a process on the same CPU if at all possible.
55
Single-Queue Multiprocessor Scheduling (SQMS)
The most basic approach is to reuse the framework for single-processor scheduling by putting all jobs that need to be scheduled into a single queue. Advantage: simplicity; it is easy to adapt existing single-processor policies to work on more than one CPU. Disadvantages: locking must be added to the scheduler code, which degrades performance as the number of CPUs grows, and jobs tend to bounce across CPUs, hurting cache affinity.
56
Single-Queue Multiprocessor Scheduling (SQMS)
57
Multi-Queue Multiprocessor Scheduling (MQMS)
To overcome the problems of SQMS, we can opt for multiple queues, one per CPU. In MQMS, each queue follows a particular scheduling policy, such as round robin. When a job enters the system, it is placed on exactly one scheduling queue according to some heuristic (e.g., at random, or picking the queue with the fewest jobs). It is then scheduled essentially independently, avoiding the information-sharing and synchronization problems of the single-queue approach.
58
Multi-Queue Multiprocessor Scheduling (MQMS)
Each queue runs RR. MQMS is more scalable than SQMS, but …
59
Multi-Queue Multiprocessor Scheduling (MQMS)
A fundamental problem with MQMS: load imbalance.
60
Multi-Queue Multiprocessor Scheduling (MQMS)
How do we solve the load imbalance problem? By migrating processes between queues. The basic approach is work stealing: a queue that is low on jobs occasionally peeks at another (busier) queue and "steals" one or more jobs from it, as sketched below.
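A minimal sketch of work stealing in C (all names are illustrative; real implementations also need per-queue locking, omitted here):

#include <stdlib.h>

/* Periodically run on an underloaded CPU: peek at a random peer queue
 * and migrate one job here if the peer is noticeably busier. */
void maybe_steal(struct queue *mine, struct queue **peers, int npeers)
{
    struct queue *victim = peers[rand() % npeers];
    if (queue_len(victim) > queue_len(mine) + 1) {
        struct job *j = dequeue_tail(victim);  /* take one job from the peer */
        if (j != NULL)
            enqueue(mine, j);                  /* it now runs on this CPU */
    }
}

Peeking too often reintroduces the synchronization costs of a single queue, so the stealing interval is itself a tuning knob.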
61
xv6 scheduling One global queue across all CPUs
Local scheduling algorithm: RR. See scheduler() in proc.c.
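For reference, the heart of that loop, lightly abridged from the x86 version of xv6's proc.c (details vary across xv6 revisions):

void scheduler(void)
{
  struct proc *p;

  for(;;){
    sti();                          // enable interrupts on this CPU
    acquire(&ptable.lock);          // one global process table
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->state != RUNNABLE)
        continue;
      // Switch to the chosen process; the process is responsible for
      // releasing ptable.lock and reacquiring it before jumping back.
      proc = p;
      switchuvm(p);
      p->state = RUNNING;
      swtch(&cpu->scheduler, p->context);
      switchkvm();
      proc = 0;                     // no process running on this CPU now
    }
    release(&ptable.lock);
  }
}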
62
Linux scheduling overview
O(1) scheduler: multiple priority-based queues (similar to MLFQ). Completely Fair Scheduler (CFS): a deterministic proportional-share approach (like stride scheduling). BF Scheduler (BFS): a single queue, proportional-share, based on a more complicated scheme known as Earliest Eligible Virtual Deadline First (EEVDF).
63
Linux scheduler implementations
Linux 2.4: a global queue, O(N). Simple, but performs poorly on multiprocessors and when N is large. Linux 2.5 and early versions of Linux 2.6: the O(1) scheduler with per-CPU run queues. It solves the performance problems of the old scheduler, but its logic for boosting interactivity is complex and error-prone, and it offers no guarantee of fairness. Later Linux 2.6 kernels (2.6.23+): the Completely Fair Scheduler (CFS), which is fair and naturally boosts interactivity.