Linux Scheduling CS 4510
Scheduling Policy ► The scheduling algorithm of traditional Unix systems must fulfill several conflicting objectives Fast process response time Good throughput for background jobs Avoidance of process starvation Etc. ► The set of rules used to determine when and how to select a new process to run is called the scheduling policy
Scheduling Policy ► Linux scheduling is based on time sharing ► CPU time is divided into slices, one for each runnable process ► If a currently running process is not terminated when its quantum expires, a process switch may take place
Scheduling Policy ► The scheduling policy ranks processes according to their priority ► In Linux, process priority is dynamic. Processes that have been denied the use of the CPU for a long time are boosted by dynamically increasing their priority Long running processes have their priority lowered.
Classes of Processes ► Interactive Processes Must respond quickly ► Typically between 50 and 150 ms ► Variance must also be bounded Includes: ► Command Shells ► Text editors ► Graphical applications
Classes of Processes ► Batch processes Do not need user interaction Often penalized by the Scheduler Includes: ► Compilers ► Database Search Engines ► Scientific Computations
Classes of Processes ► Real-time processes Have very stringent scheduling requirements Should never be blocked by lower-priority processes Needs ► Short guaranteed response time ► Minimum variance Includes ► Video and sound applications ► Robot controllers ► Data Collectors from physical sensors
Process Preemption ► If a process enters the TASK_RUNNING state, the kernel checks whether its dynamic priority is greater than the priority of the currently running process. ► If so, the current process is interrupted and the scheduler is invoked to select another process to run. ► A preempted process is not suspended: it remains in the TASK_RUNNING state and simply no longer uses the CPU
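The preemption check above can be sketched in user-space C. This is a minimal illustration, not kernel code: `struct task`, its `dyn_prio` field, and `wake_up_and_check()` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's process descriptor. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task {
    enum task_state state;
    int dyn_prio;      /* dynamic priority; larger means more urgent */
    bool need_resched; /* request to run the scheduler soon */
};

/* Make 'woken' runnable and flag 'current' for preemption if it is outranked.
 * Returns true when a reschedule was requested. */
bool wake_up_and_check(struct task *current, struct task *woken)
{
    assert(current != NULL && woken != NULL);
    woken->state = TASK_RUNNING;
    if (woken->dyn_prio > current->dyn_prio) {
        /* current is preempted but stays in TASK_RUNNING */
        current->need_resched = true;
        return true;
    }
    return false;
}
```

Note that the preempted task's state is never touched: it only loses the CPU.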
How Long Should a Quantum Last? ► If it is too short, system overhead is high ► If it is too long, processes no longer appear to be responsive ► Long quantum durations do not usually degrade response time Higher priority processes, such as interactive processes, will quickly interrupt lower priority processes such as batch processes ► The choice of quantum duration is always a compromise Choose a duration as long as possible while keeping good system response time.
The Scheduling Algorithm ► CPU time is divided into epochs. Every process has a time quantum whose duration is computed when the epoch begins. The quantum is the maximum CPU time assigned to the process in that epoch. A process can be selected several times by the scheduler in the same epoch, as long as its quantum is not exhausted
The Scheduling Algorithm ► Base Time Quantum Assigned by the scheduler if a process has exhausted its time quantum in the previous epoch Users can change the base time quantum by using the system calls nice( ) and setpriority( ) A new process inherits the base time quantum of its parent
The Scheduling Algorithm ► Process Priorities Static priority ► Assigned by the users to real-time processes ► Ranges from 1 to 99 ► Never changed by the scheduler Dynamic priority ► Applies only to conventional processes ► Sum of the base time quantum and the ticks of CPU time left to the process before its quantum expires
CPU’s Data Structures ► nice Determines the length of the process time quantum when a new epoch begins. Ranges between -20 and +19 ► Negative values correspond to high priority processes ► counter Number of ticks of CPU time left to the process before its quantum expires ► need_resched A flag checked by ret_from_sys_call( ) to decide whether to invoke the schedule( ) function ► cpus_allowed A bitmask specifying the CPUs on which the process is allowed to run ► cpus_runnable A bitmask specifying the CPU that is executing the process, if any
The schedule( ) Function ► Direct Invocation The scheduler is invoked directly when the current process must be blocked right away because a resource it needs is not available ► Steps 1. Inserts current in the proper wait queue 2. Changes the state of current to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE 3. Invokes schedule( ) 4. Goes back to step 2 if the resource is still not available 5. Once the resource is available, removes current from the wait queue
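The five steps of direct invocation can be simulated in user space. In this sketch everything is a hypothetical stand-in: `schedule_stub()` replaces the real schedule( ) and simply pretends that the process was woken up, and `struct resource` becomes available after a fixed number of wake-ups.

```c
#include <assert.h>
#include <stddef.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task { enum task_state state; int on_wait_queue; };

/* Hypothetical resource that becomes available after N wake-ups. */
struct resource { int available; int wakeups_until_ready; };

/* Stand-in for schedule(): other processes run, then we are woken up. */
void schedule_stub(struct task *cur, struct resource *r)
{
    if (r->wakeups_until_ready > 0 && --r->wakeups_until_ready == 0)
        r->available = 1;
    cur->state = TASK_RUNNING; /* a wake-up restores TASK_RUNNING */
}

/* Blocks 'cur' until 'r' is available; returns how many times it slept. */
int wait_for_resource(struct task *cur, struct resource *r)
{
    int sleeps = 0;
    assert(cur != NULL && r != NULL);
    assert(r->available || r->wakeups_until_ready > 0); /* else never ends */
    cur->on_wait_queue = 1;               /* step 1 */
    do {
        cur->state = TASK_INTERRUPTIBLE;  /* step 2 */
        schedule_stub(cur, r);            /* step 3 */
        sleeps++;
    } while (!r->available);              /* step 4 */
    cur->on_wait_queue = 0;               /* step 5 */
    return sleeps;
}
```

The loop matters because a wake-up does not guarantee that the resource is free; the process must check again and possibly sleep again.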
The schedule( ) Function ► Lazy Invocation The scheduler can also be invoked in a lazy way by setting the need_resched field of current to 1 A check on this value is always made before resuming execution of a User Mode process, so schedule( ) will be invoked at some time in the near future ► Lazy invocation is performed in the following cases When current has used up its time quantum When a process is woken up and its priority is higher than that of the current process When a sched_yield( ) system call is issued
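A small simulation of lazy invocation for the "quantum used up" case. The names `timer_tick()`, `return_to_user_mode()`, and `schedule_stub()` are hypothetical; only the `counter` and `need_resched` fields come from the slides.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    int counter;      /* ticks left in the current quantum */
    int need_resched; /* set to 1 to request a lazy schedule() */
};

static int schedule_calls; /* counts calls to the stand-in scheduler */

/* Stand-in for schedule(): clears the flag, as the real function does. */
void schedule_stub(struct task *cur)
{
    cur->need_resched = 0;
    schedule_calls++;
}

/* Timer tick: charge the running task and request a reschedule
 * once its quantum is used up. */
void timer_tick(struct task *cur)
{
    assert(cur != NULL);
    if (cur->counter > 0 && --cur->counter == 0)
        cur->need_resched = 1;
}

/* Check made before resuming a User Mode process. */
void return_to_user_mode(struct task *cur)
{
    assert(cur != NULL);
    if (cur->need_resched)
        schedule_stub(cur);
}
```

The tick handler never calls the scheduler itself; it only leaves a note that is acted on at the next return to User Mode.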
Actions Performed Before a Process Switch ► The key outcome of the function is to set a local variable ‘next’ so that it points to the PCB of the process selected to replace current
Actions Performed Before a Process Switch ► The schedule( ) function starts with the following code prev = current; this_cpu = prev->processor; sched_data = &aligned_data[this_cpu];
Actions Performed Before a Process Switch ► Before starting to look at the runnable processes, schedule( ) must disable the local interrupts and acquire the spin lock that protects the runqueue. ► spin_lock_irq(&runqueue_lock);
Actions Performed Before a Process Switch ► If prev is not in the TASK_RUNNING state, schedule( ) was directly invoked by the process itself because it had to wait on some external resource; therefore, prev must be removed from the runqueue ► if (prev->state != TASK_RUNNING) del_from_runqueue(prev); ► The function also resets the need_resched field of current, just in case the scheduler was activated in the lazy way: ► prev->need_resched = 0;
Actions Performed Before a Process Switch ► Now schedule( ) scans the runqueue to find the process to be executed in the next quantum.
    repeat_schedule:
        next = init_tasks[this_cpu];
        c = -1000;
        list_for_each(tmp, &runqueue_head) {
            p = list_entry(tmp, struct task_struct, run_list);
            int weight = goodness(p, this_cpu, prev->active_mm);
            if (weight > c)
                c = weight, next = p;
        }
► The goodness( ) function returns an integer that denotes the priority of the process passed as a parameter
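The scan is a plain "keep the maximum" loop. The user-space sketch below replaces the kernel's linked list with an array and uses a simplified, hypothetical `weight_of()` in place of goodness( ); `pick_next()` and its parameters are also illustrative names.

```c
#include <assert.h>
#include <stddef.h>

struct task { const char *name; int counter; int nice; };

/* Simplified weight: 0 when the quantum is exhausted,
 * otherwise remaining ticks plus a nice-based term. */
int weight_of(const struct task *p)
{
    return p->counter ? p->counter + 20 - p->nice : 0;
}

/* Scans 'n' runnable tasks and returns the one with the largest weight.
 * Falls back to 'idle' when the queue is empty. The best weight found
 * is stored in *best (it stays at -1000 for an empty queue). */
const struct task *pick_next(const struct task *rq, int n,
                             const struct task *idle, int *best)
{
    const struct task *next = idle;
    int c = -1000;
    assert(best != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        int weight = weight_of(&rq[i]);
        if (weight > c) {   /* strict '>' means the first of equals wins */
            c = weight;
            next = &rq[i];
        }
    }
    *best = c;
    return next;
}
```

Starting `c` at -1000 guarantees that any runnable task, even one with weight -1, beats the idle fallback.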
Actions Performed Before a Process Switch ► While scanning processes in the runqueue, schedule( ) considers only those that are Runnable on the executing CPU ► (cpus_allowed & 1<<this_cpu) Not already running on some other CPU ► (cpus_runnable & 1<<this_cpu)
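Both tests can be combined into one bitmask expression. This sketch assumes a convention the slides do not spell out: `cpus_runnable` has all bits set while the process is not running anywhere, and only the bit of the executing CPU set otherwise. The function name `can_schedule()` and the 32-CPU limit are choices made for the example.

```c
#include <assert.h>

/* Returns nonzero if a process with the given masks may be picked
 * by the scheduler running on CPU 'this_cpu'.
 * Assumed convention: cpus_runnable == ~0UL when the process is not
 * executing on any CPU, else exactly (1UL << executing_cpu). */
int can_schedule(unsigned long cpus_allowed, unsigned long cpus_runnable,
                 int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < 32);
    unsigned long bit = 1UL << this_cpu;
    return (cpus_allowed & cpus_runnable & bit) != 0;
}
```

For instance, a process currently executing on CPU 0 is rejected by the scheduler on CPU 1 because bit 1 of its `cpus_runnable` is clear.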
Actions Performed Before a Process Switch ► If the run-queue is empty, next points to the swapper kernel thread associated with the executing CPU. ► It is also possible that the best candidate turns out to be the old current process -- prev
Actions Performed Before a Process Switch ► If c is 0 after the scan, all runnable processes have exhausted their time quantum. When this happens, a new epoch begins and every process is assigned a fresh quantum:
    if (!c) {
        for_each_task(p)
            p->counter = (p->counter >> 1) + (20 - p->nice) / 4 + 1;
        goto repeat_schedule;
    }
► Because for_each_task( ) visits every process, suspended or stopped processes keep half of their unused ticks, so their dynamic priority increases from epoch to epoch. This favors interactive processes, which spend most of their time waiting
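A worked example of the per-epoch recalculation, assuming the rule counter = counter/2 + (20 - nice)/4 + 1, which is the form used by 2.4 kernels with a 100 Hz timer. The function name `new_counter()` is hypothetical.

```c
#include <assert.h>

/* Quantum recalculation at the start of a new epoch (assumed formula).
 * 'counter' is the number of unused ticks carried over from the old epoch. */
int new_counter(int counter, int nice)
{
    assert(counter >= 0);
    assert(nice >= -20 && nice <= 19);
    return (counter >> 1) + (20 - nice) / 4 + 1;
}
```

With nice 0, a process that used its whole quantum gets 0/2 + 5 + 1 = 6 ticks. A process that slept through the epoch with 6 ticks left gets 6/2 + 6 = 9, then 10, then 11, and stays at 11. The bonus for sleeping is therefore bounded at just under twice the base quantum.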
Actions Performed Before a Process Switch ► The last thing done before the process switch is to make sure that the address space of the next process is set up properly
Actions Performed After a Process Switch ► schedule( ) invokes __schedule_tail( ) ► This function checks whether some other process has set the need_resched field of prev while it was not running. If so, the whole schedule( ) function is re-executed from the beginning.
Actions Performed After a Process Switch ► Most of the actions performed by schedule( ) after the context switch matter primarily on multiprocessor systems.
Goodness Values ► weight == -1 p is the previous process, and its SCHED_YIELD flag is set. The process will be selected only if no other runnable processes are in the runqueue ► weight == 0 p is a conventional process that has exhausted its quantum ► 2 <= weight <= 77 p is a conventional process that has not exhausted its quantum ► weight >= 1000 p is a real-time process
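Goodness Values ► For a conventional process that still has ticks left, the goodness value is computed as follows ► weight = p->counter + 20 - p->nice; ► A small bonus is also given to a process that last ran on the same CPU, and to one that shares the address space of the previous process or is a kernel thread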
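The cases from the two Goodness Values slides can be put together in one user-space sketch. The struct layout, the `policy` enum, and the `yielded` flag are hypothetical simplifications. The bonus sizes (15 for the same CPU, 1 for a shared address space) are assumptions chosen to match the 2 to 77 range quoted above.

```c
#include <assert.h>
#include <stddef.h>

enum policy { POLICY_CONVENTIONAL, POLICY_REALTIME };

struct task {
    enum policy policy;
    int yielded;      /* previous process with its SCHED_YIELD flag set */
    int counter;      /* ticks left in the quantum */
    int nice;         /* -20 .. +19 */
    int rt_priority;  /* 1 .. 99, real-time processes only */
    int processor;    /* CPU the task last ran on */
    const void *mm;   /* address space; NULL for a kernel thread */
};

int goodness(const struct task *p, int this_cpu, const void *this_mm)
{
    int weight;
    assert(p != NULL);
    if (p->yielded)
        return -1;                      /* picked only as a last resort */
    if (p->policy == POLICY_REALTIME)
        return 1000 + p->rt_priority;   /* always beats conventional */
    weight = p->counter;
    if (weight == 0)
        return 0;                       /* quantum exhausted */
    if (p->processor == this_cpu)
        weight += 15;                   /* assumed same-CPU bonus */
    if (p->mm == this_mm || p->mm == NULL)
        weight += 1;                    /* assumed address-space bonus */
    return weight + 20 - p->nice;
}
```

A conventional task with 6 ticks left and nice 0 scores 26 on a foreign CPU and 42 when it last ran on this CPU and shares the previous address space.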