The Linux “Completely Fair Scheduler”

The Linux “Completely Fair Scheduler” Ben Nayer – Kalamazoo College CS430 Operating Systems

Introduction
Starting with release 2.6.23, the Linux kernel has included a new scheduler, the “Completely Fair Scheduler” (CFS), which replaced the previously used “O(1) Scheduler”. CFS was a major departure from the previous model and is much simpler. Both the O(1) Scheduler and CFS were developed by Ingo Molnar.

O(1) Background
Briefly: the scheduler maintained two runqueues for each CPU (in kernel terms, two priority arrays inside one per-CPU runqueue), each with a linked list of tasks for every priority level (140 levels in total). Tasks were enqueued into the list matching their priority. To schedule the next task, the scheduler only needed to look at the highest-priority non-empty list. It assigned a timeslice to each task, and it had to track sleep times, process interactivity, and so on.

Okay, maybe not briefly...
Two runqueues per CPU, I said: one active, one expired. A task that has not yet used up its timeslice sits on the active queue; once it has, it moves to the expired queue, and its timeslice and priority are recalculated at that point. When the active queue is empty, the scheduler swaps the two pointers, so the old expired queue becomes the active one and the empty one becomes the new expired queue.
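
The active/expired mechanism can be sketched as a small Python toy model. This is not kernel code: the names PrioArray, RunQueue, timeslice_used_up, and demo_o1 are invented for illustration, every task is assumed to use up its whole timeslice each time it runs, and the recalculation of timeslice and priority is left out.

```python
from collections import deque

NUM_PRIOS = 140  # 0-99 real-time, 100-139 user tasks (per the slides)


class PrioArray:
    """One list of tasks per priority level; lower index = higher priority."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIOS)]
        self.count = 0

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.count += 1

    def dequeue_highest(self):
        # The kernel finds the first non-empty list with a bitmap;
        # a scan over a fixed 140 entries stands in for that here.
        for prio, queue in enumerate(self.queues):
            if queue:
                self.count -= 1
                return queue.popleft(), prio
        return None


class RunQueue:
    """Hypothetical per-CPU runqueue holding an active and an expired array."""

    def __init__(self):
        self.active = PrioArray()
        self.expired = PrioArray()

    def pick_next(self):
        if self.active.count == 0:
            # Swap pointers: the old expired array becomes the active one.
            self.active, self.expired = self.expired, self.active
        return self.active.dequeue_highest()

    def timeslice_used_up(self, task, prio):
        # A real scheduler would recalculate timeslice and priority here.
        self.expired.enqueue(task, prio)


def demo_o1():
    rq = RunQueue()
    rq.active.enqueue("editor", 110)
    rq.active.enqueue("compiler", 120)
    order = []
    for _ in range(4):
        task, prio = rq.pick_next()
        order.append(task)
        rq.timeslice_used_up(task, prio)
    return order
```

In demo_o1(), both tasks run once, land on the expired array, and become runnable again only after the pointer swap.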

Last one, I promise!
The first 100 priority lists are for real-time tasks; the last 40 are for user tasks. User tasks can have their priorities adjusted dynamically, depending on whether they are I/O-bound or CPU-bound. The design was better for SMP than previous schedulers: each CPU has its own queue and its own lock. Previously, a CPU picking a task locked the queue and made the other CPUs wait.

The Completely Fair Scheduler
CFS cuts out a lot of the things previous versions tracked: no timeslices, no sleep time tracking, no process type identification... Instead, CFS tries to model an “ideal, precise multitasking CPU”, one that could run multiple processes simultaneously, giving each equal processing power. Obviously, this is purely theoretical, so how can we model it?

CFS, continued
We may not be able to have one CPU run several tasks simultaneously, but we can measure how much runtime each task has had and try to ensure that every task gets its fair share. This runtime is held in each task's vruntime variable and is recorded at nanosecond granularity. A lower vruntime indicates that the task has had less time to compute and therefore has more need of the processor. Furthermore, instead of a queue, CFS uses a red-black tree to store, sort, and schedule tasks.

RB Trees
A red-black tree is a binary search tree: for each node, the left subtree contains only keys less than the node's key, and the right subtree contains only keys greater than or equal to it. A red-black tree adds further restrictions which guarantee that the longest root-to-leaf path is at most twice as long as the shortest. This bound keeps the height at O(log n), so search, insertion, and deletion take O(log n) time even in the worst case, whereas an unbalanced BST can degrade to O(n).
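
The step from the path-length guarantee to O(log n) is worth spelling out. The sketch below uses the standard textbook red-black properties, which the slides do not list: no red node has a red child, and every root-to-leaf path contains the same number of black nodes. The symbols b (black-height), h (height), and n (number of nodes) are introduced here for the argument.

```latex
% Every root-to-leaf path has exactly b black nodes and no two
% consecutive red nodes, so its length \ell satisfies
\[
  b \;\le\; \ell \;\le\; 2b ,
\]
% which gives the "at most twice as long" guarantee, and in particular
% h \le 2b. A tree of black-height b contains at least 2^{b} - 1 nodes:
\[
  n \;\ge\; 2^{b} - 1 \;\ge\; 2^{h/2} - 1
  \quad\Longrightarrow\quad
  h \;\le\; 2 \log_2 (n + 1) .
\]
```

With 1,000 runnable tasks, for example, this bounds the tree height at about 20 levels.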

The CFS Tree
The key for each node is the vruntime of the corresponding task. To pick the next task to run, simply take the leftmost node, which holds the smallest vruntime. (Figure from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/)
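
To get a feel for "always run the task with the smallest vruntime", here is a minimal Python simulation. It rests on several assumptions: Python's heapq stands in for the red-black tree (both give cheap access to the smallest key), all tasks have equal weight, each task runs for a fixed slice before being put back, and the names ToyCfsQueue and simulate are made up for this sketch.

```python
import heapq


class ToyCfsQueue:
    """Tasks ordered by vruntime; heapq stands in for the red-black tree."""

    def __init__(self):
        self._heap = []  # entries are (vruntime_ns, seq, name)
        self._seq = 0    # tie-breaker so equal vruntimes stay FIFO

    def enqueue(self, name, vruntime_ns):
        heapq.heappush(self._heap, (vruntime_ns, self._seq, name))
        self._seq += 1

    def pick_next(self):
        # Plays the role of taking the leftmost node of the tree.
        vruntime_ns, _, name = heapq.heappop(self._heap)
        return name, vruntime_ns


def simulate(tasks, slice_ns, steps):
    """Make `steps` scheduling decisions; every task has equal weight."""
    queue = ToyCfsQueue()
    for name in tasks:
        queue.enqueue(name, 0)
    history = []
    for _ in range(steps):
        name, vruntime_ns = queue.pick_next()
        history.append(name)
        # Charge the task for the time it just ran and put it back.
        queue.enqueue(name, vruntime_ns + slice_ns)
    return history
```

With equal weights the toy degenerates into round-robin, which is the expected "fair" outcome: simulate(["A", "B", "C"], 1_000_000, 6) cycles through A, B, C twice.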

VRuntime tracking
The primary code modification I made was to have the scheduler printk the vruntime of the next task whenever it picks a new one. What do you think we should see? An aside: originally, instead of vruntime, CFS tracked each task's wait time, which grew while the task was not running and shrank while it was running. The goal was to keep this value as close to 0 as possible for all tasks.

Digging in: CFS Data Structures
CFS has three primary structures: task_struct, sched_entity, and cfs_rq. task_struct is the top-level entity, containing things such as task priorities, scheduling class, and the sched_entity struct. (sched.h, L1117) sched_entity includes a node for the RB tree and the vruntime statistic, among others. (sched.h, L1041) cfs_rq contains the root node, task group (more on this later), etc. (sched.c, L424) Let's take a look...

Priorities and more
While CFS does not directly use priorities or priority queues, it does use priorities to modulate how quickly vruntime builds up. The relationship is inverse: a higher-priority task accumulates vruntime more slowly, since it is entitled to more CPU time. Likewise, a low-priority task has its vruntime increase more quickly, causing it to be preempted earlier. Priority is expressed through the “nice” value, where a lower value means higher priority. It is a relative priority, not an absolute one...
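
A rough numeric sketch of that weighting, under stated assumptions: the kernel looks weights up in a precomputed table, whereas approx_weight below approximates the table with a factor of about 1.25 per nice level around a nice-0 weight of 1024. The function names are illustrative and are not the kernel's.

```python
NICE_0_WEIGHT = 1024  # weight of a nice-0 task


def approx_weight(nice):
    """Approximate load weight for a nice value in [-20, 19].

    Assumption: each nice step changes the weight by a factor of about
    1.25; the kernel uses a lookup table instead of this formula.
    """
    return NICE_0_WEIGHT / (1.25 ** nice)


def vruntime_delta(delta_exec_ns, nice):
    """Virtual time charged for delta_exec_ns of real CPU time."""
    return delta_exec_ns * NICE_0_WEIGHT / approx_weight(nice)
```

A nice-0 task is charged exactly the time it ran, a task with a negative nice value is charged less, and a task with a positive nice value is charged more, so it drifts to the right of the tree faster and is picked less often.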

...that's it?
The CFS algorithm is, as stated, a lot simpler than the previous one and does not require many of the old variables. Preemption time is variable, depending on priorities and actual running time, so we don't need to assign tasks a fixed timeslice.

Other additions
CFS introduced group scheduling in release 2.6.24, adding another level of fairness. Tasks can be grouped together, for example by the user who owns them. CFS can then be applied at the group level as well as at the individual task level. So, for three groups, it would give each group about a third of the CPU time and then divide that third among the tasks in the group.
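
The three-groups example can be worked through with a few lines of Python. This is only the arithmetic of the idea, assuming equal weights for all groups and for all tasks within a group; cpu_shares and the user and task names are hypothetical.

```python
from fractions import Fraction


def cpu_shares(groups):
    """Map each task to its fraction of the CPU, given {group: [task, ...]}.

    Assumes equal weights for groups and for tasks within a group.
    Groups with no runnable tasks get no share.
    """
    runnable = {group: tasks for group, tasks in groups.items() if tasks}
    shares = {}
    for tasks in runnable.values():
        group_share = Fraction(1, len(runnable))
        for task in tasks:
            shares[task] = group_share / len(tasks)
    return shares


example = cpu_shares({
    "alice": ["a1"],
    "bob": ["b1", "b2"],
    "carol": ["c1", "c2", "c3", "c4"],
})
```

Here alice's single task gets a third of the CPU, each of bob's two tasks gets a sixth, and each of carol's four tasks gets a twelfth, so a user cannot claim more CPU simply by starting more tasks.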

Modular scheduling
Alongside the initial CFS release came the notion of “modular scheduling” and scheduling classes. This allows various scheduling policies to be implemented independently of the generic scheduler. sched.c, which we have seen, contains that generic code. When schedule() is called, it calls pick_next_task(), which goes through the scheduling classes and calls the class-appropriate method. Let's look at the sched_class struct... (sched.h, L976)

Scheduling classes!
Two scheduling classes are currently implemented: sched_fair and sched_rt. sched_fair is CFS, which I've been talking about this whole time. sched_rt handles real-time processes and does not use CFS; it's basically the same as the previous scheduler. CFS is mainly used for non-real-time tasks.

A visual aid is in order...
Classes are connected via a linked list, making it easy to iterate over them. Each class has its own implementations of the functions declared in the core sched_class. (Figure from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/)
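
The linked list of classes can be mimicked in Python to show the dispatch idea. This is a sketch only: SchedClass here is a stand-in for the kernel's sched_class struct, the per-class policy is reduced to a FIFO placeholder, and the task names are invented.

```python
class SchedClass:
    """One scheduling policy; `next` links to the next-lower-priority class."""

    def __init__(self, name, next_class=None):
        self.name = name
        self.next = next_class
        self.runnable = []

    def pick_next_task(self):
        # Each real class applies its own policy; FIFO is a placeholder.
        return self.runnable.pop(0) if self.runnable else None


def pick_next_task(highest_class):
    """Walk the class list and return the first task any class offers."""
    cls = highest_class
    while cls is not None:
        task = cls.pick_next_task()
        if task is not None:
            return cls.name, task
        cls = cls.next
    return None


# Real-time class first, then the fair class, as on the slide.
fair = SchedClass("fair")
rt = SchedClass("rt", next_class=fair)
```

Because the generic pick_next_task only walks the list, a new policy can be added by linking in another class without touching the generic code.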

Kernel Modification
Or: How I spent a weekend trying to cripple the scheduler... There were two main modifications, and you've seen the effect of the first already. I inserted a pair of printk statements into sched_fair.c, specifically in the pick_next_task_fair function. Every time a new process is selected, it prints the process's name, PID, and current vruntime value.

Modifications: what I DID do
My goal was to mess around with the priorities and how they are used in scheduling. Of course, since CFS does not use them as directly, even finding them became a bit problematic. Priorities/weights seem to be used to modify vruntime in the calc_delta_mine function of sched.c (L1305), which is called through a series of functions leading up to update_curr, which in turn is called by entity_tick in sched_fair.c. The modification was as simple as changing a division to a multiplication.

Modifications: what NOT to do
This was not the first thing I attempted, however. Since I hadn't yet found the code described on the previous slide, I first tried changing update_curr (sched_fair.c, L463). I set it to subtract the value that originally came from calc_delta_mine from vruntime instead of adding it. What do you think happened? Hint: it wasn't pretty.
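
One plausible way to see why flipping the sign is so destructive is a toy model in Python. This is an assumption-laden sketch, not a reproduction of what happened in the virtual machine: it ignores weights, sleeping tasks, and the other bookkeeping the real scheduler does, and run_ticks is a made-up helper.

```python
def run_ticks(tasks, ticks, tick_ns=1_000_000, sign=+1):
    """Return how many ticks each task received.

    sign=+1 models normal accounting; sign=-1 models subtracting the
    runtime from vruntime instead of adding it.
    """
    vruntime = {name: 0 for name in tasks}
    received = {name: 0 for name in tasks}
    for _ in range(ticks):
        # Pick the task with the smallest vruntime (the "leftmost node").
        current = min(tasks, key=lambda name: vruntime[name])
        received[current] += 1
        vruntime[current] += sign * tick_ns
    return received
```

With sign=+1, three tasks share nine ticks evenly. With sign=-1, whichever task runs first keeps lowering its own vruntime, stays leftmost forever, and starves everything else in this model.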

Modifications: what I learned
The most important lesson? VirtualBox snapshots are there for a reason. Don't forget to use them... Aside from that, I found that it may be relatively hard to trigger an obvious slowdown or side effect by fiddling with the priorities as I did, or at least to do so without crippling the virtual machine. While the commenting was decent in this part of the kernel, tracing specific operations was still an involved and convoluted task. Don't overdo it.

Interesting tidbits & questions
One major advantage of CFS is attack resistance. Some methods of attacking the Linux kernel or scheduler targeted the heuristics that determined whether tasks were I/O-bound or CPU-bound. CFS doesn't even use those heuristics! There was some controversy over the inclusion of CFS at the time. Other questions?

Sources
Images are from Inside the Linux 2.6 Completely Fair Scheduler. Sources used overall include the CFS documentation; Completely Fair Scheduler; Inside the Linux Scheduler; Multiprocessing with the Completely Fair Scheduler; A Study on Linux Kernel Scheduler Version 2.6.32 (Thang Ming Le); and Completely Fair Scheduler and its tuning (Jacek Kobus and Rafal Szklarski, 2009).