Scheduling CS623, Lecture 7 3/9/2004

2 Reading Materials:
– Stallings Textbook, Chapter 9 (Background, Fair-Share Scheduler).
– C. A. Waldspurger and W. E. Weihl, "Lottery Scheduling: Flexible Proportional-Share Resource Management," Proc. of the First Symposium on Operating Systems Design and Implementation (OSDI), 1994.
– C. A. Waldspurger and W. E. Weihl, "Stride Scheduling: Deterministic Proportional-Share Resource Management," Technical Memorandum MIT/LCS/TM-528, Laboratory for Computer Science, MIT, 1995.
– P. Goyal, X. Guo, and H. M. Vin, "A Hierarchical CPU Scheduler for Multimedia Operating Systems," Proc. of the 2nd Symposium on Operating Systems Design and Implementation (OSDI), 1996.

3 Outline Basics (Stallings 9.3) Fair-Share Scheduler (Stallings 9.3) Lottery Scheduling Stride Scheduling QLinux

4 Short-Term Scheduler Medium-Term Scheduler: swapping Short-Term Scheduler: what to execute next Give small slices of time to processes Some other objectives (fairness and others)

5 Basic Strategies Priorities FCFS RR SPN (shortest process next) SRPT (shortest remaining processing time) HRRN (stretch) (highest response ratio next) Feedback (penalize old guys)

6 Fair-Share Scheduling Traditional techniques treat collection of ready processes as single pool from which to choose the next. – Broken down by priority but otherwise homogeneous. There might be structure to collection of processes not recognized by traditional scheduler. – User might want his set of processes to make progress, not so much one individual one. – Or group of users (department).

7 Fair Share Strategy Each user assigned a weighting that defines user’s share of system resources as fraction of total usage of those resources. – If user A has twice the weighting of user B, in long run should be able to do twice as much work. – Objective of scheduler: monitor usage and give less resources to those that have more than fair share, more to those that have less.

8 FSS G. Henry, Fair-Share Scheduler, 1984. Divide user community into a set of fair-share groups and allocate a fraction of the processor resource to each group. – Each fair-share group can be thought of as a virtual system that runs proportionally slower than the full system. Scheduling done on basis of priority, which takes into account: – Priority of process (higher number is lower priority) – Recent processor usage of the process – Recent processor usage of the group it belongs to.

9 Fair Share Scheduling See Equations page 420, Stallings. Each process assigned a base priority. Priority of process drops as process uses processor and as the group to which the process belongs uses the processor. In case of group utilization, average is normalized by dividing by the weight of the group. The greater the weight of group, the less its utilization will affect its priority.

10 FSS Processor utilization measured as follows: – Clock interrupt occurs 60 times per second. – During each interrupt, processor usage field of currently running process is incremented, as is corresponding group processor field. – Once per second, priorities recalculated.
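A minimal sketch of the once-per-second recalculation, assuming the decay-by-half form of the Stallings equations: CPU_j(i) = CPU_j(i-1)/2, GCPU_k(i) = GCPU_k(i-1)/2, P_j(i) = Base_j + CPU_j(i)/2 + GCPU_k(i)/(4 W_k). The function and parameter names are illustrative, not from the text.

```python
def recalc_priority(base, cpu, group_cpu, weight):
    """Return (priority, decayed_cpu, decayed_group_cpu) for one process.

    cpu and group_cpu are the tick counts accumulated so far. A full
    scheduler would decay the group count once per group rather than once
    per process; this sketch handles a single process only.
    """
    cpu = cpu / 2               # decay the process usage count
    group_cpu = group_cpu / 2   # decay the group usage count
    # Higher number means lower priority; group usage is normalized by weight.
    priority = base + cpu / 2 + group_cpu / (4 * weight)
    return priority, cpu, group_cpu


# A process that ran for a full second (60 ticks), alone in a group of weight 0.5.
print(recalc_priority(base=60, cpu=60, group_cpu=60, weight=0.5))
# (90.0, 30.0, 30.0)
```

A process in the same group that did not run itself still pays for the group's usage: `recalc_priority(60, 0, 60, 0.5)` yields a priority of 75.0.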

11 Lottery Scheduling: Motivation Policy can have enormous impact on throughput and response time. “Accurate control over quality of service provided to users and applications requires support for specifying relative computation rates.” For interactive applications need ability to do this on a short time-frame.

12 Lottery Scheduling: Problems with Traditional Schedulers Priority Systems are ad-hoc at best, highest priority always wins. Fair Share Schedulers: – Relatively coarse control over long-running computations. – “Algorithms are complex, requiring periodic usage updates, complicated dynamic priority adjustments, administrative parameter setting to ensure fairness on a time scale of minutes.” – Priority inversion.

13 Basics of Lottery Scheduling Randomized Resource Allocation Mechanism – Resource Rights are represented by lottery tickets. – Each allocation determined by holding a lottery; resource granted to client with the winning ticket.

14 Lottery Scheduling: Resource Rights Lottery tickets encapsulate resource rights that are abstract, relative and uniform. – Abstract: quantify resource rights independently of machine details. – Relative: Fraction of resource that they represent varies dynamically in proportion to contention for that resource. – Uniform: Rights for heterogeneous resources can be homogeneously represented as tickets.

15 Lottery Scheduling: Lotteries How fair is lottery scheduling? – Probabilistically fair. Expected allocation of resources to clients is proportional to number of tickets that they hold. – Since scheduling algorithm is randomized, actual allocated proportions not guaranteed to match expected proportions exactly. – Over the “long term” disparity decreases.

16 Lottery Fairness Number of lotteries won by a client has a binomial distribution. – Probability of winning for client with t out of T tickets: p = t/T. – Expected number of wins in n trials = np. Since any client with a non-zero number of tickets will eventually win a lottery, conventional starvation does not happen. Also operates fairly when number of clients or tickets varies dynamically. – For each allocation, any changes in relative ticket allocations immediately reflected in next allocation decision.
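The binomial model above can be made concrete with a short calculation. This is a sketch using the standard binomial mean np and variance np(1-p); the function name is illustrative. The coefficient of variation (standard deviation divided by mean) shrinks as n grows, which is the sense in which the disparity decreases over the long term.

```python
import math


def lottery_stats(tickets, total, n):
    """Mean, standard deviation and coefficient of variation of the number
    of lotteries won in n trials by a client holding tickets out of total."""
    p = tickets / total
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return mean, std, std / mean


for n in (100, 10_000):
    mean, std, cv = lottery_stats(tickets=1, total=4, n=n)
    print(f"n={n}: mean={mean:.0f} std={std:.1f} cv={cv:.3f}")
# n=100: mean=25 std=4.3 cv=0.173
# n=10000: mean=2500 std=43.3 cv=0.017
```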

17 Modular Resource Management Tickets are a useful mechanism for modular resource management. – Used to insulate the resource management policies of independent modules. – Can be transferred. Four techniques: – Transfers – Inflation – Currencies – Compensation tickets

18 Ticket Transfers Explicit transfers of tickets from one client to another. Can be used when a client blocks for some dependency. E.g: Client-Server Example – Server has no tickets of its own. – Clients give server all of their tickets during RPC. – Server’s priority is the sum of the priorities of all its active clients. – Server can use lottery scheduling to give preferential service to high-priority clients. Very elegant solution to long-standing problem.

19 Transfer Can be used to solve priority inversion problem in a manner similar to priority inheritance. Could divide ticket transfers across multiple servers on which they may be waiting.

20 Ticket Inflation Client can bump up its priority by printing money. Only works amongst mutually-trusting clients. Allows clients to adjust their priority dynamically with zero communication.

21 Ticket Currencies Can extend to express resource rights in units that are local to each group of mutually trusting clients. Unique currency within each trust boundary. Set up an exchange rate with the base currency. Enables inflation just within a group. Simplifies mini-lotteries, such as for a mutex.
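A small sketch of how an exchange rate contains inflation inside a group. It assumes a single group currency backed by a fixed amount of base-currency tickets; the function name, thread names and numbers are hypothetical.

```python
def base_values(funding, issued):
    """funding: base-currency tickets backing the group currency.
    issued: dict mapping thread name -> tickets held in the group currency.
    Returns each thread's worth in base-currency units."""
    total_issued = sum(issued.values())
    return {name: amount * funding / total_issued
            for name, amount in issued.items()}


before = base_values(1000, {"t1": 100, "t2": 100})
after = base_values(1000, {"t1": 300, "t2": 100})  # t1 inflates inside the group
print(before)  # {'t1': 500.0, 't2': 500.0}
print(after)   # {'t1': 750.0, 't2': 250.0}
```

The group's total worth stays at 1000 base tickets, so t1's inflation only takes share from t2 and never from clients outside the trust boundary.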

22 Compensation Tickets What happens if a thread is I/O-bound and blocks before its quantum expires? – Without adjustment, thread will get less than its share of the processor. – If you complete fraction f of the quantum, your tickets are inflated by 1/f until the next time you win. – Example: If B on average uses 1/5 of a quantum its tickets will be inflated 5x and it will win 5 times as often and get its correct share overall.
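The arithmetic behind the example can be checked with exact fractions. This sketch assumes two clients, A (always uses its full quantum) and B (always blocks after fraction f of its quantum), and treats B's inflation as permanent, a simplification of "until the next time you win". The function name and the 400-ticket figures are illustrative.

```python
from fractions import Fraction


def cpu_share_of_b(tickets_a, tickets_b, f_b, compensate):
    """Long-run fraction of consumed CPU time that goes to B."""
    boost = 1 / f_b if compensate else 1      # tickets inflated by 1/f
    effective_b = tickets_b * boost
    p_b = Fraction(effective_b) / (tickets_a + effective_b)  # P(B wins a lottery)
    time_b = p_b * f_b                        # B uses only f of each quantum it wins
    time_a = (1 - p_b) * 1                    # A uses the whole quantum
    return time_b / (time_a + time_b)


f = Fraction(1, 5)
print(cpu_share_of_b(400, 400, f, compensate=False))  # 1/6
print(cpu_share_of_b(400, 400, f, compensate=True))   # 1/2
```

With equal tickets, B receives only 1/6 of the CPU without compensation and its intended 1/2 with it.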

23 Implementation Issues Need good random number generator Lotteries – Randomly select a winning ticket, search list of clients for winner – Optimization: Order by decreasing ticket counts Tree data structures
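A minimal sketch of the list-based lottery described above: draw a winning ticket number, then walk the client list accumulating ticket counts until the running sum passes it. The names are illustrative, and Python's `random` module stands in for the "good random number generator". Sorting by decreasing ticket count is the optimization mentioned on the slide, since the likeliest winners are reached first.

```python
import random


def hold_lottery(clients, rng=random):
    """clients: list of (name, tickets) pairs with at least one ticket in total.
    Returns the name of the client holding the winning ticket."""
    total = sum(tickets for _, tickets in clients)
    winning = rng.randrange(total)      # winning ticket number in [0, total)
    running = 0
    for name, tickets in clients:
        running += tickets
        if winning < running:           # winning ticket falls in this client's range
            return name


clients = [("A", 3), ("B", 1)]
clients.sort(key=lambda c: c[1], reverse=True)   # largest holders first

rng = random.Random(0)                  # seeded only to make the demo repeatable
wins = {"A": 0, "B": 0}
for _ in range(10_000):
    wins[hold_lottery(clients, rng)] += 1
print(wins)   # roughly 3:1, but not exactly
```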

24 Experimental Evaluation 60 seconds, 2 tasks, different ticket ratios. – 10:1 gave 13.42:1 relative rate. As the ratio increases, the random outcome tracks the allocation less reliably. Dynamically controlled ticket inflation: competing Monte Carlo simulations; those with high early errors inflate their tickets. Client-server.

25 Experimental Evaluation Multimedia Applications: – 3 mpeg_play video viewers. – Allocation: 3:2:1 – Results: 1.92:1.5:1 – Results distorted by round-robin processing of client requests by single-threaded X11R5 server.

26 Use for Synchronization Resources Contention due to synchronization can substantially affect computation rates. – Lottery scheduling can help. Extended Mach Cthreads library to support a lottery-scheduled mutex type. – Associated mutex_currency and inheritance ticket. All threads that are blocked waiting for mutex perform ticket transfers to fund the mutex currency. Mutex transfers its inheritance ticket to thread which currently holds mutex. THUS: Thread which acquires mutex executes with its own funding plus funding of all waiting threads.

27 Use for Synchronization Resources This solves the priority inversion problem in which a mutex owner with little funding could execute very slowly due to competition with other threads while a highly funded thread remains blocked on the mutex. 2-minute experiment, 2 groups of threads, 2:1 allocation. Got 1.8:1. Overall, not as fair as we’d like, but simple, elegant, OK.

28 Stride Scheduling Basic Idea: Make a deterministic version of lottery scheduling to reduce short-term variability and improve accuracy. Implements proportional-share control over processor time and other resources by applying elements of rate-based flow control algorithms designed for networks.

29 Stride Scheduling Terms: time quanta, tickets. Absolute error: difference between specified and actual number of allocations. Pairwise relative error: absolute error for the subsystem containing just those 2 clients. Lottery Scheduling: expected errors grow as sqrt(n), where n is the number of allocations. Stride Scheduling: relative error never greater than 1. Absolute error can be O(N), where N is the number of clients.

30 Stride Scheduling: Basic Algorithm Mark time virtually using “passes” as the unit as opposed to real seconds. Compute a representation of the time interval – stride – that a client must wait between successive allocations. Client with smallest stride will be scheduled most frequently. A client with half the stride of another will execute twice as quickly.

31 Stride Scheduling: Basic Algorithm Each client has three state variables: – Tickets: Num of tickets. – Stride: Inversely proportional to tickets; represents the interval between selections. – Pass: virtual time index for client’s next selection.

32 How to Allocate a Resource Client with minimum pass is selected and its pass is advanced by its stride. If more than one client has the same minimum pass value, then any of them may be selected. Compensation tickets: a client that uses only fraction f of its quantum has its pass advanced by f*stride rather than stride.
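A minimal sketch of the basic allocation loop, without compensation or dynamic joins. It assumes stride = STRIDE1 / tickets for a large constant STRIDE1 and that each pass starts at the client's stride; the constant's value, the heap, and the tie-break by name are choices made for this illustration.

```python
import heapq

STRIDE1 = 1 << 20   # assumed large constant so integer strides stay precise


def stride_schedule(tickets, n):
    """tickets: dict name -> ticket count (> 0). Returns the first n selections."""
    heap = []
    for name, count in tickets.items():
        stride = STRIDE1 // count          # inversely proportional to tickets
        heapq.heappush(heap, (stride, name, stride))   # (pass, name, stride)
    order = []
    for _ in range(n):
        pass_value, name, stride = heapq.heappop(heap)  # minimum pass wins
        order.append(name)
        heapq.heappush(heap, (pass_value + stride, name, stride))  # advance pass
    return order


print("".join(stride_schedule({"A": 3, "B": 2, "C": 1}, 6)))  # ABAABC
```

Within every 6 allocations the clients receive 3, 2 and 1 quanta, matching the 3:2:1 ticket ratio exactly rather than only in expectation.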

33 Dynamic Client Participation The basic algorithm does not support dynamic changes in the number of clients competing for a resource. When clients are allowed to leave and join, their state must be modified appropriately. – Uses global variables.

34 Problems Relative error good. Absolute error: consider 101 clients with ratio 100:1:…:1 – After 100 steps we wanted 50 units for first job but we got 100. Oops! Hierarchical Stride Scheduling. Aggregates clients to improve interleaving
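The 101-client example can be reproduced with a short simulation. This sketch assumes passes start at each client's stride and that ties go to the lowest client index (so the 100-ticket client wins the tie at pass 100); the function name and parameters are illustrative.

```python
import heapq


def big_client_allocations(n_small=100, big_tickets=100, steps=100):
    """Count how many of the first `steps` quanta go to the large client."""
    stride1 = big_tickets                 # keeps strides integral: big=1, small=100
    big_stride = stride1 // big_tickets
    heap = [(big_stride, 0, big_stride)]  # (pass, client index, stride)
    heap += [(stride1, i, stride1) for i in range(1, n_small + 1)]
    heapq.heapify(heap)
    big = 0
    for _ in range(steps):
        pass_value, idx, stride = heapq.heappop(heap)
        big += (idx == 0)
        heapq.heappush(heap, (pass_value + stride, idx, stride))
    return big


print(big_client_allocations(steps=100))  # 100 (ideal share: 50)
print(big_client_allocations(steps=200))  # 100 (ideal share: 100)
```

The large client receives all of the first 100 quanta and none of the next 100, so its share is correct over 200 quanta but its absolute error reaches 50 partway through.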

35 Hierarchical Stride Scheduling Recursive application of basic stride scheduling algorithm. – Individual clients combined into groups with larger aggregate ticket allocations and correspondingly smaller strides. – Allocation performed by invoking normal stride scheduling algorithm first among groups and then among individual clients within groups. – Since often systems consist of small number of high-throughput clients together with a large number of low-throughput clients, helps.

36 A Hierarchical CPU Scheduler for Multimedia Operating Systems Consider requirements imposed by various application classes that can co-exist in a multimedia system: – Hard real-time applications (EDF, RMA). – Soft real-time applications. Need to statistically guarantee QoS parameters such as maximum delay and throughput. E.g., video: due to multiple time-scale variations, OS will be required to over-book CPU. This may lead to CPU overload. Need some QoS guarantees. Can’t assume requirements are known up front. – Best-effort applications.

37 Bottom Line Need different scheduling algorithms for different application classes in a multimedia system. Need an OS framework that enables different schedulers to be employed for different applications. Need to guarantee not just coexistence but protection between different classes of applications. – For example, overbooking of CPU should not violate hard real-time constraints.

38 Solution Hierarchical Partitioning of CPU Bandwidth – OS should be able to partition the CPU bandwidth among various application classes, and each application class should be able to partition its allocation among subclasses or applications. Hierarchical Partitioning specified by tree. – Each thread belongs to exactly one leaf node – Each node in tree represents either an application class or an aggregation of application classes.

39 Threads are scheduled by leaf-node-dependent schedulers. Intermediate nodes scheduled by an algorithm that 1. Achieves fair distribution of CPU resource 2. Does not require a priori info about threads’ needs 3. Provides throughput guarantees 4. Is computationally efficient.

40 QLinux QLinux is a Linux kernel that can provide quality of service guarantees. QLinux, based on the Linux 2.2.x kernel, combines some of the latest innovations in operating systems research. It includes the following features: – Hierarchical Start Time Fair Queuing (H-SFQ) CPU scheduler – Hierarchical Start Time Fair Queuing (H-SFQ) network packet scheduler – Lazy receiver processing (LRP) network subsystem – Cello disk scheduler

41 QLinux The H-SFQ CPU scheduler enables hierarchical scheduling of applications by fairly allocating CPU bandwidth to individual applications and application classes. The H-SFQ packet scheduler provides rate guarantees and fair allocation of bandwidth to packets from individual flows as well as flow aggregates (classes). Lazy receiver processing enables accurate charging of TCP/UDP protocol processing overhead (including interrupt processing) to the appropriate process. The Cello disk scheduler supports multiple application classes, such as interactive best-effort, throughput-intensive best-effort, and soft real-time, and fairly allocates disk bandwidth to these classes.


