Scheduling in Server Farms
Mor Harchol-Balter, Computer Science Dept, Carnegie Mellon University, harchol@cs.cmu.edu
Tutorial talk. Assume no knowledge. Try to stay away from lots of math & long proofs. Provide lots of intuition instead; you can ask me for proofs after the talk or via email. Present known results, very new results, unpublished results and open problems. You're here to learn, so my main goal is to keep this interactive.
Outline
I. Review of scheduling in single-server
II. Supercomputing (FCFS hosts behind a router)
III. Web server farm model (PS hosts behind a router)
IV. Towards Optimality … (SRPT hosts behind a router)
Today: I and II. Tomorrow: III and IV.
Metric: Mean Response Time, E[T]
Talk about how I work in computer systems. A single queue is an abstraction of an operating system, where any scheduling policy is possible. In other applications, the scheduling is fixed and the routing is free. Explain that the goal is scheduling/routing. Explain how certain applications restrict you to certain scheduling policies at the servers, so you then have to find the best routing policy to go with it.
Single Server Model (M/G/1)
Poisson arrival process w/rate λ. X: job size (service requirement). Load ρ = λE[X] < 1.
Job sizes with huge variance are everywhere in CS (Bounded Pareto):
CPU lifetimes of UNIX jobs [Harchol-Balter, Downey 96]
Supercomputing job sizes [Schroeder, Harchol-Balter 00]
Web file sizes [Crovella, Bestavros 98], [Barford, Crovella 98]
IP flow durations [Shaikh, Rexford, Shin 99]
Properties: Huge variability. D.F.R. (decreasing failure rate). Top-heavy: the top 1% of jobs make up half the load.
Explain load clearly: 3 jobs per second with E[X] = 1/4 implies load ¾.
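The load arithmetic and the top-heavy property on this slide can be checked numerically. The following is a minimal sketch, not the talk's own code: it samples a Bounded Pareto by inverse transform, and the parameter names k (smallest size), p (largest size) and alpha (shape) are assumptions chosen for illustration.

```python
import random

def load(arrival_rate, mean_size):
    """Load rho = lambda * E[X]; the queue is stable only if rho < 1."""
    return arrival_rate * mean_size

def bounded_pareto_sample(k, p, alpha, rng):
    """Inverse-transform sample from a Bounded Pareto on [k, p].

    Uses the CDF F(x) = (1 - (k/x)^alpha) / (1 - (k/p)^alpha).
    k, p and alpha are illustrative parameters, not values from the talk.
    """
    u = rng.random()
    c = 1.0 - (k / p) ** alpha
    return k / (1.0 - u * c) ** (1.0 / alpha)

def top_fraction_of_load(sizes, top=0.01):
    """Fraction of total work contributed by the largest `top` fraction of jobs."""
    ordered = sorted(sizes, reverse=True)
    n_top = max(1, int(len(ordered) * top))
    return sum(ordered[:n_top]) / sum(ordered)

if __name__ == "__main__":
    print(load(3, 0.25))  # 3 jobs/sec with E[X] = 1/4
    rng = random.Random(1)
    sizes = [bounded_pareto_sample(1.0, 1e6, 1.1, rng) for _ in range(100000)]
    print(top_fraction_of_load(sizes))
```

How close the second number comes to one half depends on the assumed k, p and alpha and on the sample size.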
Scheduling Single Server (M/G/1)
Poisson arrival process. Load ρ < 1. Huge variance.
Question: Order these scheduling policies for mean response time, E[T]:
FCFS (First-Come-First-Served, non-preemptive)
PS (Processor-Sharing, preemptive)
SJF (Shortest-Job-First, a.k.a. SPT, non-preemptive)
SRPT (Shortest-Remaining-Processing-Time, preemptive)
LAS (Least Attained Service, a.k.a. FB, preemptive)
Explain each policy one at a time and explain why one might care. FCFS: many systems where preemption is not allowed are run this way. Simple. PS: standard for computer systems. Define. E.g., Web server. SJF: favor short jobs. Non-preemptive. SRPT: favor the job that will finish soonest. Preemptive. LAS: sometimes we want to do SRPT but don't know the size of the job. E.g., Ernst Biersack, scheduling flows. Mention that they should assume ρ = 0.8 (because SJF starts getting weird at super high loads). Do it on your piece of scratch paper.
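To make the difference between a non-preemptive and a preemptive size-based policy concrete, here is a small sketch that computes the mean response time of FCFS and SRPT on a fixed arrival sequence. The function names and the toy job list are assumptions for illustration; this is a deterministic trace, not an M/G/1 analysis.

```python
import heapq

def mean_response_fcfs(jobs):
    """jobs: list of (arrival_time, size). Serve in arrival order, no preemption."""
    t = 0.0
    total = 0.0
    for arrival, size in sorted(jobs):
        start = max(t, arrival)
        t = start + size
        total += t - arrival
    return total / len(jobs)

def mean_response_srpt(jobs):
    """Always run the job with the shortest remaining time; preempt on arrivals."""
    jobs = sorted(jobs)
    heap = []  # entries are (remaining_time, arrival_time)
    t = 0.0
    total = 0.0
    i = 0
    n = len(jobs)
    while i < n or heap:
        if not heap:
            t = max(t, jobs[i][0])  # server idles until the next arrival
        while i < n and jobs[i][0] <= t:
            heapq.heappush(heap, (jobs[i][1], jobs[i][0]))
            i += 1
        remaining, arrival = heapq.heappop(heap)
        next_arrival = jobs[i][0] if i < n else float("inf")
        run = min(remaining, next_arrival - t)  # run until done or next arrival
        t += run
        if run < remaining:
            heapq.heappush(heap, (remaining - run, arrival))
        else:
            total += t - arrival
    return total / n

if __name__ == "__main__":
    toy = [(0.0, 10.0), (1.0, 1.0), (2.0, 1.0)]  # one long job, then two shorts
    print(mean_response_fcfs(toy), mean_response_srpt(toy))
```

On this toy input the two shorts wait behind the long job under FCFS (mean 10), while under SRPT they preempt it (mean 14/3); the long job finishes at time 12 either way.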
Scheduling Single Server (M/G/1)
Poisson arrival process. Load ρ < 1. Huge variance.
Answer, from LOW E[T] to HIGH E[T]: SRPT < LAS < PS < SJF < FCFS
SRPT: OPT for all arrival sequences [Schrage 67].
LAS: requires D.F.R. [Righter, Shanthikumar 89].
PS: insensitive to E[X^2].
SJF: surprisingly bad (E[X^2] term).
FCFS: ~E[X^2] (shorts caught behind longs).
No “Starvation!” Even the biggest jobs prefer SRPT to PS [Bansal, Harchol-Balter 01], [Wierman, Harchol-Balter 03]:
THM: E[T(x)]^SRPT < E[T(x)]^PS for all x, for Bounded Pareto, ρ < 0.9.
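The remarks about E[X^2] can be seen in the standard M/G/1 mean response time formulas, which are not written out on the slide: the Pollaczek-Khinchine formula for FCFS, E[T] = E[X] + λE[X^2] / (2(1 − ρ)), and E[T] = E[X] / (1 − ρ) for PS. A minimal sketch, with illustrative function names:

```python
def mean_response_fcfs_mg1(lam, ex, ex2):
    """M/G/1/FCFS mean response time (Pollaczek-Khinchine).

    lam: arrival rate, ex: E[X], ex2: E[X^2].
    """
    rho = lam * ex
    assert rho < 1.0, "queue is unstable"
    return ex + lam * ex2 / (2.0 * (1.0 - rho))

def mean_response_ps_mg1(lam, ex):
    """M/G/1/PS mean response time; depends on the size distribution only via E[X]."""
    rho = lam * ex
    assert rho < 1.0, "queue is unstable"
    return ex / (1.0 - rho)

if __name__ == "__main__":
    lam, ex = 0.8, 1.0
    for ex2 in (2.0, 200.0):  # ex2 = 2 corresponds to exponential sizes with mean 1
        print(ex2, mean_response_fcfs_mg1(lam, ex, ex2), mean_response_ps_mg1(lam, ex))
```

With E[X] fixed, raising E[X^2] from 2 to 200 leaves the PS value unchanged and raises the FCFS value by roughly two orders of magnitude.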
Effect of Variability
[Figure: E[T] for FCFS, SJF, LAS, PS, and SRPT under Bounded Pareto job sizes, plotted against ρ and C^2.]
Closed vs. Open Systems
QUESTION: What's the effect of scheduling?
Open System. Closed System: Send, Receive, Think.
Compare under same avg. load!
OPEN: Arrival process is independent of past departures. CLOSED: Arrival times of new requests depend heavily on completions. Explain that closed systems occur with databases and that database benchmarks are closed.
Closed vs. Open Systems [Schroeder, Wierman, Harchol-Balter, NSDI 06]
[Figure: two plots of E[T] versus load ρ for FCFS and SRPT, one titled Open System Results and one titled Closed System Results.]
Closed & open systems analyzed under same job size distribution, with same average load.
Careful to talk about #users, and not say MPL. NSDI: systems; refer listeners to this NSDI paper.
Why?
Argument 1: Think about zero think time. In a closed system, for any work-conserving scheduling policy, looking over a long period of time, T, the same set of jobs completes, except possibly for the initial jobs in the queue. You can look at a policy that biases towards shorts. It finishes off shorts, they get replaced with other jobs, and it looks like it's racing very fast. But eventually it is stuck having to clear out those longs from the queue before it can get more shorts to do. So with zero think time, all w.c. policies are equal. When you add think time, you get a little wiggle room, but just a little: add too much think time and load goes to zero, so all the same again.
Argument 2: Remember how for an open system the variability of the job size distribution matters? This is because we can have a potentially unlimited number of small mice stuck behind a big elephant. In closed systems, we limit the number of mice that can be affected by the elephant: only N-1. So the effect of variability is limited. Under low variability, FCFS and SRPT are a lot closer!
Summary Single-Server
Single-server system. X: job size, highly variable. ρ < 1.
LESSONS LEARNED: Smart scheduling greatly improves mean response time. Variability of job size distribution is key. Closed system sees much reduced effect.
Multiserver Model
Server farms: + Cheap + Scalable capacity
Incoming jobs (Poisson Process) go to a Router, which assigns each job to one of several hosts, each running its own sched. policy.
2 Policy Decisions: routing (assignment) policy and sched. policy. (Sometimes scheduling policy is fixed – legacy system)
Outline
I. Review of scheduling in single-server
II. Supercomputing (FCFS hosts behind a router)
III. Web server farm model (PS hosts behind a router)
IV. Towards Optimality … (SRPT hosts behind a router)
Metric: Mean Response Time, E[T]
We will now look at 3 different multiserver models. The first is supercomputing, where the sched policy is fixed to be FCFS.
Supercomputing Model
Poisson process into Router (routing/assignment policy), then FCFS hosts.
Jobs are not preemptible. Jobs processed in FCFS order. Assume hosts are identical. Jobs i.i.d. ~ G: highly variable size distribution. Size may or may not be known. Initially assume known.
Explain that I've studied supercomputing jobs and they have a Pareto distribution. Explain why supercomputing fits this model. Each of these servers is actually a multiprocessor machine. Operating systems for multiprocessor machines can't time-share. They can't run more than one job at once. Stuck with FCFS.
Q: Compare Routing Policies for E[T]?
Supercomputing: Poisson process into Router (routing policy), then FCFS hosts. Jobs i.i.d. ~ G: highly variable.
1. Round-Robin
2. Join-Shortest-Queue: go to host w/ fewest # jobs.
3. Least-Work-Left: go to host with least total work (equivalent to M/G/k/FCFS).
4. Central-Queue-Shortest-Job (M/G/k/SJF): host grabs shortest job when free.
5. Size-Interval Splitting: jobs are split up by size among hosts.
A: Size-Interval Splitting: best so far
Supercomputing: Router (routing policy), then FCFS hosts. Highly variable job sizes.
Ordered from high E[T] (top) to low E[T] (bottom):
1. Round-Robin
2. Join-Shortest-Queue: go to host w/ fewest # jobs.
3. Least-Work-Left: go to host with least total work (equivalent to M/G/k/FCFS).
4. Central-Queue-Shortest-Job (M/G/k/SJF): host grabs shortest job when free.
5. Size-Interval Splitting: jobs are split up by size among hosts.
[Harchol-Balter, Crovella, Murta, JPDC 99]
Routing Policies: Remarks
Ordered from high E[T] (top) to low E[T] (bottom):
1. Round-Robin
2. Join-Shortest-Queue: go to host w/ fewest # jobs.
3. Least-Work-Left: go to host with least total work (equivalent to M/G/k/FCFS).
4. Central-Queue-Shortest-Job (M/G/k/SJF): host grabs shortest job when free.
5. Size-Interval Splitting: jobs are split up by size among hosts.
Central-Queue: + Good utilization of servers. + Some isolation for smalls.
Size-Interval WAY Better! - Worse utilization of servers. + Great isolation for smalls!
[Harchol-Balter, Crovella, Murta, JPDC 99]
Explain why CQSJ didn't make it. Shorts blocked by longs: non-preemptive.
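The immediate-dispatch policies in this list can be sketched as small routing functions plugged into a toy FCFS farm. This is an illustration under assumptions, not the analysis from [Harchol-Balter, Crovella, Murta, JPDC 99]: the function names, the `cutoffs` parameter and the four-job input are hypothetical, and Join-Shortest-Queue and the central-queue policy are left out to keep the sketch short.

```python
def simulate_fcfs_farm(jobs, k, route):
    """Mean response time of k FCFS hosts with immediate dispatch.

    jobs: list of (arrival_time, size), sorted by arrival_time.
    route(work_left, size, index) returns the index of the chosen host.
    """
    free_at = [0.0] * k  # time at which each host clears its current backlog
    total = 0.0
    for idx, (arrival, size) in enumerate(jobs):
        work_left = [max(0.0, f - arrival) for f in free_at]
        h = route(work_left, size, idx)
        finish = arrival + work_left[h] + size  # FCFS: wait for backlog, then run
        free_at[h] = finish
        total += finish - arrival
    return total / len(jobs)

def round_robin(work_left, size, idx):
    return idx % len(work_left)

def least_work_left(work_left, size, idx):
    return min(range(len(work_left)), key=lambda h: work_left[h])

def size_interval(cutoffs):
    """Host i serves sizes up to cutoffs[i]; the last host serves the rest."""
    def route(work_left, size, idx):
        for h, c in enumerate(cutoffs):
            if size <= c:
                return h
        return len(cutoffs)
    return route

if __name__ == "__main__":
    toy = [(0.0, 100.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    print(simulate_fcfs_farm(toy, 2, round_robin))
    print(simulate_fcfs_farm(toy, 2, least_work_left))
    print(simulate_fcfs_farm(toy, 2, size_interval([10.0])))
```

A four-job trace only shows the mechanics (Round-Robin sends a short job behind the long one); it is not evidence for the ranking on the slide, which concerns highly variable stochastic workloads.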
Size-Interval Splitting
[Figure: x·f(x) plotted against job size x and divided at size cutoffs; Size-Interval Routing sends each size range to its own host (M, L, XL, ...).]
Question: How to choose the size cutoffs? “To Balance Load or Not to Balance Load?”
Size-Interval Splitting
[Figure: x·f(x) versus job size x with a single cutoff s; Size-Interval Routing sends jobs to FCFS hosts S and L.]
Answer: Recent research for the case of Bounded Pareto job size, Pr{X > x} ~ x^(-α):
α < 1: UNBALANCE, favor smalls
α = 1: BALANCE LOAD
α > 1: UNBALANCE, favor larges
[Harchol-Balter, Vesilo, 06+], [Glynn, Harchol-Balter, Ramanan, 06+]
Fine to say that I don't really understand why this happens. Explain that when α < 1 there are so many smalls that we want to help them. Say: more variable (small α) to less variable (large α). As α grows there are fewer smalls, so bigs count more in the average. Magic point seems to come up in analysis.
Beyond Size-Interval Splitting
[Figure: x·f(x) versus job size x with cutoff s; Size-Interval Routing to FCFS hosts S and L.]
Q: Is Size-Interval Splitting as good as it gets?
Size-Interval Splitting with Stealing
Answer: Allow Cycle Stealing!
Size-Interval Routing with Cycle Stealing, FCFS hosts S and L: Send Shorts to S. Send Longs to L. But, if L is idle, send a Short there.
Gain to Shorts is high. Pain to Longs is very small.
Cycle Stealing analysis very hard: Fayolle, Iasnogorodski, Konheim, Meilijson, Melkman, Cohen, Boxma, van Uitert, Jelenkovic, Foley, McDonald, Harrison, Borst, Williams …
New easy approach: Dimensionality Reduction, 2D → 1D [Harchol-Balter, Osogami, Scheller-Wolf, Squillante SPAA03]
Coupled processor problem is similar to cycle stealing, except here 2 servers each work on their own class of job and if either is idle, it can help the other, increasing the *rate* of the other processor. This help incurs no switching cost and has a benefit even if only 1 job is present. Coupled processor model incurs no multiserver benefit under highly variable job sizes. Most common by far is truncating the chain.
Early work on coupled-processor is by Fayolle, Iasnogorodski, Konheim, Meilijson, Melkman. Solve either a Dirichlet problem or a homogeneous Riemann-Hilbert problem for a circle, depending on the accelerated rates of the server. Results in complex integrals. While it is possible to evaluate these results, they didn't.
Cohen and Boxma are first to extend to general service times. Studying not response time, but rather stationary workload. They formulate a Wiener-Hopf boundary value problem. End up with integrals or finite sums. Simplifies a little if queues are symmetric.
More recent work: Borst, Boxma, van Uitert look at asymptotic behavior of workload related to service distribution, and make a discovery about how one server can be isolated from another when not in overload. Really beautiful work.
Conclusion: Difficult to get response time numbers for non-exponential workloads. Lots of work on tail asymptotics and heavy traffic. Bob Foley at Informs 03 was working on exactly our problem, but in the heavy-traffic regime.
What if Don't Know Job Size?
Router (routing policy), then FCFS hosts. Highly variable job sizes.
1. Round-Robin
2. Join-Shortest-Queue: go to host w/ fewest # jobs.
3. Least-Work-Left: go to host with least total work (equivalent to M/G/k/FCFS).
4. Central-Queue-Shortest-Job (M/G/k/SJF): host grabs shortest job when free.
5. Size-Interval Splitting: jobs are split up by size among hosts.
Q: What can we do to minimize E[T] when we don't know job size?
The TAGS algorithm: “Task Assignment by Guessing Size”
[Figure: outside arrivals enter Host 1, which has size limit s; Host 2 has size limit m; then Host 3.]
Answer: When a job reaches the size limit for its host, it is killed and restarted from scratch at the next host. [Harchol-Balter, JACM 02]
Explain where used: Microsoft example, UNIX example; supercomputing centers actually do this if you can't predict the runtime of your job, although they do it preemptively.
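The kill-and-restart rule can be illustrated by tracing a single job through the hosts. A minimal sketch, assuming a hypothetical `cutoffs` list in which `cutoffs[i]` is the size limit at host i and the last host has no limit; it ignores queueing and only tracks where the job ends up and how much service it consumes in total.

```python
def tags_run(size, cutoffs):
    """Trace one job of (unknown to the router) size through TAGS.

    Returns (final_host, total_service_consumed). Work done at a host where
    the job is killed is wasted, because the job restarts from scratch.
    """
    wasted = 0.0
    for host, limit in enumerate(cutoffs):
        if size <= limit:
            return host, wasted + size
        wasted += limit  # killed after `limit` units of service at this host
    return len(cutoffs), wasted + size

if __name__ == "__main__":
    for x in (5, 50, 500):
        print(x, tags_run(x, [10, 100]))
```

With limits 10 and 100, a job of size 50 consumes 10 units at the first host before being killed, then 50 at the second, so 60 in total; small jobs pay no such penalty.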
Results of Analysis
[Figure: curves for Random, Least-Work-Left, and TAGS; the horizontal axis runs from high variability to lower variability.]
2 hosts only, system load = 0.5. Mean job size = 3000. JSQ is in between LWL and Random.
Summary – Part I
Single-server system (X: job size, highly variable; ρ < 1). LESSONS LEARNED: Smart scheduling greatly improves mean response time. Variability of job size distribution is key. Closed system sees much reduced effect.
Supercomputing (FCFS hosts behind a router). LESSONS LEARNED: Greedy routing policies, like JSQ and LWL, are poor. To combat variability, need size-interval splitting. By isolating smalls, can achieve effects of smart single-server policies. Load UN-balancing. Don't need to know size.
Similarities & Differences: Variance is a big deal throughout. For single-server, combat variance by using a smart scheduling policy that lets smalls get ahead of bigs. For supercomputing server farm, stuck with FCFS. However, can combat variance by giving smalls isolation from bigs.
Tomorrow …
I. Review of scheduling in single-server M/GI/1
II. Supercomputing/Manufacturing (FCFS hosts behind a router)
III. Web server farm model (PS hosts behind a router)
IV. Towards Optimality … (SRPT hosts behind a router)
Homework: think about how the routing policies that we discussed for supercomputing will fare in the web server farm model. Also, think about the case where you are allowed to design your own server farm. How would you design it?
Scheduling in Multiserver Systems PART II Mor Harchol-Balter Computer Science Dept Carnegie Mellon University harchol@cs.cmu.edu
Outline
I. Review of scheduling in single-server
II. Supercomputing (FCFS hosts behind a router)
III. Web server farm model (PS hosts behind a router)
IV. Towards Optimality … (SRPT hosts behind a router)
For those of you who missed day 1, we began by talking about scheduling in a single server, before moving on to multiserver. Started with a discussion of CS workloads; common property: very high variability in job size distribution. In single-server, saw that FCFS and other non-preemptive policies have poor mean response time because small jobs get stuck waiting behind large jobs. Saw that preemptive policies that favor short jobs perform much better, because shorts are not stuck behind larges.
Moved on to multiserver systems. Depending on application, the scheduling policy for the multiserver system is often fixed. E.g., in supercomputing we're stuck with FCFS at the servers. In a Web server farm we have Processor-Sharing at servers. For the supercomputing case, we found that, since we were stuck with FCFS at servers, it was particularly important to find a routing policy that helped smalls not be stuck behind bigs. That was size-interval assignment. Split smalls and bigs. Provide isolation for smalls. Other greedy policies, like JSQ, that don't provide isolation for smalls weren't very good under high variability workloads.
This talk, we'll move on to looking at the Web server farm model. Here we have PS servers. Want to understand what routing policy is best in this setting.
Web Server Farm Model
Poisson process into Router (routing policy), then PS hosts.
Cisco Local Director, IBM Network Dispatcher, Microsoft SharePoint, F5 Labs BIG/IP.
HTTP requests are immediately dispatched to server. Requests are fully preemptible. Commodity servers utilized. Do Processor-Sharing. Jobs i.i.d. ~ G: highly variable size distribution, 7 orders of magnitude difference in job size [Crovella, Bestavros 98].
Q: Compare Routing Policies for E[T]?
Web Server Farm: Router, then PS hosts. High variance job size. Which policies give high E[T], and which low?
1. Random
2. Join-Shortest-Queue: go to host w/ fewest # jobs.
3. Least-Work-Left: go to host with least total work.
4. Size-Interval Splitting: jobs are split up by size among hosts.
(Central-Queue policies aren't possible for PS farms.)
I've replaced Round-Robin by Random, just to make it easier to think about.
Q: Compare Routing Policies for E[T]?
Router, then PS hosts:
1. Random
2. Join-Shortest-Queue: go to host w/ fewest # jobs.
3. Least-Work-Left: go to host with least total work.
4. Size-Interval Splitting: jobs are split up by size among hosts.
Answer (1 & 4): Same for E[T], but not great. Also, want to balance load!
Answer (2 & 3): Shortest-Queue is greedier & better.
[Figure: E[T] for JSQ, LWL, RAND, and SIZE; 8 servers, ρ = 0.9, C^2 = 50.]
1 & 4 same. Pause. Start by observing that 1 & 4 are actually both probabilistic routing policies. For Random, route a fraction p = ½ of jobs to the 1st queue and a fraction 1-p to the other. For Size-Interval, route a fraction p_s of jobs to the first queue and 1-p_s to the other. Hence the Poisson process is split probabilistically, so we have a Poisson stream into each queue. Now show formulas. Observe that the probabilities cancel, so it doesn't matter which probabilistic policy you use.
Example for 2 vs. 3: one queue with a job of size 100 vs. one queue with 50 jobs of size 1. In fact, s-q can be a **lot** better when we have a high variability job size distribution, as we have. So it's definitely interesting to study Shortest-Queue.
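The cancellation mentioned in the notes can be written out as follows. This is a sketch under two assumptions not spelled out on the slide: there are k hosts, and the size-interval cutoffs are chosen to balance load, so every host sees the same load ρ = λE[X]/k. Here p_i is the fraction of jobs sent to host i and X_i is the size of a job sent there.

```latex
% Each host is an M/G/1/PS queue with Poisson arrival rate \lambda p_i:
\[
  E[T_i] = \frac{E[X_i]}{1-\rho_i}, \qquad \rho_i = \lambda\, p_i\, E[X_i].
\]
% With balanced load, \rho_i = \rho for every host, so
\[
  E[T] \;=\; \sum_{i=1}^{k} p_i\, E[T_i]
        \;=\; \frac{\sum_{i=1}^{k} p_i\, E[X_i]}{1-\rho}
        \;=\; \frac{E[X]}{1-\rho}.
\]
% Random routing is the special case p_i = 1/k, E[X_i] = E[X],
% which gives the same value E[X]/(1-\rho).
```

The last equality uses Σ p_i E[X_i] = E[X], which holds for any way of splitting the jobs among hosts.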
Prior Analysis of JSQ Routing
All prior JSQ analysis assumes FCFS servers.
2-server: [Kingman 61], [Flatto, McKean 77], [Wessels, Adan, Zijm 91], [Foschini, Salz 78], [Knessl, Matkowsky, Schuss, Tier 87], [Conolly 84], [Rao, Posner 87], [Blanc 87], [Grassmann 80], [Muntz, Lui, Towsley 95], [Cohen, Boxma 83]
>2-server approximations: [Nelson, Philips, Sigmetrics 89], [Nelson, Philips, Perf.Eval. 93], [Lin, Raghavendra, TPDS 96]
Our PS model only studied in simulation, no analysis.
Bonomi, Transactions on Computers 1990: first to consider our model. Simulation only. 2 server system only. Simulations weren't so good then: couldn't cover the range of distributions, and too difficult to do more than 2 servers.
Analysis of PS very hard. State space analysis means you need to know the remaining service requirements of all jobs. All analysis done in FCFS. Note that FCFS = PS wrt tracking number of jobs if job size distribution is exponential. However, for highly-variable jobs, as we've seen, JSQ is not a particularly good policy. Nevertheless, that's where the analysis is.
Even FCFS server farm hard. 2D infinite.
Kingman grouping: use generating functions to derive the joint probability distribution of queue lengths and express the mean response time as an infinite sum, which in practice requires truncation to compute. Adan makes the additional observation that the limiting probabilities are geometrically decreasing, allowing him to obtain nearly tight upper and lower bounds on the sum, via the compensation method.
Foschini grouping: heavy traffic.
Conolly grouping: computational, w/truncation of state space. JSQ might be best available to you.
Cohen: Boundary Value Approach to obtain exact functional representation for mean response time. Exact integral expressions! Methods aren't computationally feasible for k > 2.
Very few even attempt > 2 servers.
Nelson (explain soon): very coarse, but works. Approximate the total number of jobs by M/M/k and then assume jobs are distributed between queues within ±1. Lin paper: also a coarse approximation, similar idea. Approximate the number of busy servers by a binomial and then assume jobs are distributed between queues within ±1.
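[Nelson, Philips] Idea
k: # servers. n: total number of jobs in the JSQ/FCFS farm.
Assume the total number of jobs n behaves as in an M/M/k, and that JSQ keeps the queues balanced, so that each queue holds ⌊n/k⌋ or ⌈n/k⌉ jobs.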
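The balancing half of this idea (queue lengths differ by at most one) can be sketched in a few lines. The function name is hypothetical, and this shows only the "within ±1" assumption, not the full [Nelson, Philips] approximation, which also needs the distribution of n.

```python
def balanced_queue_lengths(n, k):
    """Spread n jobs over k queues so that lengths differ by at most one.

    Every queue gets floor(n/k) jobs and the first n mod k queues get one more,
    so each queue holds floor(n/k) or ceil(n/k) jobs.
    """
    base, extra = divmod(n, k)
    return [base + 1] * extra + [base] * (k - extra)

if __name__ == "__main__":
    print(balanced_queue_lengths(7, 3))
```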
First Analysis of JSQ for PS [Gupta, Harchol-Balter, Sigman, Whitt, 06+]
Poisson process into JSQ router, then PS hosts.
Near insensitivity to C^2: PS server farm with General service ≈ PS server farm w/Exponential service = FCFS server farm w/Exponential service.
Single-queue equivalence: For PS server farm w/Exponential service, the multiserver system is equivalent to a single queue w/ contingent arrival rates.
After finishing the first result, could apply the approximation of others that works for many servers. However, we find a far more accurate approach for many servers.
Summary so far
Supercomputing (FCFS hosts behind a router). LESSONS LEARNED: Greedy routing policies, like JSQ and LWL, are poor. To combat variability, need size-interval splitting. By isolating smalls, can achieve effects of smart single-server policies. Load UN-balancing. Don't need to know size.
Web server farm (PS hosts behind a router). LESSONS LEARNED: JSQ routing is good! Job size variability not a problem. Load Balancing.
Outline
I. Review of scheduling in single-server M/GI/1
II. Supercomputing/Manufacturing (FCFS hosts behind a router)
III. Web server farm model (PS hosts behind a router)
IV. Towards Optimality … (SRPT hosts behind a router)
Stop and talk about how, before, the scheduling policy was forced on us. Now it is up to us.
What is Optimal Routing/Scheduling?
Incoming jobs go to a Router (routing policy), then to hosts, each running a sched. policy. 2 Policy Decisions.
Assume no restrictions: Jobs are fully preemptible. Can have central queue if want it, or not. Know job size (of course don't know future jobs ...).
Let people think here about what they may want to do. Ask them what was optimal for a single server & why: it gets jobs done as fast as possible.
What is Optimal Routing/Scheduling?
Central-Queue-SRPT: a single central queue feeding the servers in SRPT order.
Recall: single-server SRPT minimizes E[T] on every sample path! [Schrage 67]
Question: Central-Queue-SRPT looks pretty good! Does it minimize E[T]?
Central-Queue-SRPT
Answer: This does not minimize E[T] on every arrival sequence.
Bad Arrival Sequence:
@time 0: 2 jobs size 2^9, 1 job size 2^10
@time 2^10: 2 jobs size 2^8, 1 job size 2^9
@time 2^10 + 2^9: 2 jobs size 2^7, 1 job size 2^8, etc.
[Figure: schedules under OPT and under Central-Queue-SRPT for this sequence; under Central-Queue-SRPT the large job is preempted.]
Mention intuition for why you're doing badly in SRPT: poor binpacking because greedy. After doing this slide, go through an explanation of competitive analysis. This way, ready for next slide.
Central-Queue-SRPT
Adversarial (Worst-Case) Guarantees:
THM [Leonardi, Raz, STOC 97]: Central-Queue-SRPT is O(log(min(P, n/m)))-competitive for E[T], where P = biggest/smallest job size, n = # jobs, and m = # servers, and no online policy can improve upon this by more than a constant factor.
THM [Muthukrishnan, Rajaraman, Shaheen, Gehrke, FOCS 99]: Central-Queue-SRPT is 14-competitive for E[Slowdown].
Remarks: log(biggest/smallest) could be factor 7 in practice! Closest stochastic result analyzes only central-queue w/priorities: [Harchol-Balter, Wierman, Osogami, Scheller-Wolf, QUESTA 05]
Theoretical computer scientists would say that this policy is optimal. But that still doesn't tell us anything about its response time. Explain what big O notation means. Explain that the second term can be ignored because with an infinite number of job arrivals, the first term is all that will matter. This theorem more importantly says that Central-Queue-SRPT is as close to optimal as you can get from a competitive ratio perspective. Problem is that it still doesn't tell us what the mean response time is under this architecture in a stochastic setting, w/General i.i.d. service times. I want to be able to compare E[T] under this setting to what I had under PS server farms to decide whether it's worth rebuilding my web site to run this way! Explain that our QUESTA paper is a numerical technique for analyzing the stochastic setup with m classes of jobs. It doesn't provide a nice simple formula or approximation formula. Lots of research still needed on the stochastic front.
What is Optimal Routing/Scheduling with Immediate Dispatch?
Incoming jobs go to a Router (routing policy), then to hosts, each running a sched. policy. 2 Policy Decisions.
Practical Assumption: jobs must be immediately dispatched! Jobs are fully preemptible within queue. Know job size.
What is Optimal Routing/Scheduling when Immediately Dispatch?
Incoming jobs are immediately dispatched by the router to hosts running SRPT.
Claim: The optimal routing/sched. pair given immed. dispatch uses SRPT at the hosts. (Assuming an opt pair exists.)
PROOF: Let A be an optimal routing/scheduling pair wrt E[T]. Suppose, for contradiction, that A does not use SRPT at the hosts. Let policy pair B mimic A with respect to A's dispatching of jobs to hosts. I.e., policy B may be different from A, but sends the same jobs to the same hosts at the same times as A. But after the dispatching, B does SRPT scheduling at the hosts. Since single-server SRPT is optimal for every arrival sequence [Schrage 67], B improves upon A with respect to E[T]. Contradiction!
IMPACT: The claim narrows the search to policies with SRPT at hosts.
In search of good Immediate Dispatch Routing
Incoming jobs are immediately dispatched by the router to hosts running SRPT.
Q: What should the immediate dispatch routing policy be, given SRPT sched. at hosts?
Pause first. Then explain intuition: want to make sure that there are plenty of smalls at all hosts, and plenty of mediums, etc.
In search of good Immediate Dispatch Routing … why not obvious
Setup: jobs are immediately dispatched by JSQ to hosts running SRPT.
Bad Arrival Sequence:
@time 0: job of size 10 arrives.
@time 0+: job of size 1000 arrives.
@time 0++: job of size 10 arrives.
@time 0+++: job of size 1 arrives.
@time 1+++: job of size 1000 arrives.
[Figure: contents of the host queues under OPT and under JSQ/SRPT for this sequence.]
Explain that the routing policy I'm choosing to use is JSQ, where if equal, then alternate. Pause before the 1+++ and explain first what's going to happen. Show the 1 disappearing slowly. SRPT at the servers can cause bad things to happen…
Smart Immediate Dispatch Policy
Incoming jobs are immediately dispatched by the router to hosts running SRPT.
Answer: IMD Algorithm due to [Avrahami, Azar 03]:
Split jobs into size classes.
Assign each incoming job to the server w/ fewest # jobs in that class.
Remarks: IMD is competitive for E[T]. Immediate Dispatching is “as good as” Central-Queue-SRPT. Similar policy proposed by [Wu, Down 06] for heavy-traffic setting.
Explain that IMD performance is as good as Central-Queue-SRPT because the 2nd term is infinite for us anyway and goes away. Again, doesn't tell us what E[T] looks like.
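The two steps of the dispatch rule can be sketched as a small class. This is an illustration under assumptions, not the algorithm exactly as specified in [Avrahami, Azar 03]: size classes are taken to be powers of 2, "fewest # jobs in that class" is taken to mean jobs of that class dispatched so far (not jobs currently present), and ties go to the lowest-numbered server. The class name and methods are hypothetical.

```python
import math
from collections import defaultdict

class ClassBasedDispatcher:
    """Immediate dispatch by size class (illustrative sketch)."""

    def __init__(self, num_servers):
        self.num_servers = num_servers
        # counts[c][s] = number of class-c jobs sent to server s so far
        self.counts = defaultdict(lambda: [0] * num_servers)

    @staticmethod
    def size_class(size):
        """Assumed classing: class c holds sizes in [2^c, 2^(c+1))."""
        return math.floor(math.log2(size))

    def dispatch(self, size):
        """Send the job to the server with the fewest jobs of its class."""
        per_server = self.counts[self.size_class(size)]
        s = min(range(self.num_servers), key=lambda i: per_server[i])
        per_server[s] += 1
        return s

if __name__ == "__main__":
    d = ClassBasedDispatcher(2)
    # Sizes from the bad arrival sequence for JSQ/SRPT on the earlier slide.
    print([d.dispatch(x) for x in (10, 1000, 10, 1, 1000)])
```

On that sequence the two jobs of size 1000 land on different servers, and so do the two jobs of size 10, which is the spreading the earlier intuition asks for.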
Some Key Points
Supercomputing (FCFS hosts behind a router): Need size-interval splitting to combat job size variability and enable good performance.
Web server farm model (PS hosts behind a router): Job size variability is not an issue. Greedy, JSQ, performs well.
Towards Optimality … (SRPT hosts behind a router): Central-Queue-SRPT and IMD have similar worst-case E[T]. Almost exclusively worst-case analysis, so hard to compare with above results. Need stochastic research here!
Thank you for inviting me!
If you want to know more … My class lectures are all available online. 15-849 Performance Modeling ** Highly-recommended for CS theory, Math, TEPPER, and ACO doctoral students Instructor: Mor Harchol-Balter (harchol@cs.cmu.edu) www.cs.cmu.edu/~harchol/ Queueing theory is an old area of mathematics which has recently become very hot. The goal of queueing theory has always been to improve the design/performance of systems, e.g. networks, servers, memory, disks, distributed systems, etc., by finding smarter schemes for allocating resources to jobs. In this class we will study the beautiful mathematical techniques used in queueing theory, including stochastic analysis, discrete-time and continuous-time Markov chains, renewal theory, product-forms, transforms, supplementary random variables, fluid theory, scheduling theory, matrix-analytic methods, and more. Throughout we will emphasize realistic workloads, in particular heavy-tailed workloads. This course is packed with open problems -- problems which if solved are not just interesting theoretically, but which have huge applicability to the design of computer systems today.
References N. Avrahami and Y. Azar, “Minimizing Total flow time and total completion time with immediate dispatching.” SPAA 2003, pp. 11-18. N. Bansal and M. Harchol-Balter, "Analysis of SRPT scheduling: Investigating unfairness," Proceedings of ACM Sigmetrics 2001. P. Barford and M. Crovella, “Generating representative web workloads for network and server performance evaluation,” ACM Sigmetrics 1998, pp. 151-160. J. Blanc, “A note on waiting times in systems with queues in parallel,” J. Appl. Prob., Vol. 24, 1987 pp 540-546. S. Borst, O. Boxma, and P. Jelenkovic, “Reduced load equivalence and induced burstiness in GPS queues with long-tailed traffic flows,” Queueing Systems, Vol. 43, 2003, pp. 274-285. S. Borst, O. Boxma, and M. van Uitert, “The asymptotic workload behavior of two coupled queues,” Queueing Systems, Vol. 43, 2003, pp. 81-102. J.W. Cohen and O. Boxma, Boundary Value Problems in Queueing System Analysis, North Holland, 1983 B. Conolly, “The Autostrada queueing problem,” J. Appl. Prob.: Vol. 21., 1984, pp. 394-403.
References, cont. M. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: evidence and possible causes,” Proceedings of the 1996 ACM Sigmetrics International Conference on Measurement and Modeling of Computer Systems, May 1996, pp. 160-169. D. Down and R. Wu, “Multi-layered round robin scheduling for parallel servers,” Queueing Systems: Theory and Applications, Vol. 53, No. 4, 2006, pp. 177-188. G. Fayolle and R. Iasnogorodski, “Two coupled processors: the reduction to a Riemann-Hilbert problem,” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 47, 1979, pp. 325-351. L. Flatto and H.P. McKean, “Two queues in parallel,” Communications on Pure and Applied Mathematics, Vol. 30, 1977, pp. 255-263. R. Foley and D. McDonald, “Exact asymptotics of a queueing network with a cross-trained server,” Proceedings of INFORMS Annual Meeting, October 2003, pp. MD-062. G. Foschini and J. Salz, “A basic dynamic routing problem and diffusion,” IEEE Transactions on Communications, Vol. Com-26, No. 3, March 1978.
References, cont. P. Glynn, M. Harchol-Balter, K. Ramanan, “Heavy-traffic approach to optimizing size-interval task assignment,” Work in progress, 2006. W. Grassmann, "Transient and steady state results for two parallel queues," Omega, vol. 8, 1980, pp. 105-112. V. Gupta, M. Harchol-Balter, K. Sigman, and W. Whitt, “Analysis of join-the-shortest-queue policy for web server farms.” In submission, 2006. M. Harchol-Balter and A. Downey. "Exploiting process lifetime distributions for dynamic load balancing," Proceedings of ACM Sigmetrics '96 Conference on Measurement and Modeling of Computer Systems, May 1996, pp. 13-24. M. Harchol-Balter, M. Crovella, and C. Murta, "On choosing a task assignment policy for a distributed server system," Journal of Parallel and Distributed Computing, vol. 59, no. 2, Nov. 1999, pp. 204-228. M. Harchol-Balter, C. Li, T. Osogami, A. Scheller-Wolf, and M. Squillante, “Cycle stealing under immediate dispatch task assignment,” Proceedings of the Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), June 2003, pp. 274-285.
References, cont. M. Harchol-Balter, B. Schroeder, N. Bansal, M. Agrawal. "Size-based scheduling to improve web performance." ACM Transactions on Computer Systems , Vol. 21, No. 2, May 2003, pp. 207-233. M. Harchol-Balter and R.Vesilo, “Optimal cutoffs for size-interval task assignment,” Work in progress, 2006. M. Harchol-Balter, A. Wierman, T. Osogami, and A. Scheller-Wolf, "Multi-server queueing systems with multiple priority classes," Queueing Systems: Theory and Applications (QUESTA), vol. 51, no. 3-4, 2005, pp. 331-360. J. Kingman, “Two similar queues in parallel,” Biometrika, Vol. 48, 1961, pp. 1316-1323. A. Konheim, I. Meilijson, and A. Melkman, “Processor-sharing of two parallel lines,” J. Appl. Prob., Vol. 18, 1981, pp. 952-956. C. Knessl, B. Matkowsky, Z. Schuss, and C. Tier, “Two parallel M/G/1 queues where arrivals join the system with the smaller buffer content,” IEEE Transactions on Communications, Vol. Com-35, No. 11,1987, pp. 1153-1158. S. Leonardi and D. Raz, “Approximating total flow time on parallel machines,” ACM Symposium on Theory of Computing (STOC), 1997.
References, cont. H. Lin, and C. Raghavendra, “An approximate analysis of the join the shortest queue (JSQ) policy”, IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 3, March 1996. J. Lui, R. Muntz, D. Towsley, “Bounding the mean response time of the minimum expected delay routing policy: an algorithmic approach,” IEEE Transactions on Computers, Vol. 44, No. 12, Dec 1995. S. Muthukrishnan, R. Rajaraman, A. Shaheen, and J. Gehrke, “Online scheduling to minimize average stretch,” Proceedings of the 40th Annual Symposium on Foundations of Computer Science, October 1999, pp. 433. R. Nelson and T. Philips, “An approximation to the response time for shortest queue routing,” ACM SIGMETRICS Performance Evaluation Review, Vol. 17 No. 1, May 1989, pp. 181-189. R. Nelson and T. Philips, “An approximation for the mean response time for shortest queue routing with general interarrival and service times,” Performance Evaluation, Vol. 17 No. 2, March 1993 pp. 123-139.
References, cont. T. Osogami, M. Harchol-Balter, and A. Scheller-Wolf, “Analysis of cycle stealing with switching cost,” Proceedings of the ACM Sigmetrics, June 2003, pp. 184-195. T. Osogami, M. Harchol-Balter, and A. Scheller-Wolf. "Analysis of cycle stealing with switching times and thresholds" Performance Evaluation, Vol. 61, No. 4, 2005, pp. 347-369. B. Rao and M. Posner, “Algorithmic and approximate analysis of the shorter queue,” Model Naval Research Logistics, Vol. 34, 1987, pp. 381-398. R. Righter and J. Shanthikumar, “Scheduling multiclass single server queueing systems to stochastically maximize the number of successful departures," Probability in the Engineering and Informational Sciences, Vol. 3, 1989, pp. 323-333. L.E. Schrage, “A proof of the optimality of the shortest processing remaining time discipline,” Operations Research, Vol. 16, 1968, pp. 678-690. B. Schroeder and M. Harchol-Balter, "Evaluation of task assignment policies for supercomputing servers: The case for load unbalancing and fairness," 9th IEEE Symposium on High Performance Distributed Computing (HPDC '00) , August 2000.
References, cont. B. Schroeder, A. Wierman, and M. Harchol-Balter. "Closed versus open system models: A cautionary tale,” Proceedings of NSDI , 2006. A. Shaikh, J. Rexford, and K. Shin, “Load-sensitive routing of long-lived IP flows,” Proceedings of SIGCOMM, September, 1999. J. Wessels, I. Adan, and W. Zijm, “Analysis of the asymmetric shortest queue problem,” Queueing Systems, Vol. 8, 1991, pp. 1-58. A. Wierman and M. Harchol-Balter. "Classifying scheduling policies with respect to higher moments of conditional response time." Proceedings of ACM Sigmetrics 2005 Conference on Measurement and Modeling of Computer Systems. A. Wierman and M. Harchol-Balter. "Nearly insensitive bounds on SMART scheduling." Proceedings of ACM Sigmetrics 2005 Conference on Measurement and Modeling of Computer Systems. A. Wierman and M. Harchol-Balter, "Classifying scheduling policies with respect to unfairness in an M/GI/1," Proceedings of ACM Sigmetrics 2003 Conference on Measurement and Modeling of Computer Systems , June 2003.