Computer Systems Design

Presentation transcript:

Computer Systems Design: What Queueing Theory Teaches Us About Computer Systems Design
Mor Harchol-Balter, Computer Science Dept, CMU

Speaker note: Hi, my name is Mor Harchol-Balter. I'm a professor in the Computer Science Department here at CMU. I'll be talking about what analytical performance modeling teaches us about designing computer systems. Don't get nervous about these friends of mine; you'll meet them all later during the talk.

Outline
I. Basic Vocabulary: avg arrival rate λ; avg service rate μ; avg load ρ; avg throughput X; open vs. closed systems; response time T; waiting time TQ; Exponential vs. Pareto/heavy-tailed; squared coefficient of variation C²; Poisson process.
II. Single-server queues: D/D/1, M/M/1, M/G/1; inspection paradox; effect of job size variability; effect of load; provisioning bathrooms/scaling; scheduling (FCFS, PS, SJF, LAS, SRPT); web server scheduling implementation; open vs. closed systems: wait; open vs. closed systems: scheduling.
III. Multi-server queues: static load balancing; throwing away servers; M/M/k; comparing architectures; many slow servers vs. 1 fast; capacity provisioning & scaling; square-root staffing; dynamic power management; dynamic load balancing with FCFS servers; replication; dynamic load balancing with PS servers.

Vocabulary
λ = avg arrival rate (jobs/sec); μ = avg service rate (jobs/sec); 1/μ = avg size of a job on this server (sec). Jobs wait in a single FCFS queue.
Example: on average a job needs 3×10^6 cycles and the machine executes 9×10^6 cycles/sec, so the avg job size is 1/3 sec and μ = 3 jobs/sec.

Vocabulary (continued)
Jobs arrive at rate λ jobs/sec to the FCFS queue; each job requires 1/μ sec of service on average.
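The mapping from the example's numbers to λ, μ, and ρ can be made concrete in a few lines. A minimal sketch follows; the cycle counts come from the slide's example, while the arrival rate of 2 jobs/sec is an assumed value for illustration.

```python
# Minimal sketch: turning the slide's example into the basic queueing quantities.
# Cycle counts are from the slide; lambda_ (arrival rate) is an assumed value.

job_cycles = 3e6          # avg cycles a job needs (from the slide)
cpu_cycles_per_sec = 9e6  # machine speed (from the slide)

avg_job_size = job_cycles / cpu_cycles_per_sec   # E[S] = 1/mu, in seconds
mu = 1 / avg_job_size                            # avg service rate, jobs/sec

lambda_ = 2.0             # assumed avg arrival rate, jobs/sec (needs lambda_ < mu for stability)
rho = lambda_ / mu        # avg load (server utilization)

print(f"E[S] = {avg_job_size:.3f} s, mu = {mu:.1f} jobs/s, rho = {rho:.2f}")
```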

More Vocabulary
Defn: Throughput X denotes the average rate at which jobs complete (jobs/sec).
QUESTION: Two single-server queues see the same arrival rate λ; one has service rate μ, the other 2μ. Which has higher throughput?

More Vocabulary
Answer: X = the avg rate at which jobs complete = λ for both queues (assuming no jobs are dropped), so the throughput is the same.

Open versus Closed Systems
Closed interactive system: MPL N = fixed number of users, each with "think time" Z. Closed batch system: MPL N = fixed number of jobs. Open system: jobs arrive from outside at rate λ.
(Speaker note: explain where the jobs are in each picture.)

More Vocabulary
Response time T: time from arrival to completion. Queueing time (waiting time) TQ: time spent waiting before service begins. They are related by T = TQ + S, where S is the job's service requirement.
Q: Given that λ < μ, what causes wait?
A: Variability in the arrival process & service requirements.

Variability
Two sources of variability at a single-server queue: variability in the arrival process, and variability in the job size, S.

Job Size Distributions
"Most jobs are small; few jobs are large." Two candidate job-size distributions are shown: one with a light (Exponential-type) tail and one with a heavy tail.

Job Size Distributions
QUESTION: Which of the two best represents UNIX process lifetimes?
QUESTION: For which do the top 1% of jobs comprise 50% of the load?
QUESTION: Which distribution fits the saying, "the longer a job has run so far, the longer it is expected to continue to run" (decreasing failure rate, DFR)?
(In each case: the heavy-tailed one, as the next slide explains.)

Pareto Job Size Distribution
Pareto job sizes are ubiquitous in CS: CPU lifetimes of UNIX jobs [Harchol-Balter, Downey 96]; supercomputing job sizes [Schroeder, Harchol-Balter 00]; web file sizes [Crovella, Bestavros 98], [Barford, Crovella 98]; IP flow durations [Shaikh, Rexford, Shin 99]; wireless call durations [Blinn, Henderson, Kotz 05].
Also ubiquitous in nature: forest fire damage, earthquake damage, human wealth [Vilfredo Pareto '65].

Exponential Job Size Distribution
Slice time into small intervals of length δ. In each interval flip a coin that comes up heads with probability μδ; S is the time until the first head. As δ → 0, S ~ Exp(μ). S is memoryless: however long you have already waited, the remaining time has the same distribution.
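The coin-flip construction and the memoryless property are easy to check numerically. A minimal sketch, with μ, δ, and the sample size chosen purely for illustration:

```python
# Minimal sketch of the coin-flip view of the Exponential distribution and its
# memoryless property. mu, delta, and n are assumed illustration values.
import numpy as np

rng = np.random.default_rng(0)
mu, delta, n = 3.0, 1e-3, 200_000     # service rate (jobs/sec), slot length, sample size

# S = time until the first "head", where each delta-slot is a head w.p. mu*delta
slots_until_head = rng.geometric(p=mu * delta, size=n)
S = slots_until_head * delta

print("E[S] ~", round(S.mean(), 3), " (theory: 1/mu =", round(1 / mu, 3), ")")

# Memorylessness: P(S > s + t | S > s) should be close to P(S > t)
s, t = 0.5, 0.3
lhs = np.mean(S[S > s] > s + t)
rhs = np.mean(S > t)
print("P(S > s+t | S > s) =", round(lhs, 3), " vs  P(S > t) =", round(rhs, 3))
```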

Variability in Job Sizes
Squared coefficient of variation: C² = Var(S) / E[S]².
QUESTION: Match these distributions to their C² values: Deterministic; Exponential; Uniform(0,b); UNIX process lifetimes; Human IQs; Pareto distribution.
(Speaker note: explain how C² is independent of scale, so variability doesn't go up when you compare 1, 2, 3 with 10, 20, 30.)

Variability in Job Sizes
In increasing order of C²: Deterministic (C² = 0); Human IQs (C² well below 1); Uniform(0,b), for any b (C² = 1/3); Exponential distribution (C² = 1); UNIX process lifetimes (C² much greater than 1); Pareto distribution (C² huge, infinite for α ≤ 2).
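A minimal sketch estimating C² by sampling a few distributions; the specific parameters are assumptions for illustration, not values from the talk.

```python
# Minimal sketch: squared coefficient of variation C^2 = Var(S) / E[S]^2, estimated by
# sampling. Parameter choices are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

def c2(samples):
    return samples.var() / samples.mean() ** 2

dists = {
    "Deterministic(1)":     np.ones(n),
    "Uniform(0, 5)":        rng.uniform(0, 5, n),          # C^2 = 1/3 for any b
    "Exponential(mean 1)":  rng.exponential(1.0, n),       # C^2 = 1
    "Pareto(alpha = 2.1)":  rng.pareto(2.1, n) + 1,        # C^2 ~ 4.8; diverges as alpha -> 2
}

for name, s in dists.items():
    print(f"{name:22s} C^2 = {c2(s):.2f}")
```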

Variability (revisited)
Variability in the arrival process; variability in job size, S. First, the arrival process.

Poisson Process with rate λ
QUESTION: What's a Poisson process with rate λ? Hint: it's related to Exp(λ).

Poisson Process with rate λ
A Poisson process models a sequence of arrival times (typically representing the aggregation of many independent users). Slicing time into small intervals of length δ, each interval contains an arrival with probability λδ; equivalently, the times between arrivals are i.i.d. Exp(λ).
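A minimal sketch generating a Poisson process from Exp(λ) inter-arrival times and checking that per-second counts behave like Poisson(λ); the rate and horizon are assumed illustration values.

```python
# Minimal sketch: a Poisson process with rate lam built from i.i.d. Exp(lam)
# inter-arrival times. lam and the horizon are assumed illustration values.
import numpy as np

rng = np.random.default_rng(2)
lam, horizon = 4.0, 10_000.0                       # arrivals/sec, seconds simulated

interarrivals = rng.exponential(1 / lam, size=int(lam * horizon * 1.2))
arrival_times = np.cumsum(interarrivals)
arrival_times = arrival_times[arrival_times < horizon]

# Counts in disjoint 1-second windows should look Poisson(lam): mean ~ variance ~ lam
counts, _ = np.histogram(arrival_times, bins=np.arange(0.0, horizon + 1.0))
print("mean count/sec:", round(counts.mean(), 2), " variance:", round(counts.var(), 2))
```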

Summary Part I: Basic Vocabulary
Covered: avg arrival rate λ; avg service rate μ; avg load ρ; avg throughput X; open vs. closed systems; response time T; waiting time TQ; Exponential vs. Pareto/heavy-tailed; squared coefficient of variation C²; Poisson process.
Prize-winning messages:
Throughput is very different for open vs. closed systems.
An Exponential distribution is the time to get a single "head"; a Poisson process is a sequence of "heads."
Heavy-tailed, Pareto distributions: represent real workloads; very high variability & DFR; top 1% comprise half the load.
Variance in job sizes is key; C² is the measure of variance.

Outline (repeated)
I. Basic Vocabulary (done); II. Single-server queues (next); III. Multi-server queues.

Single-Server Queue
D/D/1: Deterministic inter-arrival times, Deterministic service times, 1 server.
M/M/1: Exponential inter-arrival times (Poisson arrivals), Exponential service times. M = "memoryless" = "Markovian."
M/G/1: Poisson arrivals, General service times.

Single-Server Queue
Q: Does low load ρ imply low waiting time E[TQ]?

Single-Server Queue
In D/D/1 there is no waiting at all. In M/M/1 and M/G/1, the mean waiting time E[TQ] is related to C², the variability of the job size.

Single-Server Queue
Plot of E[TQ] vs. load ρ (0.2 to 0.8) for D/D/1, M/M/1, M/G/1 with C² = 10, and M/G/1 with C² = 100: the high-C² curves show large waits even at low load. Low load does NOT imply low wait.
(Speaker note: tell consulting story.)

M/G/1
E[TQ] = ρ/(1-ρ) · (1+C²)/2 · E[S] (the Pollaczek-Khinchine formula). Where is this coming from?
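Before deriving it, the formula itself is easy to evaluate. A minimal sketch of the mean waiting time as written above, at a few loads and C² values chosen for illustration:

```python
# Minimal sketch of the M/G/1 mean waiting time (Pollaczek-Khinchine):
#   E[TQ] = rho/(1-rho) * (1 + C^2)/2 * E[S]
# The rho and C^2 values below are illustration choices.

def mg1_wait(rho, ES, C2):
    """Mean queueing time in an M/G/1 queue with load rho, mean job size ES, SCV C2."""
    assert 0 <= rho < 1, "queue is unstable for rho >= 1"
    return rho / (1 - rho) * (1 + C2) / 2 * ES

ES = 1.0                              # mean job size, seconds
for C2 in (0, 1, 10, 100):            # C2=0 is deterministic service (M/D/1); C2=1 is M/M/1
    row = [f"{mg1_wait(rho, ES, C2):8.2f}" for rho in (0.2, 0.5, 0.8)]
    print(f"C^2 = {C2:3d}:  E[TQ] at rho = 0.2, 0.5, 0.8 ->", *row)
```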

Waiting for the bus

Waiting for the bus
S: time between buses.
QUESTION: On average, how long do I have to wait for a bus? (a) < 5 min (b) 5 min (c) 10 min (d) > 10 min

Waiting for the bus
S: time between buses. The average wait is E[S²]/(2·E[S]): the "Inspection Paradox." It seems paradoxical that the wait can be longer than E[S]; the reason is that you're much more likely to arrive during a big inter-bus gap S than during a small one.

Back to Single-Server Queue
The same paradox comes up in the waiting time for M/G/1: the E[S²]/(2E[S]) term in the formula is the expected remaining time of the job in service. Normally you would think the job in service has a remaining time of E[S]/2, but the inspection paradox says you're much more likely to arrive when a really big job is in service, so its remaining time is bigger than E[S]/2.
Transition: so we've seen how waiting time is related to job size variability. But it's also related to load ρ, in a non-linear way. Let's examine that now.

Waiting for the Loo
Check out the line for the men's room …

Waiting for the Loo
On avg, women spend 88 sec in the loo; men spend 40 sec.

Waiting for the Loo
M/M/1 model: the men's room is an M/M/1 with arrival rate λ and service rate 2μ; the women's room is an M/M/1 with arrival rate λ and service rate μ, i.e., doubling ρ.
QUESTION: Women take 2X as long. What's the difference in their wait? (a) factor < 2 (b) factor 2 (c) factor 4 (d) factor > 4

Waiting for the Loo
Doubling ρ can increase E[TQ] by a factor of 4 to ∞ (M/M/1, M/G/1).
(Speaker note: for M/M/1, think ρ/(1-ρ)·E[S] for the women vs. (ρ/2)/(1-ρ/2)·E[S]/2 for the men; the ratio is 4(1-ρ/2)/(1-ρ), which blows up as ρ → 1. For M/G/1, the corresponding load factor (1-ρ/2)/(1-ρ) ranges from 1 to ∞.)
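A minimal sketch putting numbers on this, using the M/M/1 formula E[TQ] = ρ/(1-ρ)·E[S] with the slide's 88 s and 40 s visit times; the arrival rate, assumed identical for both rooms, is an illustration value.

```python
# Minimal sketch: M/M/1 mean queueing time E[TQ] = rho/(1-rho) * E[S] applied to the
# restroom example. Visit times (88 s women, 40 s men) are from the slide; the common
# arrival rate lam is an assumed illustration value.

def mm1_wait(lam, ES):
    rho = lam * ES
    assert rho < 1, "unstable"
    return rho / (1 - rho) * ES

ES_women, ES_men = 88.0, 40.0          # seconds per visit (from the slide)
for lam in (0.005, 0.009, 0.011):      # assumed arrivals/sec (rho_women = lam * 88)
    w, m = mm1_wait(lam, ES_women), mm1_wait(lam, ES_men)
    print(f"lam = {lam}: women wait {w:7.1f} s, men wait {m:5.1f} s, ratio {w/m:5.1f}x")
```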

Equalizing the wait for men & women
Proposal: 2 women's rooms (each an M/M/1 with rate μ) for each men's room (rate 2μ).
QUESTION: Is this (a) insufficient (b) overkill (c) just right?

Equalizing the wait for men & women
Insufficient! The waiting time for women is still a factor of 2 higher.
(Speaker notes: it would be sufficient if everyone arrived at once to the bathroom. It is more than sufficient (overkill) if the two women's rooms are run as an M/M/2 with a single queue. Also true under an M/G/1 model. For what models is this not true?)

M/G/1
High load leads to high wait. High job size variability leads to high wait.
To drop load, we can increase server speed.
Q: What can we do to combat job size variability?
A: Smarter scheduling!

Scheduling in M/G/1 (heavy-tailed, DFR job sizes)
QUESTION: Which scheduling policy is best for minimizing E[T]?
FCFS (First-Come-First-Served, non-preemptive)
PS (Processor-Sharing, preemptive)
SJF (Shortest-Job-First, non-preemptive)
SRPT (Shortest-Remaining-Processing-Time, preemptive)
LAS (Least-Attained-Service first, preemptive)
(Speaker note: you might know LAS from ATLAS, HPCA 2010, with Yoongu Kim and Onur Mutlu: memory controllers prioritize threads to favor those that have obtained the least service so far.) [Harchol-Balter EORMS 2011]

Scheduling in M/G/1
Plots of E[T] vs. load ρ for FCFS, SJF, PS, LAS, SRPT, at C² = 10 and at C² = 100: the non-preemptive policies (FCFS, SJF) suffer most as C² grows, PS is insensitive to C², and LAS and SRPT do best, with SRPT lowest.
(Speaker note: describe all policies & why they do what they do. At the end, ask about PLCFS, if time.)
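A minimal simulation sketch of one such comparison, FCFS vs. preemptive SRPT in an M/G/1 queue; the load, the lognormal job-size distribution, and the sample size are illustration choices, not values from the talk.

```python
# Minimal simulation sketch: mean response time E[T] in an M/G/1 queue under FCFS vs.
# preemptive SRPT. Load, job-size distribution, and sample size are illustration choices.
import numpy as np

rng = np.random.default_rng(3)
n, lam = 100_000, 0.8
arrivals = np.cumsum(rng.exponential(1 / lam, n))        # Poisson arrivals
sizes = rng.lognormal(mean=-1.125, sigma=1.5, size=n)    # E[S] = 1, C^2 ~ 8.5

def fcfs(arrivals, sizes):
    t, resp = 0.0, []
    for a, s in zip(arrivals, sizes):
        t = max(t, a) + s                                # wait for the server, then run to completion
        resp.append(t - a)
    return np.mean(resp)

def srpt(arrivals, sizes):
    t, i, rem, resp = 0.0, 0, {}, []                     # rem: job id -> remaining work
    while i < len(arrivals) or rem:
        if not rem:                                      # idle: jump to the next arrival
            t = arrivals[i]; rem[i] = sizes[i]; i += 1
            continue
        j = min(rem, key=rem.get)                        # serve the shortest-remaining job
        finish = t + rem[j]
        if i < len(arrivals) and arrivals[i] < finish:   # next event is an arrival: preempt check
            rem[j] -= arrivals[i] - t
            t = arrivals[i]; rem[i] = sizes[i]; i += 1
        else:                                            # next event is job j completing
            t = finish; resp.append(t - arrivals[j]); del rem[j]
    return np.mean(resp)

print("E[T] FCFS:", round(fcfs(arrivals, sizes), 2), " SRPT:", round(srpt(arrivals, sizes), 2))
```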

Scheduling in M/G/1
We saw: E[T] under SRPT is far lower than under PS.
But isn't SRPT unfair to large jobs, when compared to PS?

Unfairness Question
Let S ~ Bounded Pareto with max size 10^10. Let ρ = 0.9. Mr. Max is the maximum-size job (10^10).
QUESTION: Which queue does Mr. Max prefer, the PS M/G/1 or the SRPT M/G/1?

Unfairness Question
Let S ~ Bounded Pareto (α = 1.1) with max size 10^10. Let ρ = 0.9.
Answer: Mr. Max prefers SRPT ("I ♥ SRPT"): even the largest job does better under SRPT.

Unfairness Question
All-can-win theorem [Bansal, Harchol-Balter, Sigmetrics 2001]: Under M/G/1, for all job size distributions, if ρ < 0.5, then E[T(x)]_SRPT < E[T(x)]_PS for all job sizes x. For heavy-tailed distributions, this holds for ρ < 0.95.
This defies Kleinrock's Conservation Law.

Scheduling in the Real World
Traditional web servers use PS (fair) scheduling. Let's do SRPT scheduling instead! [Harchol-Balter et al. TOCS 2003]
Setup: 1000 clients send "Get File" requests over the Internet to a web server.
Q: What is being scheduled? Q: How is size used?

SRPT Scheduling for Web Servers
Q: What is being scheduled? A: The bottleneck device is the limited ISP bandwidth: the site buys a limited fraction of the ISP's bandwidth (say 100 Mbps), and we schedule the sharing of this 100 Mbps among the 1000 clients.
Q: How is size being used? A: S = size of request = size of file ~ Pareto(α = 1).
Setup: clients → rest of Internet → ISP (bottleneck) → Apache/Linux server.

Linux Implementation
Standard Linux: each connection's socket has its own transmit queue feeding the network card (the bottleneck), and the sockets take turns draining: PS.
Modified Linux: the transmit queues feed a set of priority queues in front of the network card, and the socket of the file with the smallest remaining data feeds first (S goes 1st, M 2nd, L 3rd): SRPT.
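A minimal sketch (not the actual kernel code from the TOCS 2003 work) of the idea of binning a connection's remaining bytes into a few priority bands and always draining the highest-priority non-empty band, which approximates SRPT; the cutoff values and function names are assumptions.

```python
# Minimal sketch (not the TOCS 2003 kernel code): approximate SRPT with a handful of
# priority bands keyed on remaining bytes. Cutoffs and names are assumed for illustration.

def priority_band(remaining_bytes, cutoffs=(10_000, 100_000, 1_000_000)):
    """Return 0 for the highest-priority band (smallest remaining data)."""
    for band, cutoff in enumerate(cutoffs):
        if remaining_bytes <= cutoff:
            return band
    return len(cutoffs)                      # largest remaining data: lowest priority

# The dispatcher always drains the lowest-numbered non-empty band next.
for rem in (2_000, 50_000, 5_000_000):
    print(f"{rem:>9} bytes remaining -> band {priority_band(rem)}")
```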

Mean response time results
Plot of mean response time vs. load ρ (0.4 to 1.0) for PS vs. SRPT: SRPT gives substantially lower mean response time than PS.

Response time as a function of size
Plot of E[T(x)] (log scale, 10^-3 s to 1.0 s) vs. percentile of request size x (20% to 100%), for PS and SRPT.

Caution: Open versus Closed
Closed system: MPL N = fixed number of users, think time Z. Open system: arrival rate λ. Both measured by response time T.
QUESTION: When run with the same load ρ, which has higher E[T]? (a) Open (b) Closed (c) Same

Caution: Open versus Closed
Performance of an auction site [Schroeder, Wierman, Harchol-Balter NSDI 2006]: plot of E[T] (ms, log scale 10^-1 to 10^2) vs. load ρ (0.2 to 1.0) for the open system and for closed systems with MPL 10, 100, and 1000. E[T] is much lower for the closed system with the same ρ.

Caution: Open versus Closed
Closed & open systems run with the same job size distribution and the same load; plots of E[T] vs. ρ under PS and SRPT for each. [Schroeder, Wierman, Harchol-Balter, NSDI 06]

Summary Part II: Single-server queues
Covered: D/D/1, M/M/1, M/G/1; inspection paradox; effect of job size variability; effect of load; provisioning bathrooms/scaling; scheduling: FCFS, PS, SJF, LAS, SRPT; web server scheduling implementation; open vs. closed systems: wait; open vs. closed systems: scheduling.
Prize-winning messages:
M/G/1: low load does NOT always imply low waiting time; waiting time has a non-linear relationship to load.
Waiting time is affected by variability in job size ("inspection paradox").
Smart scheduling can combat job size variability.
Policies that seem unfair may not be.
Closed systems behave very differently from open.

Outline (repeated)
I. Basic Vocabulary (done); II. Single-server queues (done); III. Multi-server queues (next): static load balancing; throwing away servers; M/M/k; comparing architectures; many slow servers vs. 1 fast; capacity provisioning & scaling; square-root staffing; dynamic power management; dynamic load balancing with FCFS servers; replication; dynamic load balancing with PS servers.

Load Balancing
A Poisson arrival stream of rate λ is split probabilistically between two servers: with probability p to the fast server (rate μ = 2) and with probability 1-p to the slow server (rate μ = 1).
QUESTION: What is the optimal p to minimize E[T]? (a) (b) (c)

Load Balancing
Plot of the optimal p vs. λ (λ roughly 2.0 to 3.0; optimal p from 0.7 up to 1.0). For small enough λ, the optimal p = 1: throw away the slower server!
(Speaker note: might mention that for closed systems you want to balance load.)
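A minimal sketch of the underlying optimization, using the M/M/1 formula E[T] = 1/(μ-λ) for each server and a grid search over p; the λ values are illustration choices.

```python
# Minimal sketch: optimal probabilistic split of a Poisson(lam) stream between a fast
# M/M/1 server (mu = 2) and a slow one (mu = 1), minimizing overall E[T]. Grid search.
import numpy as np

def mean_T(p, lam, mu_fast=2.0, mu_slow=1.0):
    lf, ls = p * lam, (1 - p) * lam
    if lf >= mu_fast or ls >= mu_slow:
        return np.inf                       # unstable split
    # M/M/1: E[T] = 1/(mu - lambda); weight by the fraction of jobs sent to each server
    return p / (mu_fast - lf) + (1 - p) / (mu_slow - ls)

ps = np.linspace(0.0, 1.0, 10_001)
for lam in (0.5, 1.5, 2.0, 2.5, 2.9):       # assumed arrival rates for illustration
    best = min(ps, key=lambda p: mean_T(p, lam))
    print(f"lam = {lam}: optimal p = {best:.3f}, E[T] = {mean_T(best, lam):.3f}")
```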

M/M/k
Poisson arrivals with rate λ; k servers, each with rate μ; a single central queue; a server takes the next job when it becomes free; job sizes are Exponential.

3 Architectures
(1) M/M/k: a central queue feeding k servers of rate μ. (2) Splitting: arrivals split among k separate M/M/1 queues of rate μ. (3) M/M/1fast: a single server of rate kμ. All see arrival rate λ.
Q: Which is best for minimizing E[T]?

3 Architectures
How do M/M/k, Splitting, and M/M/1fast rank?

Many slow or 1 fast?
M/M/k (k slow servers of rate μ) vs. M/M/1fast (one server of rate kμ), same arrival rate λ.
QUESTION: Which is best for minimizing E[T]?

Many slow or 1 fast?
Plots of E[T] vs. λ for M/M/10 vs. M/M/1fast: the single fast server gives lower E[T], with the biggest advantage at low load; at high load the two become comparable.

Many slow or 1 fast: Revisited
M/G/k (k slow servers of rate μ) vs. M/G/1fast (one server of rate kμ).
QUESTION: Which is best for minimizing E[T]?
Why? M/G/1 suffers from high variability in job sizes, while M/G/k gets around that.

Many slow or 1 fast: Revisited
Plot of E[T] vs. λ for M/G/10 vs. M/G/1fast: under highly variable job sizes, the many slow servers (M/G/10) can win.

Capacity Provisioning & Scaling
Consider the following example: a Poisson arrival process with rate λ = 9 jobs/sec feeding 12 servers, each with rate μ = 1. P_Q = probability that an arrival has to queue.

Capacity Provisioning & Scaling
QUESTION: If the arrival rate becomes 10^6 times higher (still Poisson, μ = 1 jobs/sec per server), how many servers do we need to keep P_Q the same?
(a) 9.1×10^6 (b) 10×10^6 (c) 11×10^6 (d) 12×10^6 (e) 13×10^6 (f) none of these

Proportional Scaling is Overkill
Compare M/M/2 with arrival rate 2λ, M/M/4 with arrival rate 4λ, and M/M/8 with arrival rate 8λ (μ = 1 throughout, so λ can be at most 1). Plots of P_Q vs. λ and E[TQ] vs. λ for the three systems.

Proportional Scaling is Overkill
More servers at the same load ρ ⇒ lower P_Q ⇒ lower E[TQ]. High ρ does NOT imply high E[TQ], given enough servers.

Back to Capacity Provisioning
λ = 9×10^6 jobs/sec, μ = 1 (vs. the original λ = 9 with 12 servers). Q: How many servers? A: 9.003×10^6 servers.
"Square-root staffing" [Halfin, Whitt OR 1981]: let R be the minimum number of servers needed for stability (R = λ/μ). Then R + √R servers yields P_Q ≈ 20%.
(Speaker note: CLT: stochastic variation decreases with more samples. Transition: these scaling ideas are important in saving power. You don't need as many servers as you think you need, and you can save further power by turning servers off when not in use.)
Lesson: SAVE MONEY: don't scale proportionately!
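A minimal sketch checking this with the Erlang-C formula (probability an arrival queues in an M/M/k), computed via the standard Erlang-B recursion; to keep the loop short it scales the example by 10^4 rather than 10^6, which is an illustration choice.

```python
# Minimal sketch: Erlang-C probability of queueing, P_Q, via the standard Erlang-B
# recursion, used to check the square-root-staffing idea. Scaled by 10^4 (not 10^6)
# purely to keep the loop short.

def erlang_c(k, a):
    """P(arrival queues) in M/M/k with offered load a = lambda/mu (requires a < k)."""
    B = 1.0
    for i in range(1, k + 1):                 # Erlang-B recursion, numerically stable
        B = a * B / (i + a * B)
    return k * B / (k - a * (1 - B))

print("k = 12, lam = 9:      P_Q =", round(erlang_c(12, 9.0), 3))

lam = 9.0 * 10**4
R = int(lam)                                  # minimum servers for stability (R = lam/mu)
k = R + int(round(R ** 0.5))                  # square-root staffing: R + sqrt(R)
print(f"k = {k}, lam = {lam:.0f}: P_Q =", round(erlang_c(k, lam), 3))
print(f"proportional scaling would have used {12 * 10**4} servers instead")
```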

Dynamic Power Management
Annual U.S. data center energy consumption: 100B kWh. Unfortunately most is wasted: servers are only busy 5-30% of the time on average, but they're left ON, wasting power. [Gartner Report] [NYTimes]
Example server (Intel Xeon E5520, 2 quad-core 2.27 GHz, 16 GB memory): BUSY 200 Watts; IDLE 140 Watts; OFF 0 Watts; setup time 260 s at 200 W.
Q: Given the setup time, does dynamic power management work?

M/M/1/setup model
When the server is idle, it immediately shuts off; it requires a setup time to get it back on.
Thm [Welch '64] gives E[T] for the M/M/1 with setup. With a 260 s setup, this adds 260 s to the response time!
QUESTION: Does setup have the same effect for larger (M/M/k) systems?

Effect of setup in larger systems
We scale up the system size (M/M/k/setup: arrival rate λk, k servers) while keeping the load fixed. Plot vs. the number of servers k (20 to 100): setup matters less as k increases. This is why dynamic power management works!
[Gandhi, Doroudi, Harchol-Balter, Scheller-Wolf Sigmetrics 2013]

Dynamic Power Mgmt Implementation [Gandhi, Harchol-Balter, Raghunathan, Kozuch TOCS 2012]
Testbed: a power-aware load balancer in front of 28 application servers and 7 Memcached servers, backed by a 500 GB database. Key-value workload, a mix of CPU & I/O; 1 job = 1 to 3000 KV pairs, 120 ms total on avg. SLA: T95 < 500 ms. Setup time: 260 s.
(Speaker note: just finished talking about dynamic power mgmt; next, dynamic load balancing. So far static: don't look at the queues.)

AlwaysOn vs. AutoScale
Plots of load and number of servers (busy+idle, and busy+idle+setup) vs. time (min):
AlwaysOn: T95 = 291 ms, P_avg = 2,323 W.
AutoScale: T95 = 491 ms, P_avg = 1,297 W; within 30% of OPT power on all our traces.
(Speaker note: AutoScale fluctuates much less with changes in load: it doesn't rush to turn servers off and then on again, but organically waits for a server to go idle and then waits further. It also turns servers on based on the number of jobs rather than the arrival rate, a much better predictor, so enough servers are on to meet the response-time SLA, while still saving more than 40% of the power of AlwaysOn.)
Facebook has adopted AutoScale. (Facebook: Qiang Wu, Goranka Bjedov, Eitan Frachtenberg, Jason Taylor, Sanjeev Kumar)

Dynamic Load Balancing
A dispatcher (e.g., F5 Big-IP, Microsoft SharePoint, Cisco Local Director, Coyote Point Equalizer, IBM Network Dispatcher, etc.) sends each arriving job to one of several FCFS hosts. All hosts identical; jobs i.i.d. with a highly variable size distribution. Metric: response time T.
QUESTION: What is a good dispatching policy for minimizing E[T]?

Dynamic Load Balancing
1. Round-Robin.
2. Join-Shortest-Queue: go to the host with the fewest jobs.
3. Least-Work-Left: go to the host with the least total work.
4. Central-Queue (M/G/k): a host grabs the next job when free.
5. Size-Interval Splitting: jobs are split up by size among the hosts.
[Harchol-Balter, Crovella, Murta JPDC 99], [Harchol-Balter JACM 02], [Harchol-Balter, Scheller-Wolf, Young SIGMETRICS 09]

Dynamic Load Balancing
The list above is ordered (generally) from high E[T] to low: Round-Robin is worst; Join-Shortest-Queue and Least-Work-Left improve on it; Central-Queue (M/G/k) and Size-Interval Splitting do best.

Dynamic Load Balancing
Central-Queue (M/G/k): + good utilization of servers; + some isolation for small jobs.
Size-Interval Task Assignment: - worse utilization of servers; + great isolation for small jobs!
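A minimal simulation sketch of the first three policies with FCFS hosts, to see the kind of comparison the slides describe; the number of hosts, load, lognormal job-size distribution, and sample size are illustration choices, not values from the talk.

```python
# Minimal simulation sketch: dispatching a highly variable job stream to k FCFS hosts
# under Round-Robin, Join-Shortest-Queue, and Least-Work-Left. All parameters below
# are illustration choices.
from collections import deque
import numpy as np

rng = np.random.default_rng(4)
k, n, rho = 4, 100_000, 0.8
sizes = rng.lognormal(mean=-1.125, sigma=1.5, size=n)     # E[S] = 1, C^2 ~ 8.5
arrivals = np.cumsum(rng.exponential(1 / (rho * k), n))   # total arrival rate = rho * k

def simulate(policy):
    free_at = [0.0] * k                      # when each host finishes all assigned work
    queues = [deque() for _ in range(k)]     # per-host completion times (for JSQ counts)
    resp, rr = 0.0, 0
    for t, s in zip(arrivals, sizes):
        for q in queues:                     # drop jobs that have already completed
            while q and q[0] <= t:
                q.popleft()
        if policy == "RR":
            h = rr; rr = (rr + 1) % k
        elif policy == "JSQ":
            h = min(range(k), key=lambda i: len(queues[i]))
        else:                                # "LWL": least work left
            h = min(range(k), key=lambda i: max(0.0, free_at[i] - t))
        done = max(free_at[h], t) + s        # FCFS host: start after its current backlog
        free_at[h] = done
        queues[h].append(done)
        resp += done - t
    return resp / n

for policy in ("RR", "JSQ", "LWL"):
    print(policy, "E[T] =", round(simulate(policy), 2))
```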

Newest work: Don’t Decide. Send to all! And we each get in line. Whenever one of us is at the head of the queue, we join up again.

Newest work: Don't Decide. Send to all! Replicate!
Microsoft/Berkeley Dolly System 2012 [Ananthanarayanan, Ghodsi, Shenker, Stoica]
Google "Tail at Scale" 2013 [Dean, Barroso]
Berkeley Sparrow paper 2013 [Ousterhout et al.]
DNS and Database query systems 2013 [Vulimiri et al.]
CMU: first exact analysis of replication, SIGMETRICS 2015 [Gardner et al.]

Dynamic Load Balancing 2
Now the hosts are commodity servers that do Processor-Sharing (PS), and HTTP web requests are immediately dispatched to a server. All hosts identical; jobs i.i.d. with a highly variable size distribution. Metric: response time T.
QUESTION: What is a good dispatching policy for minimizing E[T]?

Dynamic Load Balancing 2
Candidate policies (listed from high E[T] to low for FCFS hosts):
1. Round-Robin.
2. Join-Shortest-Queue: go to the host with the fewest jobs.
3. Least-Work-Left: go to the host with the least total work.
4. Size-Interval Splitting: jobs are split up by size among the hosts.

Dynamic Load Balancing 2
QUESTION: Which of these is the best for PS server farms?

Dynamic Load Balancing 2
Answer [Gupta, Harchol-Balter, Sigman, Whitt Performance 07]: for PS server farms, Join-Shortest-Queue is the winner (near-optimal, and its performance is nearly insensitive to the job size distribution).

Not covering: Networks of Queues
FCFS networks: + closed-form analysis exists; - requires Poisson arrivals & independent Exponential service times; + routes can depend on the packet's "class."
PS networks: + closed-form analysis exists; - requires Poisson arrivals; + general service times; + routes and service rates can depend on the packet's class.

Summary Part III: Multi-server queues
Covered: static load balancing; throwing away servers; M/M/k; comparing architectures; many slow servers vs. 1 fast; capacity provisioning & scaling; square-root staffing; dynamic power management; dynamic load balancing/FCFS servers; replication; dynamic load balancing/PS servers.
Prize-winning messages:
Many slow servers vs. 1 fast: the best choice depends on job size variability.
UNbalancing load is best: throw away slow servers.
Proportional scaling is overkill: square-root staffing.
Dynamic power mgmt works because setup time (and high load) hurt less in large systems.
The best dispatching policies aim to mitigate the effect of job size variability.

THANK YOU!
www.cs.cmu.edu/~harchol/