
1 Mor Harchol-Balter, Carnegie Mellon University. Joint work with Bianca Schroeder.

2 "size" = service requirement; load ρ < 1. Q: Which minimizes mean response time?

3 "size" = service requirement; load ρ < 1; jobs queued under SRPT, PS, FCFS. Q: Which best represents scheduling in web servers?

4 IDEA: How about using SRPT instead of PS in web servers? [Diagram: clients 1-3 send "Get File 1/2/3" across the Internet to a Linux O.S. web server running Apache.]

5 Many servers receive mostly static web requests ("GET FILE"). For static web requests the file size is known, so the service requirement of the request is approximately known. Immediate objections: 1) Can't assume known job size. 2) But the big jobs will starve...
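A minimal sketch (not from the talk) of how "file size ≈ service requirement" plays out for a static GET; the header_overhead constant is an arbitrary illustrative value:

```python
import os

def estimated_service_bytes(path, header_overhead=300):
    # For a static GET, the bytes the server must send (file size plus a rough
    # allowance for HTTP response headers) approximate the request's service
    # requirement -- this is the "known job size" that SRPT needs.
    return os.path.getsize(path) + header_overhead
```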

6 Outline of Talk: 1) "Analysis of SRPT Scheduling: Investigating Unfairness" (THEORY) 2) "Size-based Scheduling to Improve Web Performance" (IMPLEMENT) 3) "Web servers under overload: How scheduling can help" (IMPLEMENT) www.cs.cmu.edu/~harchol/

7 THEORY. SRPT has a long history... 1966: Schrage & Miller derive the M/G/1/SRPT response time. 1968: Schrage proves optimality. 1979: Pechinkin, Solovyev & Yashkov generalize. 1990: Schassberger derives the distribution of queue length. BUT WHAT DOES IT ALL MEAN?
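The formula image itself is not preserved in this transcript; a standard statement of the Schrage & Miller result (with f the job-size density, F its CDF, and λ the arrival rate) is:

E[T(x)]_{SRPT} = \frac{\lambda \left( \int_0^x t^2 f(t)\,dt + x^2 (1 - F(x)) \right)}{2\,(1 - \rho(x))^2} + \int_0^x \frac{dt}{1 - \rho(t)}, \qquad \rho(x) = \lambda \int_0^x t\, f(t)\,dt,

where the first term is the expected waiting time of a job of size x and the second is its residence time.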

8 THEORY. SRPT has a long history (cont.). 1990-97: 7-year study at Univ. of Aachen under Schreiber: SRPT WINS BIG ON MEAN! 1998, 1999: Slowdown for SRPT under adversary (Bender, Chakrabarti, Muthukrishnan, Rajaraman, Shaheen, Gehrke, etc.): SRPT STARVES BIG JOBS! Various O.S. books (Silberschatz, Stallings, Tanenbaum) warn about starvation of big jobs... Kleinrock's Conservation Law: "Preferential treatment given to one class of customers is afforded at the expense of other customers."

9 Unfairness Question. Let ρ = 0.9. Let G: Bounded Pareto(α = 1.1, max = 10^10). Question: Which queue does the biggest job prefer -- SRPT or PS?
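For reference, the Bounded Pareto(α, max) family referred to here is the standard one (the minimum job size k is not given on the slide):

f(x) = \frac{\alpha k^{\alpha}\, x^{-\alpha - 1}}{1 - (k/p)^{\alpha}}, \qquad k \le x \le p,

with α = 1.1 and p = max = 10^{10} in this example.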

10 THEORY. Our Analytical Results (M/G/1), comparing SRPT vs. PS: All-Can-Win Theorem: Under workloads with the heavy-tailed (HT) property, ALL jobs, including the very biggest, prefer SRPT to PS, provided the load is not too close to 1. Almost-All-Win-Big Theorem: Under workloads with the HT property, 99% of all jobs perform orders of magnitude better under SRPT. Counter-intuitive!
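Throughout, "a job of size x prefers SRPT to PS" means its expected response time is no larger under SRPT; the PS side of the comparison is the classical M/G/1/PS value:

E[T(x)]_{SRPT} \le E[T(x)]_{PS} = \frac{x}{1 - \rho}.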

11 What's Heavy-Tailed? Berkeley Unix process CPU lifetimes [HD96]. [Log-log plot: fraction of jobs with CPU duration > x vs. duration x (secs).] Pr{Lifetime > x} = 1/x.

12 What's the Heavy-Tail property? Defn: a heavy-tailed distribution satisfies Pr{X > x} ~ x^(-α), 0 < α < 2. Many real-world workloads are well-modeled by a truncated HT distribution. Key property (HT Property): "Largest 1% of jobs comprise half the load."
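As an illustrative check of the HT property (not from the talk), one can sample a Bounded Pareto(α = 1.1) distribution and measure how much of the total work the largest 1% of jobs carry; the minimum size k below is an arbitrary choice, the max matches slide 9:

```python
import random

alpha, k, p = 1.1, 512.0, 1e10          # shape, min size (assumed), max size (from slide 9)
n = 200_000

def bounded_pareto():
    # Inverse-transform sampling from the Bounded Pareto CDF
    # F(x) = (1 - (k/x)**alpha) / (1 - (k/p)**alpha),  k <= x <= p.
    u = random.random()
    return ((1 - u) * k**-alpha + u * p**-alpha) ** (-1 / alpha)

sizes = sorted((bounded_pareto() for _ in range(n)), reverse=True)
top = sizes[: n // 100]                  # largest 1% of jobs
print("fraction of load in largest 1%:", sum(top) / sum(sizes))
# Typically prints a value around 0.5-0.6: the largest 1% of jobs carry
# roughly half the total work (run-to-run variation is large -- heavy tail).
```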

13 THEORY. Our Analytical Results (M/G/1), comparing SRPT vs. PS (recap): All-Can-Win Theorem: Under workloads with the heavy-tailed (HT) property, ALL jobs, including the very biggest, prefer SRPT to PS, provided the load is not too close to 1. Almost-All-Win-Big Theorem: Under workloads with the HT property, 99% of all jobs perform orders of magnitude better under SRPT. Counter-intuitive!

14 THEORY. Our Analytical Results (M/G/1), continued: All-Distributions-Win Theorem: If load < 0.5, then for every job size distribution, ALL jobs prefer SRPT to PS. Bounding-the-Damage Theorem: For any load ρ, for every job size distribution, for every size x, E[T(x)]_{SRPT} ≤ (1/(1-ρ)) · E[T(x)]_{PS}.
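As a quick numerical illustration of the first theorem (not from the talk), the two M/G/1 formulas quoted above can be compared directly at load 0.4; the exponential job-size distribution and the x values below are arbitrary choices:

```python
import math

lam = 0.4                                  # arrival rate; exp(1) sizes => rho = 0.4 < 0.5
f = lambda t: math.exp(-t)                 # job-size density (exponential, mean 1)
F = lambda t: 1.0 - math.exp(-t)           # job-size CDF

def integral(g, a, b, steps=400):
    """Plain trapezoid rule -- crude, but enough for an illustration."""
    h = (b - a) / steps
    return h * (g(a) / 2 + sum(g(a + i * h) for i in range(1, steps)) + g(b) / 2)

def rho(x):                                # load made up by jobs of size <= x
    return lam * integral(lambda t: t * f(t), 0.0, x)

def T_srpt(x):                             # Schrage & Miller M/G/1/SRPT mean response time
    wait = lam * (integral(lambda t: t**2 * f(t), 0.0, x) + x**2 * (1 - F(x))) \
           / (2 * (1 - rho(x))**2)
    residence = integral(lambda t: 1 / (1 - rho(t)), 0.0, x)
    return wait + residence

def T_ps(x):                               # M/G/1/PS mean response time
    return x / (1 - 0.4)                   # rho = lam * E[S] = 0.4

for x in (0.1, 1.0, 2.0, 5.0, 10.0):
    srpt, ps = T_srpt(x), T_ps(x)
    print(f"x={x:5.1f}  E[T]_SRPT={srpt:7.3f}  E[T]_PS={ps:7.3f}  SRPT<=PS: {srpt <= ps}")
```

Every size, including the largest, comes out at least as well off under SRPT at this load, as the theorem predicts.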

15 IMPLEMENT. From theory to practice: What does SRPT mean within a Web server? Many devices: where to do the scheduling? No longer one job at a time.

16 IMPLEMENT. Server's Performance Bottleneck. [Diagram: clients 1-3 ("Get File 1/2/3") → rest of Internet → ISP → Linux O.S. web server (Apache).] The site buys a limited fraction of the ISP's bandwidth. We model the bottleneck by limiting bandwidth on the server's uplink.

17 IMPLEMENT. Network/O.S. insides of a traditional Web server: sockets take turns draining --- FAIR = PS. [Diagram: Web Server → Sockets 1-3 → Network Card (BOTTLENECK) → Clients 1-3.]
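A toy sketch (hypothetical names; the real sharing happens inside the kernel) of what "sockets take turns draining" amounts to -- round-robin service that approximates PS across transfers:

```python
from collections import deque

def fair_feed(transfers, chunk=1460):
    """Round-robin draining of open transfers, approximating PS across them.
    transfers: iterable of (sock_id, bytes_remaining); yields (sock_id, bytes_sent)."""
    q = deque(list(transfers))
    while q:
        sock_id, remaining = q.popleft()
        sent = min(chunk, remaining)
        yield sock_id, sent
        if remaining > sent:
            q.append((sock_id, remaining - sent))

# Example: a 2 MB and a 50 KB transfer share the link chunk by chunk, so the
# small transfer's completion is delayed by the large one.
for sock_id, sent in fair_feed([("client1", 2_000_000), ("client2", 50_000)]):
    pass
```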

18 IMPLEMENT. Network/O.S. insides of our improved Web server: the socket corresponding to the file with the smallest remaining data gets to feed first. [Diagram: Web Server → priority queues (1st: S, 2nd: M, 3rd: L) → Sockets 1-3 → Network Card (BOTTLENECK) → Clients 1-3.]
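For contrast, a toy sketch of the modified rule (again hypothetical names, not the actual kernel code): always feed the socket whose file has the least data remaining:

```python
import heapq

def srpt_feed(transfers, chunk=1460):
    """Always drain the transfer with the smallest remaining data first.
    transfers: iterable of (sock_id, bytes_remaining); yields (sock_id, bytes_sent)."""
    heap = [(remaining, sock_id) for sock_id, remaining in transfers]
    heapq.heapify(heap)
    while heap:
        remaining, sock_id = heapq.heappop(heap)
        sent = min(chunk, remaining)
        yield sock_id, sent
        if remaining > sent:
            heapq.heappush(heap, (remaining - sent, sock_id))

# Example: the 50 KB transfer now completes before the 2 MB one gets any bandwidth.
for sock_id, sent in srpt_feed([("client1", 2_000_000), ("client2", 50_000)]):
    pass
```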

19 Experimental Setup. Implementation of SRPT-based scheduling: 1) Modifications to the Linux O.S.: 6 priority levels. 2) Modifications to the Apache Web server. 3) Priority algorithm design. [Diagram: Apache web server on Linux with priority queues 1-3, connected through a switch and WAN emulator to Linux client machines.]
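A sketch of the priority-algorithm idea under stated assumptions: the cutoff values below are hypothetical (the real cutoffs were tuned to the workload), and the SO_PRIORITY call only illustrates one way to hand a per-socket priority to a prio-style qdisc; the talk's actual mechanism is a kernel modification not reproduced here.

```python
import socket

# Hypothetical cutoffs (bytes) separating the 6 priority levels.
CUTOFFS = (10_000, 50_000, 150_000, 500_000, 1_500_000)

def priority_level(bytes_remaining):
    """Map remaining response size to a band: 0 = highest priority, 5 = lowest."""
    for level, cutoff in enumerate(CUTOFFS):
        if bytes_remaining <= cutoff:
            return level
    return len(CUTOFFS)            # the very largest responses get the lowest band

def tag_socket(sock, bytes_remaining):
    # On Linux, SO_PRIORITY (option value 12) sets sk_priority for the socket;
    # a suitably configured prio qdisc can then drain lower-numbered bands first.
    so_priority = getattr(socket, "SO_PRIORITY", 12)
    sock.setsockopt(socket.SOL_SOCKET, so_priority, priority_level(bytes_remaining))
```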

20 Experimental Setup (cont.). [Same testbed diagram as above.] Trace-based workload: 1,000,000 requests made; sizes of files requested: 41 B -- 2 MB; the distribution of file sizes requested has the HT property. Alternatives explored: Flash vs. Apache server; WAN emulator vs. geographically-dispersed clients; 10 Mbps vs. 100 Mbps uplink; Surge vs. trace-based workload; open vs. partly-open system; load < 1 vs. transient overload; plus other effects: initial RTO, user abort/reload, persistent connections, etc.

21 Preliminary Comments. Job throughput, byte throughput, and bandwidth utilization were the same under SRPT and FAIR scheduling. The same set of requests complete. No additional CPU overhead under SRPT scheduling. The network was the bottleneck in all experiments. [Testbed diagram as above.]

22 Results: Mean Response Time. [Plot: mean response time (sec) vs. load, FAIR vs. SRPT.]

23 Results: Mean Slowdown. [Plot: mean slowdown vs. load, FAIR vs. SRPT.]

24 Mean Response Time vs. Size Percentile, load = 0.8. [Plot: mean response time vs. percentile of request size, FAIR vs. SRPT.]

25 Summary so far... SRPT scheduling yields significant improvements in mean response time at the server. Negligible starvation. No CPU overhead. No drop in throughput.

26 More questions... So far we only showed LAN results: are the effects of SRPT as strong in a WAN? So far we only showed load < 1: what happens under SRPT vs. FAIR when the server runs under transient overload? -> new analysis -> implementation study.

27 WAN EMU results: propagation delay has an additive effect; it reduces the improvement factor. [Plot: FAIR vs. SRPT.]

28 WAN EMU results: loss has a quadratic effect; it reduces the improvement factor a lot. [Plot: FAIR vs. SRPT.]

29 WAN results: geographically-dispersed clients. [Plots: load 0.9 and load 0.7.]

30 Overload – 5 minute overview. [Cartoon: person asleep ("Zzzzz...") under overload.]

31 Q: What happens under overload? A: Buildup in the number of connections. [Plot: number of connections over time, FAIR vs. SRPT.] Q: What happens to response time?

32 Web server under overload. [Diagram: clients → server SYN-queue → ACK-queue → Apache processes.] When the SYN-queue limit is reached, the server drops all connection requests.

33 Transient Overload. [Figure not preserved in transcript.]

34 Transient Overload - Baseline. [Plot: mean response time over time, SRPT vs. FAIR.]

35 Transient overload: response time as a function of job size. [Plot: FAIR vs. SRPT.] Small jobs win big! Big jobs aren't hurt! WHY?

36 FACTORS explored, relative to the baseline case:
- WAN propagation delays: RTT 0 – 150 ms
- WAN loss: 0 – 15%
- WAN loss + delay: RTT 0 – 150 ms
- Persistent connections: 0 – 10 requests/conn.
- Initial RTO value: 0.5 sec – 3 sec
- SYN cookies: ON/OFF
- User abort/reload: abort after 3 – 15 sec, with 2, 4, 6, 8 retries
- Packet length: 536 – 1500 bytes
Realistic scenario: RTT = 100 ms; loss = 5%; 5 requests/conn.; RTO = 3 sec; packet length = 1500 B; user aborts after 7 sec and retries up to 3 times.

37 Transient Overload - Realistic. [Plot: mean response time over time, FAIR vs. SRPT.]

38 Conclusion. SRPT scheduling is a promising solution for reducing the mean response time seen by clients, particularly when the load at the server bottleneck is high. SRPT results in negligible or zero unfairness to large requests. SRPT is easy to implement. Results corroborated via implementation and analysis.

