
1 Adaptive Overload Control for Busy Internet Servers. Matt Welsh and David Culler, USENIX Symposium on Internet Technologies and Systems (USITS) 2003. Presented by Alex Cheung, Nov 13, 2006, ECE1747.

2 Outline: Motivation, Goal, Methodology (Detection, Overload Control), Experiments, Comments.

3 Motivation 1. Internet services are becoming important to our daily lives: email, news, trading. 2. Services are becoming more complex: large dynamic content requires heavy computation and I/O, and the load requirements of requests are hard to predict. 3. Want to withstand a peak load that is 1000x the norm without over-provisioning (i.e., solve CNN's problem on 9/11).

4 Goal An adaptive overload control scheme at the node level that maintains response time, throughput, QoS, and availability.

5 Methodology – Detection 1. Look at the 90th-percentile response time. 2. Compare it with a threshold and decide what to do. Weaker alternatives: the 100th percentile does not capture the "shape" of the response-time curve; throughput does not capture the user-perceived performance of the system. I ask: What makes the 90th percentile so great? Why not the 95th? 80th? 70th? No supporting micro-experiment. [Figure: requests 1-10 served; examine the 90th-percentile response time.]
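A minimal sketch (not the paper's actual code) of how a stage controller might measure the 90th-percentile response time over a window of samples and compare it against a target; the class name, window size, and target are hypothetical.

    import java.util.Arrays;

    // Hypothetical sketch: track recent response times and flag overload when
    // the 90th percentile of the window exceeds an administrator-set target.
    class ResponseTimeMonitor {
        private final double[] samples;   // response times in ms (circular buffer)
        private int count = 0;
        private final double targetMs;    // 90th-percentile response-time target

        ResponseTimeMonitor(int windowSize, double targetMs) {
            this.samples = new double[windowSize];
            this.targetMs = targetMs;
        }

        void record(double responseTimeMs) {
            samples[count % samples.length] = responseTimeMs;
            count++;
        }

        boolean overloaded() {
            int n = Math.min(count, samples.length);
            if (n == 0) return false;
            double[] window = Arrays.copyOf(samples, n);
            Arrays.sort(window);
            // index of the 90th percentile in the sorted window
            double p90 = window[(int) Math.ceil(0.9 * n) - 1];
            return p90 > targetMs;
        }
    }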

6 Methodology – Overload Control If the response time is higher than the threshold: 1. Limit the service rate by rejecting selected requests. Extension: differentiate requests with class/priority levels and reject lower-class/priority requests first. 2. Quality/service degradation. 3. Back pressure: (a) causes queue explosion at the 1st stage (they say), solved by rejecting requests at the 1st stage; (b) breaks the loose-coupling modular design of SEDA with an out-of-band notification scheme (I say).

7 Methodology – Overload Control 4. Forward rejected requests to another "more available" server. "More available" means the server with the most of a particular resource: CPU, network, I/O, hard disk. Make the decision using a centralized or distributed algorithm; requires reliable state migration, possibly transactional. My take: more complex, more interesting, and it actually solves CNN's problem with a cluster of servers!

8 Rate Limit The measured response time is smoothed; the admission rate is adjusted with multiplicative decrease and additive increase, just like TCP! 10 fine-tuned parameters per stage.
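A rough sketch of the additive-increase/multiplicative-decrease admission-rate adjustment described on this slide; the constants and names are illustrative, not the paper's actual per-stage parameters.

    // Illustrative AIMD admission-rate controller, in the spirit of the slide:
    // decrease the rate multiplicatively when the smoothed 90th-percentile
    // response time exceeds the target, increase it additively otherwise.
    class AdmissionRateController {
        private double rate = 100.0;                              // admitted requests/sec (initial guess)
        private static final double MIN_RATE = 1.0, MAX_RATE = 5000.0;
        private static final double ADDITIVE_STEP = 2.0;          // hypothetical constant
        private static final double MULTIPLICATIVE_FACTOR = 0.5;  // hypothetical constant

        // Called periodically with the smoothed 90th-percentile response time.
        double adjust(double smoothed90thMs, double targetMs) {
            if (smoothed90thMs > targetMs) {
                rate = Math.max(MIN_RATE, rate * MULTIPLICATIVE_FACTOR);
            } else {
                rate = Math.min(MAX_RATE, rate + ADDITIVE_STEP);
            }
            return rate;
        }
    }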

9 Rate Limit With Class/Priority Class/priority assignment is based on: IP address, header information, HTTP cookies (see the sketch below). I ask: Where is the priority assignment module implemented? Should priority assignment be a stage of its own? Is it not shown because it complicates the diagram and makes the stage design not "clean"? How do we classify which requests are potentially "bottleneck" requests? Is it application dependent?
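One plausible form such a classifier could take, assigning a priority from the request metadata the slide lists (IP address, headers, cookies); the rules, class name, and cookie/header keys are made up for illustration.

    import java.util.Map;

    // Hypothetical request classifier: assigns a priority from request metadata.
    class RequestClassifier {
        enum Priority { HIGH, LOW }

        Priority classify(String clientIp, Map<String, String> headers, Map<String, String> cookies) {
            // Example rules only: a premium cookie or an internal subnet gets high priority.
            if ("gold".equals(cookies.get("account-tier"))) return Priority.HIGH;
            if (clientIp.startsWith("10.")) return Priority.HIGH;
            return Priority.LOW;
        }
    }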

10 Quality/Service Degradation Notify the application via a signal to perform service degradation; the application does the degradation, not SEDA. Questions: How is the signaling implemented? Out of band? Is it possible to signal previous stages in the pipeline? Will this break SEDA's loose-coupling design? [Diagram: a signal sent to the "Attach image" and "Send response" stages.]
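A minimal sketch of what the overload signal to the application might look like as a callback interface; this is a guess at the mechanism, not SEDA's actual API, and the names are hypothetical.

    // Hypothetical callback by which the runtime tells the application to degrade
    // service; the application decides what "degrade" means (e.g., smaller images).
    interface DegradationListener {
        // degradationLevel in [0.0, 1.0]: 0 = full quality, 1 = maximum degradation
        void onOverload(double degradationLevel);
    }

    class ImageStage implements DegradationListener {
        private volatile double quality = 1.0;  // image quality factor used by this stage

        @Override
        public void onOverload(double degradationLevel) {
            // Application-level policy: trade image fidelity for lower service time.
            quality = Math.max(0.1, 1.0 - degradationLevel);
        }
    }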

11 Experiments

12 Experiments – Setup Arashi email server (realistic experiment): real access workload, real email content, admission control. Web server benchmark: service degradation + 1-class admission control.

13 Experiments – Admission Rate The controller's response time is not as fast. [Graph: admission rate over time, annotated with additive increase and multiplicative decrease.]

14 Experiments – Response Time [Graph of response times.] Why?

15 Experiments – Massive Load Spike Not fair! SEDA's parameters were fine-tuned; Apache can be tuned to stay flat too.

16 Experiments – Service Degradation Service degradation and admission control kick in at roughly the same time

17 Experiments – Service Differentiation Average reject rates without service differentiation: low-priority 55.5%, high-priority 57.6%. With service differentiation: low-priority 87.9% (+32.4%), high-priority 48.8% (-8.8%). Question: Why is the drop rate for high-priority requests reduced so little with service differentiation? Is it workload dependent?

18 Comments

19 No idea what the controller's overhead is. Overload control at the node level is not good: the node level is inefficient (late rejection) and is not user friendly (all session state is gone if you get a reject out of the blue, i.e., without warning). Need a global-level overload control scheme. The idea/concept is explained in only 2.5 pages.

20 Comments Rejected requests: instead of letting TCP time out, send a static page. The paper says this is better; I say this is worse because it leads to an out-of-memory crash down the road: saturated output bandwidth and an unbounded queue at the reject handler. Parameters: How are they tuned? How difficult is tuning? Tuning each stage manually may be tedious. Given a 1M-stage application, do all 1M stage thresholds need to be configured manually? Automated tuning with control theory? The methodology of adding the extensions is not shown in any figures.

21 Comments The experiment is not entirely realistic: is an inter-request think time of 20 ms realistic? Rejected users have to re-login after 5 min: all state information is gone, and users get frustrated. Two drawbacks of using response time for load detection…

22 Comments 1. No idea which resource is the bottleneck: CPU? I/O? Network? Memory? SEDA can only either do admission control (which reduces throughput) or tell the application to degrade overall service.

23 Comments Default admission control: [Diagram: CPU, I/O, network, and memory utilization vs. a resource utilization threshold; the "Attach image" / "Send response" stages are OVERLOADED and reject requests] … and piss off some users.

24 Comments Service degradation WITH bottleneck intelligence: [Diagram: CPU, I/O, network, and memory utilization vs. a resource utilization threshold.] The network is the bottleneck, so expend some CPU and memory to reduce the fidelity and size of images, cutting bandwidth consumption WITHOUT reducing the admission rate. A sketch of this policy follows below.
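A sketch of the proposed bottleneck-aware policy: check which resource exceeds its utilization threshold and degrade only along that dimension. All names, actions, and the single shared threshold are hypothetical; this is the presenter's idea, not anything in the paper.

    // Hypothetical bottleneck-aware degradation: pick a response based on which
    // resource (CPU, I/O, network, memory) is over its utilization threshold.
    class BottleneckAwareController {
        enum Action { NONE, SHRINK_IMAGES, THROTTLE_DISK_IO, REDUCE_ADMISSION }

        Action decide(double cpu, double io, double network, double memory, double threshold) {
            if (network > threshold) return Action.SHRINK_IMAGES;   // spend CPU/memory to save bandwidth
            if (io > threshold)      return Action.THROTTLE_DISK_IO;
            if (cpu > threshold || memory > threshold) return Action.REDUCE_ADMISSION;
            return Action.NONE;
        }
    }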

25 Comments 2. The response-time metric lags by at least the magnitude of the response time. Example: 50 requests come in all at once; nreq = 100, timeout = 10 s, target = 20 s, processing time per request = 1 s; overload is detected only after 30 s. Solution: compare the enqueue rate vs. the dequeue rate; overload occurs when enqueue rate > dequeue rate, and overload is detected after 10 s. A sketch of this detector follows below.
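A sketch of the alternative detector suggested on this slide: compare how many requests were enqueued against how many were dequeued over each measurement interval. The class name, counters, and interval handling are illustrative assumptions.

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical detector: flag overload when requests arrive faster than the
    // stage drains them over the last measurement interval.
    class QueueRateMonitor {
        private final AtomicLong enqueued = new AtomicLong();
        private final AtomicLong dequeued = new AtomicLong();

        void onEnqueue() { enqueued.incrementAndGet(); }
        void onDequeue() { dequeued.incrementAndGet(); }

        // Called once per measurement interval; resets the counters.
        boolean overloaded() {
            long in = enqueued.getAndSet(0);
            long out = dequeued.getAndSet(0);
            return in > out;   // enqueue rate > dequeue rate over the interval
        }
    }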

26 Questions?

