University of Massachusetts, Amherst – Department of Computer Science
Dynamic Resource Management in Internet Hosting Platforms
Ph.D. Thesis Defense


Slide 1: Dynamic Resource Management in Internet Hosting Platforms
Ph.D. Thesis Defense
Bhuvan Urgaonkar
Advisor: Prashant Shenoy

Slide 2: Internet Applications
- Proliferation of Internet applications: auction sites, online games, online retail stores
- Growing significance in personal and business affairs
- Focus: Internet server applications

Slide 3: Hosting Platforms
- Data centers: clusters of servers, storage devices, high-speed interconnect
- Hosting platforms rent resources to third-party applications and provide performance guarantees in return for revenue
- Benefits:
  - Applications need not maintain their own infrastructure; they rent server resources, possibly on demand
  - The platform provider generates revenue by renting resources

Slide 4: Goals of a Hosting Platform
- Meet service-level agreements: satisfy application performance guarantees (e.g., average response time, throughput)
- Maximize revenue (e.g., maximize the number of hosted applications)
Question: How should a hosting platform manage its resources to meet these goals?

Slide 5: Challenge #1: Dynamic Workloads
- Multi-time-scale variations: time-of-day, hour-of-day effects
- Overloads, e.g., flash crowds
- User tolerance threshold for response time: 8-10 s
- Key issue: how to provide good response time under varying workloads?
[Figures: request rate vs. time (in hours and in days), showing time-of-day variation and a spike up to roughly 140K requests/min]

Slide 6: Challenge #2: Complexity of Applications
- Complex software architecture with diverse components: Web servers, Java application servers, databases
- Multiple classes of clients: how to provide differentiated service?
- Replicable components: how many replicas to have?
- Tunable configuration parameters (e.g., MaxClients in Apache): how to set them?
- Key issue: how to capture all this complexity?

Slide 7: Talk Outline
- Motivation
- Thesis contributions
- Application modeling
- Dynamic provisioning
- Scalable request policing
- Conclusions

Slide 8: Hosting Platform Models
- Small applications require only a fraction of a server
  - Shared Web hosting: about $20/month to run your own Web site
- Shared hosting: multiple applications on one server
- Co-located applications compete for server resources

Slide 9: Hosting Platform Models
- Large applications may span multiple servers (the eBay site uses thousands of servers)
- Dedicated hosting: at most one application per server
- Allocation at the granularity of a single server

Slide 10: Thesis Contributions
Dynamic resource management in hosting platforms.
Shared hosting:
- Statistical multiplexing and under-provisioning [OSDI 2002]
- Application placement [PDCS 2004]
Dedicated hosting:
- Analytical model for an Internet application [SIGMETRICS 2005]
- Dynamic provisioning [Autonomic Computing 2005]
- Scalable request policing [PODC 2004, WWW 2005]

Slide 11: Talk Outline
- Motivation
- Thesis contributions
- Application modeling
- Dynamic provisioning
- Scalable request policing
- Conclusions

Slide 12: Internet Application Architecture
- Multi-tier architecture: HTTP server, J2EE application server, database
- Each tier uses services provided by its successor
- Session-based workloads
Example: request processing in an online bookstore. A search for "moby" generates queries at the database; the response lists Melville's 'Moby Dick' and music CDs by Moby.

Slide 13: Baseline Application Model [SIGMETRICS 2005]
- The model consists of two components:
  - A sub-system to capture the behavior of clients
  - A sub-system to capture request processing inside the application

Slide 14: Modeling Clients
- Clients think between successive requests
- An infinite-server system (queue Q0) captures the think time Z for each of the N clients
- This captures the independence of Z from processing inside the application

Slide 15: Modeling Request Processing
- Tiers 1..M are modeled as queues Q1, Q2, ..., QM with service times S1, S2, ..., SM
- Transitions capture the circulation of requests: a request may move to the next queue (with probability p_i) or back to the previous queue; p_M = 1 at the last tier
- Multiple requests are processed concurrently at each tier, under a processor-sharing scheduling discipline
- Caching effects get captured implicitly

Slide 16: Putting It All Together
- Combining the client queue Q0 (think time Z) with the tier queues Q1..QM yields a closed queuing model that captures a given number N of simultaneous sessions being served

Slide 17: Mean-Value Analysis
- The model is a product-form closed queuing network
- L_m(n): average length of queue Q_m when n clients are in the system
- A_m(n+1): average number of clients in Q_m seen by a client arriving when n+1 clients are in the system
- Arrival theorem: A_m(n+1) = L_m(n)
- An iterative algorithm computes mean queue lengths and sojourn times
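The MVA iteration described on this slide can be sketched in a few lines. This is a minimal sketch of textbook exact MVA for a closed network with one infinite-server (think-time) station and M queuing tiers; the function name and argument layout are illustrative, not the thesis's code.

```python
def mva(N, Z, visits, service):
    """Exact Mean-Value Analysis for a closed product-form network.

    N       -- number of simultaneous sessions (clients)
    Z       -- mean think time at the infinite-server client station
    visits  -- per-tier visit ratios V_1..V_M
    service -- per-tier mean service times S_1..S_M
    Returns (throughput, mean response time, per-tier mean queue lengths).
    """
    M = len(visits)
    L = [0.0] * M                      # mean queue lengths, start empty
    X = R = 0.0
    for n in range(1, N + 1):
        # Arrival theorem: an arriving client sees L_m(n-1) others at tier m,
        # so its sojourn time there is S_m * (1 + L_m).
        R_tier = [service[m] * (1.0 + L[m]) for m in range(M)]
        R = sum(visits[m] * R_tier[m] for m in range(M))
        X = n / (Z + R)                # system throughput, by Little's law
        L = [X * visits[m] * R_tier[m] for m in range(M)]
    return X, R, L
```

For a single tier with S = 1 s, Z = 1 s, and one client, the model gives a throughput of 0.5 req/s and a response time of 1 s, as expected.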

Slide 18: Parameter Estimation
- Visit ratios (equivalent to transition probabilities for MVA): V_i ≈ λ_i / λ_req, where λ_req is the request rate measured at the sentry and λ_i comes from the tier-i logs
- Service times: use the residence time X_i logged at tier i; for the last tier, S_M ≈ X_M; otherwise S_i = X_i − (V_{i+1} / V_i) · X_{i+1}
- Think time: measured at the application sentry
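The estimation rules above translate directly into code. A minimal sketch, assuming per-tier request rates and residence times have already been extracted from the logs (the function and argument names are illustrative):

```python
def estimate_parameters(lambda_req, tier_rates, residence_times):
    """Estimate per-tier visit ratios and service times from logged data.

    lambda_req      -- request arrival rate observed at the sentry
    tier_rates      -- per-tier request rates lambda_1..lambda_M (from logs)
    residence_times -- per-tier residence times X_1..X_M
    Returns (visits, service) per tier.
    """
    M = len(tier_rates)
    visits = [lam / lambda_req for lam in tier_rates]
    service = [0.0] * M
    service[M - 1] = residence_times[M - 1]        # last tier: S_M ~= X_M
    for i in range(M - 2, -1, -1):
        # Residence at tier i includes time spent waiting on tier i+1,
        # so subtract the downstream contribution: S_i = X_i - (V_{i+1}/V_i)*X_{i+1}
        service[i] = residence_times[i] - (visits[i + 1] / visits[i]) * residence_times[i + 1]
    return visits, service
```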

Slide 19: Evaluation of Baseline Model
- Auction site RUBiS, one server per tier: Apache, JBoss, MySQL
- Observation: per-tier concurrency limits are not captured by the baseline model

Slide 20: Handling Concurrency Limits
- Requests may be dropped due to per-tier concurrency limits
- Need to model the finiteness of the queues

Slide 21: Handling Concurrency Limits
- Approach: add subsystems to capture dropped requests
- Distinguish the processing of dropped requests from that of admitted requests

Slide 22: Estimating Drop Probabilities and Delay Values
- Drop probability for tier i:
  - Step 1: estimate throughput t using MVA, assuming no concurrency limits
  - Step 2: estimate p_i^drop as the drop probability of an M/M/1/K_i queue; the admitted throughput is then t·(1 − p_i^drop) and the dropped throughput t·p_i^drop
- Delay value for tier i: subject the application to an offline workload that causes the limit to be exceeded only at tier i; record the response time of failed requests
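Step 2 uses the standard closed-form blocking probability of an M/M/1/K queue, which can be computed directly (a sketch; the function name is illustrative):

```python
def mm1k_drop_probability(lam, mu, K):
    """Blocking (drop) probability of an M/M/1/K queue.

    lam -- arrival rate, mu -- service rate, K -- system capacity.
    P(block) = (1 - rho) * rho**K / (1 - rho**(K+1)) for rho != 1,
    and 1 / (K + 1) in the limit rho == 1.
    """
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (K + 1)
    return (1.0 - rho) * rho ** K / (1.0 - rho ** (K + 1))
```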

Slide 23: Response Time Prediction
- The enhanced model can capture concurrency limits

Slide 24: Replication and Load Imbalances
- Causes of imbalance: "sticky" sessions; variation in session durations and resource requirements
- Imbalance factor for the j-th most-loaded replica of tier i: imbalance(i, j) = num_arrivals(i, j) / num_arrivals(i)
- Scale the visit ratio: V_{i,j} = V_i · imbalance(i, j)
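The visit-ratio scaling above is a one-liner per replica; a small sketch (function name illustrative):

```python
def replica_visit_ratios(V_i, arrivals_per_replica):
    """Split a tier's visit ratio across replicas in proportion to observed load.

    V_i                  -- visit ratio of tier i as a whole
    arrivals_per_replica -- request arrivals seen by each replica of tier i
    Returns per-replica visit ratios V_{i,j} = V_i * imbalance(i, j).
    """
    total = sum(arrivals_per_replica)
    return [V_i * a / total for a in arrivals_per_replica]
```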

Slide 25: Capturing Load Imbalance
- Session affinity causes load imbalance, and the imbalance shifts among replicas over time
- The enhancement improves response time prediction
[Figures: per-replica request counts over time for three JBoss replicas; observed vs. predicted average response times under perfect load balancing and under the enhanced model, for the least-, medium-, and most-loaded replicas]

Slide 26: Talk Outline
- Motivation
- Thesis contributions
- Application modeling
- Dynamic provisioning
- Scalable request policing
- Conclusions

Slide 27: Dynamic Provisioning [Autonomic Computing 2005]
- Key idea: increase or decrease the number of allocated servers to handle workload fluctuations
- Monitor the incoming workload
- Compute current or future demand
- Match the number of allocated servers to the demand
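The monitor/compute/adjust loop can be sketched as below. This is an illustrative skeleton only: the thesis derives demand from its queuing model, whereas here a hypothetical per-server capacity figure stands in for that step.

```python
import math

def servers_needed(predicted_rate, capacity_per_server):
    """Match allocation to demand: ceil(demand / per-server capacity),
    keeping at least one server allocated."""
    return max(1, math.ceil(predicted_rate / capacity_per_server))

def provisioning_step(current_allocation, predicted_rate, capacity_per_server):
    """One monitor/compute/adjust iteration: return the change in servers
    (positive = add servers, negative = reclaim servers)."""
    target = servers_needed(predicted_rate, capacity_per_server)
    return target - current_allocation
```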

Slide 28: Dynamic Provisioning at Multiple Time-scales
- Predictive provisioning: certain Internet workload patterns can be predicted (e.g., time-of-day effects, increased workload during Thanksgiving); provision using the model at a time-scale of hours or days
- Reactive provisioning: applications may see unpredictable fluctuations (e.g., increased workload to news sites after an earthquake); detect such anomalies and react fast (minutes)

Slide 29: Request Policing
- Key idea: if the incoming request rate exceeds current capacity, turn away excess requests at the sentry
- Why police when you can provision? Provisioning is not instantaneous:
  - Residual sessions on a reallocated server
  - Application and OS installation and configuration overheads
  - Overall overhead of several (5-30) minutes

Slide 30: Existing Work
- Lots of existing work on request policing [Kanodia00, Li00, Verma03, Welsh03, Abdelzaher99, ...]
- Shortcomings of existing work:
  - It does not attempt to integrate policing and provisioning
  - It does not address the scalability of the policer; the policer itself may become the bottleneck during overloads

Slide 31: Policer: Design Goals
- Each class should sustain its guaranteed admission rate
- Class-based differentiation and revenue maximization: challenging due to the online nature of the problem (an admitted request may cause a more important request arriving later to be dropped); approach: preferential admission for higher-class requests
- Scalability: the policer should remain operational even under extremely high arrival rates

Slide 32: Overview of Policer Design [PODC 2004 / WWW 2005]
- The policer has three components:
  - A request classifier and per-class leaky buckets
  - Class-specific queues (gold, silver, bronze) with class-specific delays d_gold, d_silver, d_bronze
  - Admission control, which admits or drops requests

Slide 33: Class-based Differentiation
- Each incoming request undergoes classification
- Per-class leaky buckets ensure that the rates guaranteed in the SLA are admitted
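The per-class rate guarantee can be sketched with a standard token-bucket formulation of the leaky bucket; the rate and burst parameters would be sized from each class's SLA (the class layout here is illustrative, not the thesis's implementation):

```python
class LeakyBucket:
    """Token-bucket formulation: tokens refill at `rate` per second up to
    `burst`; each admitted request consumes one token."""

    def __init__(self, rate, burst):
        self.rate = rate          # guaranteed admissions per second
        self.burst = burst        # bucket depth (maximum burst size)
        self.tokens = float(burst)
        self.last = 0.0

    def admit(self, now):
        """Return True iff a request arriving at time `now` (seconds) is
        within the class's guaranteed rate; consume one token if so."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```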

Slide 34: Revenue Maximization
- Idea: impose different delays on the processing of requests of different classes; more important requests are processed more frequently
- A methodology computes the delay values in an online manner
- Bounds the probability of a request denying admission to a more important request [Appendix B of the thesis]

Slide 35: Admission Control
- Goal: ensure that an admitted request meets its response time target
- Measurement-based admission control algorithm: uses the current load on the servers and the estimated size of the new request to make the decision
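At its core, a measurement-based check of this kind admits a request only if its estimated completion time, given the currently queued work, fits within the target. A hedged sketch, with illustrative inputs (the thesis measures load and request sizes online; this is not its exact test):

```python
def admit(estimated_service_time, current_queued_work, response_target):
    """Admit iff the work already queued at the servers plus this request's
    estimated service demand can finish within the response time target."""
    return current_queued_work + estimated_service_time <= response_target
```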

Slide 36: Scalability of Admission Control
- Idea #1: reduce the per-request admission control cost
  - Admission control on every request may be expensive
  - Bursty arrivals during overloads and the delays used for class-based differentiation both cause batches to form
  - So: admission control that operates on batches instead of individual requests
- Idea #2: sacrifice accuracy for computational overhead
  - Used when even batch-based processing becomes prohibitive
  - Threshold-based scheme, e.g., admit all gold requests, drop all silver and bronze requests
  - Thresholds are chosen based on observed arrival rates and service times
  - Extremely efficient, but a wrong threshold yields bad response times or fewer requests admitted
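A threshold-based scheme like Idea #2 reduces to a per-class admit/drop decision computed from observed rates and service demands. The greedy computation below is an illustrative sketch of that flavor, not the thesis's exact algorithm:

```python
def choose_admitted_classes(classes, capacity):
    """Greedily admit whole classes in priority order while their combined
    offered load (rate * service time) fits within capacity.

    classes  -- list of (name, arrival_rate, service_time), most important first
    capacity -- server capacity, in units of work per second
    Returns the names of classes to admit; all other classes are dropped.
    """
    admitted, used = [], 0.0
    for name, rate, svc in classes:
        demand = rate * svc            # offered load of this class
        if used + demand <= capacity:
            admitted.append(name)
            used += demand
    return admitted
```

Once the thresholds are set, the per-request decision is a constant-time class lookup, which is what makes the scheme so efficient.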

Slide 37: Scaling Even Further
- At extremely high arrival rates, protocol processing overheads saturate the sentry's resources
  - Indiscriminate dropping of requests occurs: important requests may be turned away without even undergoing the admission control test, a loss in revenue
  - The sentry should still be able to process each arriving request
- Idea: dynamic capacity provisioning for the sentry itself
  - Pull in an additional sentry if the CPU utilization of the existing sentries exceeds a threshold (e.g., 90%)
  - Round-robin DNS load-balances across the sentries

Slide 38: Class-based Differentiation
- Three classes of requests: gold, silver, bronze
- The policer is successful in providing preferential admission to important requests

Slide 39: Threshold-based: Higher Scalability
- Threshold-based processing allows the policer to handle up to 4 times the arrival rate
- A single sentry can handle about 19,000 req/s

Slide 40: Threshold-based: Loss of Accuracy
- The higher scalability comes at a loss of accuracy in admission control
- More violations of response time targets

Slide 41: Talk Outline
- Motivation
- Thesis contributions
- Application modeling
- Dynamic provisioning
- Scalable request policing
- Summary and future research

Slide 42: Thesis Contributions
Dynamic resource management in hosting platforms.
Shared hosting:
- Statistical multiplexing and under-provisioning [OSDI 2002]
- Application placement [PDCS 2004]
Dedicated hosting:
- Analytical model for Internet applications [SIGMETRICS 2005]
- Dynamic provisioning [Autonomic Computing 2005]
- Scalable request policing [PODC 2004, WWW 2005]

Slide 43: Future Research Directions
- Virtual machine based hosting: recent research has shown the feasibility of migrating VMs across nodes, which adds a new dimension to the capacity provisioning problem
- Characterizing multi-tier workloads: workloads for standalone Web servers are well characterized, but what are typical service times at the Java tier, or query processing times at the database? An offshoot of such a study: workload generators for multi-tier applications
- Automated determination of provisioning parameters: the predictor and reactor are currently invoked at manually chosen frequencies; system administrators rely on error-prone rules of thumb

Slide 44: Thanks to ...
- Advisor: Prashant Shenoy
- Thesis committee: Emery Berger, Jim Kurose, Don Towsley, Tilman Wolf
- Collaborators: Abhishek Chandra, Pawan Goyal, Giovanni Pacifici, Timothy Roscoe, Arnold Rosenberg, Mike Spreitzer, Asser Tantawi
- All my teachers: Paul Cohen, Mani Krishna, Don Towsley
- Friends and family

Slide 45: Questions or comments?

Slide 46: Query Caching at the Database
- Caching effects are captured by tuning V_i and/or S_i
- Bulletin-board site RUBBoS, 50 sessions
- SELECT SQL_NO_CACHE causes MySQL not to cache the response to a query

Slide 47: Agile Switching Using Virtual Machine Monitors
- VMMs allow multiple "virtual" machines on a server (e.g., Xen, VMware)
- Use VMMs to enable fast switching of servers between applications: one VM is active while the others lie dormant
- Switching time is limited only by residual sessions

Slide 48: Prototype Data Center
- 40+ Linux servers, gigabit switches
- Multi-tier applications: auction (RUBiS), bulletin board (RUBBoS)
- Apache and JBoss tiers (replicable), MySQL database
[Diagram: a control plane (application placement, dynamic provisioning) coordinates server nodes; each node runs a nucleus (resource monitoring, parameter estimation), an OS, and application capsules, with per-application sentries in front]

Slide 49: Sentry Provisioning (XXX)

Slide 50: System Overview
- Control plane: centralized resource manager (application placement, dynamic provisioning)
- Nucleus: per-server measurements and resource management (resource monitoring, parameter estimation)
- Sentry: per-application admission control
- Capsule: the component of an application running on a given server

Slide 51: Existing Application Models
- Models for Web servers [Chandra03, Doyle03]: do not model the Java server, database, etc.
- Black-box models [Kamra04, Ranjan02]: unaware of the bottleneck tier
- Extensions of single-tier models [Welsh03]: fail to capture interactions between tiers
- Existing models are inadequate for multi-tier Internet applications

Slide 52: Existing Work
- Predictable resource management within a single server:
  - Proportional-share schedulers for CPU and network [Duda, Goyal, Waldspurger]; multi-processors [Chandra]
  - Memory management [Berger, Waldspurger]
  - Disk scheduling [Shenoy]
- Hosting platforms and Internet applications:
  - Rice, Duke, Penn State: shared platforms for Web servers
  - IBM, HP Labs: shared platforms, workload prediction
  - Berkeley: a novel architecture for Internet applications
- Main shortcomings:
  - Possible statistical multiplexing gains in shared platforms are unexplored
  - Most work assumes simplistic applications (e.g., only Web servers)
  - Provisioning is either purely reactive or purely predictive
  - Handling of extreme overloads is not addressed satisfactorily

Slide 53: Predictive Provisioning
[Diagram: the monitor takes workload measurements and feeds the observed workload to the predictors; the predictors emit the predicted workload to the application models, which produce resource requirements; the allocator turns these into server allocations]

Slide 54: Reactive Provisioning
- Idea: react to current conditions
- Useful for capturing significant short-term fluctuations and for correcting errors in predictions
- Track the error between long-term predictions and the actual workload; allocate additional servers if the error exceeds a threshold
- Can also be invoked if the request drop rate exceeds a threshold
- Operates over a time scale of a few minutes
- Pure reactive provisioning lags the workload; reactive + predictive is more effective
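The reactive trigger described above can be sketched as a simple relative-error test; the 0.25 threshold here is an illustrative assumption, not a value from the thesis:

```python
def reactor_should_fire(predicted, observed, threshold=0.25):
    """Invoke the reactor when the observed workload exceeds the long-term
    prediction by more than `threshold` (relative error), i.e., when the
    predictor has significantly under-estimated demand."""
    if predicted <= 0:
        return observed > 0
    return (observed - predicted) / predicted > threshold
```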

Slide 55: Dynamic Capacity Provisioning
- Auction application RUBiS; the workload increases by a factor of 4 in 30 minutes
- Server allocations are increased to match the increased workload
- Response time is kept below 2 seconds
[Figures: workload, response time, and server allocations over time]
