1 Hidra: History Based Dynamic Resource Allocation For Server Clusters
Jayanth Gummaraju (1) and Yoshio Turner (2)
(1) Stanford University, CA, USA
(2) Hewlett-Packard Labs., Palo Alto, CA, USA
ITA05, Wrexham, UK, September 2005

2 Why Dynamic Resource Allocation
- High demand variation for an Internet service
  - Daily: peak load ~10 times average load during day
  - Variation over longer time scales (days, weeks)
- Benefits of Dynamic Resource Allocation
  - Reduce operating costs for a service
    - Energy
    - Software license fees
  - Support more services on a shared infrastructure
    - Shift resources between services on-demand
- Practical: fast server re-purposing
  - Blade server management
  - Networked storage
  - Virtual machine cloning/migration

3 Problem
- Determine resource requirements for a service on-the-fly
- Challenges:
  - Frequent service updates
  - Frequent changes in client interest set
- → Static a priori capacity planning won’t work

4 Approach: Hidra
- Hidra: History-based Dynamic Resource Allocation
- “Black-box approach”: continuously build and update a model of system behavior from externally visible performance attributes, without knowledge of internal operation (e.g., what is the bottleneck resource)
- Model updates: introduce freshness and confidence
- Extrapolation: determine resource requirements with only a partial model

5 Scope
- Large services requiring multiple servers
- Multi-tier: each tier = a cluster of servers. Assumptions:
  - Identical servers within a tier
  - Servers in different tiers can be different
  - Allocation granularity = Server (ex: blade in a blade server)
  - Predictable client request rate
    - Reasonable if smoothly varying, or occasional discontinuities
  - Service and server behavior can change over time
- Goal: Find minimum cost resource allocation that meets server response time requirement
  - Cost = sum of cost of servers allocated to each tier
  - Mean response time (may be generalized)

6 Outline
- Single-tier history-based resource allocation
  - Constructing and updating history-based model (freshness and confidence)
  - Using the model to determine resource allocation (extrapolation)
- Multi-tier history-based resource allocation
- Summary

7 Single-Tier History-Based Model
- Model represents the average behavior of a server in a tier
- Consists of a collection of measured operating points (history) for the tier
  - Each history point: at least (request rate per server, mean response time)
- Model provides an estimate of function F():
  - response time = F(per-server request rate)
  - F is an increasing function in the range of interest

8 Using the History-Based Model
- Goal: find the fewest servers needed to meet a requirement for maximum mean response time
- Extrapolate model to find the largest feasible average request rate per server, λ
- Given R = tier’s applied load (requests per second)
  - → Resource allocation = N = R/λ servers
[Figure: response time vs. per-server request rate, with the response time threshold marked]
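
The allocation rule on this slide can be sketched in a few lines. This is only an illustration: the function name `servers_needed`, rounding up to a whole server, and the floor of one server are assumptions, not details given in the slides.

```python
import math

def servers_needed(applied_load, max_rate_per_server):
    """Fewest servers so that each one sees at most max_rate_per_server req/s.

    applied_load        -- R, the tier's applied load in requests per second
    max_rate_per_server -- largest feasible per-server request rate from the model
    Rounding up and keeping at least one server are assumptions of this sketch.
    """
    if max_rate_per_server <= 0:
        raise ValueError("max_rate_per_server must be positive")
    return max(1, math.ceil(applied_load / max_rate_per_server))

# Example: 1000 req/s applied load, 45 req/s feasible per server.
print(servers_needed(1000, 45))
```

Because allocation granularity is a whole server, the quotient R/λ has to be rounded to an integer; rounding up keeps the per-server rate at or below the feasible rate.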

9 Updating the Model
- Response time function can change over time:
  - Service content or implementation
  - Client interest set
  - Number of allocated servers (request distribution, and non-linear performance scaling)
- Nevertheless, history-based model is useful
  - Gradual changes → recent history is a good approximation
  - Occasional large changes → recent history is relevant except in immediate moments after a large change
- Periodically update model based on current performance measurements
  - Balance responsiveness and accuracy: Incorporate new measurements quickly to model current behavior, but not so aggressively that transient glitches pollute the model

10 History Update: Freshness and Confidence
- History point update as weighted average of stored value and new measurement
  - New stored value = α * old stored value + (1 – α) * new measurement
- Older history is less likely to represent current behavior
- Recent history can be obsolete after a sudden shift in behavior
- Weighting factor α combines:
  - Freshness: value which decreases with time since last update
  - Confidence: value which increases with repeated confirmation of consistent behavior for the history point
  - Combination → EWMA (captures freshness) with decay rate that slows with increasing confidence
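
A minimal sketch of such an update follows. The slides give only the weighted-average form, so everything else here is an assumption made for illustration: the exponential decay with time constant `tau`, the confidence counter that stretches that time constant, the `tolerance` test for "consistent behavior", and the names `HistoryPoint`, `weight`, and `update`.

```python
import math
from dataclasses import dataclass

@dataclass
class HistoryPoint:
    rate: float            # request rate per server (req/s)
    response_time: float   # stored mean response time (s)
    last_update: float     # time of the last update (s)
    confidence: int = 0    # consecutive consistent measurements (assumed definition)

def weight(point, now, tau=60.0):
    """Weight of the stored value: decays with age, more slowly at high confidence."""
    age = max(0.0, now - point.last_update)
    return math.exp(-age / (tau * (1 + point.confidence)))

def update(point, measured, now, tau=60.0, tolerance=0.2):
    """Blend a new measurement into the history point and return the weight used."""
    alpha = weight(point, now, tau)
    # "Consistent" means within a relative tolerance of the stored value (assumption).
    consistent = abs(measured - point.response_time) <= tolerance * point.response_time
    point.response_time = alpha * point.response_time + (1 - alpha) * measured
    point.confidence = point.confidence + 1 if consistent else 0
    point.last_update = now
    return alpha

p = HistoryPoint(rate=40.0, response_time=0.1, last_update=0.0)
update(p, measured=0.1, now=60.0)    # confirms the stored value, confidence rises
update(p, measured=0.5, now=120.0)   # inconsistent measurement, confidence resets
print(p)
```

With this choice, a point that has been confirmed repeatedly keeps a high weight for longer, so a single transient measurement moves it less than it would move an unconfirmed point of the same age.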

11 Extrapolation: Determining Resource Allocation
- Model has incomplete view of response time function
- To find optimal, Hidra extrapolates/interpolates unique pair of history points
  - Only use points that match general shape of typical response time curve (positive slope)
  - Favor points with high α value (ignore if α is very small)
- If only one point exists (current operating point), adjust allocation differently
- Limits on consecutive changes in resource allocation (fixed limit for decreases, growing limits for increases)
[Figure: response time vs. applied load with the threshold marked; points labeled 1-9 and X, Y, Z]
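
One way to picture the pair selection is the sketch below. It assumes straight-line interpolation or extrapolation through the chosen pair and scores a pair by the sum of its weights; the slides do not specify either choice, and the name `feasible_rate` and the `min_weight` cutoff are hypothetical.

```python
from itertools import combinations

def feasible_rate(points, threshold, min_weight=0.05):
    """Estimate the per-server request rate at which response time hits threshold.

    points -- list of (rate, response_time, weight) history points
    Returns None when no usable pair exists (for example, a single point).
    Linear extrapolation and the weight-sum score are assumptions of this sketch.
    """
    usable = [p for p in points if p[2] >= min_weight]  # ignore very small weights
    best = None
    for a, b in combinations(usable, 2):
        (r1, t1, w1), (r2, t2, w2) = sorted((a, b))
        if r2 <= r1 or t2 <= t1:
            continue  # keep only pairs with positive slope
        score = w1 + w2  # favor pairs with high weights
        if best is None or score > best[0]:
            slope = (t2 - t1) / (r2 - r1)
            best = (score, r1 + (threshold - t1) / slope)
    return None if best is None else best[1]

history = [(10, 0.10, 0.9), (20, 0.20, 0.8), (30, 0.15, 0.3), (40, 0.50, 0.01)]
print(feasible_rate(history, threshold=0.3))
```

The sketch leaves out the single-point case and the limits on consecutive allocation changes, which the slide handles separately.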

12 Single-Tier Evaluation: Overview
- Approach: Apply Hidra to allocate resources for a simulated cluster
  - Simulation allows easy control of cluster behavior and determination of optimal allocation
- Each server modeled as simple M/M/1 queue with time-varying arrival rate and service rate μ
  - Provides response time function that varies over time
  - More complex models not needed for our purposes
- Effectiveness of freshness and confidence
- Effectiveness for clusters with non-linear cluster performance scaling
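
For reference, the mean response time of an M/M/1 queue with arrival rate λ and service rate μ is 1/(μ − λ) when λ < μ, which is what makes the optimal allocation easy to determine in simulation. The sketch below assumes the applied load is split evenly across servers; the function names are illustrative only.

```python
import math

def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue; infinite when the queue is unstable."""
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)

def optimal_servers(total_load, service_rate, threshold):
    """Fewest servers whose M/M/1 mean response time stays within threshold.

    Assumes total_load is divided evenly across identical servers.
    """
    # 1/(mu - lam) <= threshold  is equivalent to  lam <= mu - 1/threshold
    max_rate = service_rate - 1.0 / threshold
    if max_rate <= 0:
        raise ValueError("threshold is below the minimum service time")
    return max(1, math.ceil(total_load / max_rate))

n = optimal_servers(total_load=1000, service_rate=50, threshold=0.1)
print(n, mm1_response_time(1000 / n, 50))
```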

13 Effectiveness of Freshness
- Increase μ steadily over time from 40 to 70 req/s
- No freshness (red) uses obsolete information
- Freshness (green) close to optimal (blue) allocation

14 Effectiveness of Confidence
- Set μ constant over time except for periodic transients
- Using Confidence, Hidra less susceptible to short-term transients by preserving more commonly observed values
[Figure panels: Freshness only, no Confidence; Freshness and Confidence]

15 Non-Linear Cluster Scaling
- Response time function may be sensitive to the resource allocation. Examples:
  - Caching effect: Memory in each additional server adds to total effective content cache capacity if shared effectively → throughput scales faster than N
  - Communication effect: Overhead of coordination between servers → throughput scales slower than N
- Evaluate using request rates from hp.com logs for a 24-hour period
  - Caching: assume hit ratio increases linearly with N, causing increase of service rate μ
  - Communication: increase service time (1/μ) linearly with N
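
The two scaling effects can be sketched as service-rate functions of N. All constants below (base hit ratio, hit and miss service times, per-server overhead) are made-up illustration values, and deriving μ from the hit ratio via hit and miss service times is an assumption; the slides state only that the hit ratio, or the service time, grows linearly with N.

```python
def caching_service_rate(n, base_hit=0.2, hit_gain=0.02, t_hit=0.005, t_miss=0.05):
    """Service rate when the cache hit ratio grows linearly with n (capped at 1)."""
    hit = min(1.0, base_hit + hit_gain * n)
    return 1.0 / (hit * t_hit + (1.0 - hit) * t_miss)

def communication_service_rate(n, base_time=0.02, overhead=0.0005):
    """Service rate when the service time grows linearly with n."""
    return 1.0 / (base_time + overhead * n)

def min_servers(load, threshold, service_rate_fn, max_servers=1000):
    """Smallest n whose M/M/1 mean response time meets threshold, or None."""
    for n in range(1, max_servers + 1):
        mu = service_rate_fn(n)
        lam = load / n  # load assumed to be spread evenly over n servers
        if lam < mu and 1.0 / (mu - lam) <= threshold:
            return n
    return None

print(min_servers(200, 0.1, caching_service_rate))
print(min_servers(200, 0.1, communication_service_rate))
```

Because μ depends on N, the closed-form N = R/λ no longer applies directly, so the sketch searches over N instead.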

16 Caching Effect Results
[Figure panels: Service Rate μ; Resource Allocation; Response Time]
- Wide variation in the average behavior of a server
  - Each server is more effective as allocation is increased
- Hidra adapts, achieving close to optimal allocation

17 Communication Effect Results
[Figure panels: Service Rate μ; Resource Allocation; Response Time]
- Opposite service rate behavior compared to caching
  - Each server is less effective as allocation is increased
- Hidra handles this case also

18 Multi-Tier Resource Allocation
- Multi-Tier characteristics
  - A request to first tier could trigger multiple secondary requests to other tiers
  - Average response time is sum of average response times of each tier
  - Cost of resource could be different for different tiers
- Multi-Tier resource allocation as an extension of the single-tier case
  - Response time for each tier computed using single-tier algorithm
  - Dynamically vary target response times for each tier to minimize total cost resource allocation
  - Same client request rate used for all tiers
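
The idea of varying per-tier targets can be sketched for two tiers as a search over how the total response time budget is split. This is not the slides' algorithm: the grid search, the `steps` parameter, and the use of the M/M/1 closed form in place of the history-based single-tier model are all assumptions made to keep the example self-contained.

```python
import math

def tier_servers(load, service_rate, target):
    """Servers needed by one tier (M/M/1 stand-in for the single-tier model)."""
    max_rate = service_rate - 1.0 / target
    if max_rate <= 0:
        return None  # target cannot be met at any allocation
    return max(1, math.ceil(load / max_rate))

def two_tier_allocation(load, tiers, total_target, steps=99):
    """Split total_target between two tiers to minimize total server cost.

    tiers -- [(service_rate, cost_per_server), (service_rate, cost_per_server)]
    Returns (cost, [n1, n2], (target1, target2)) or None if nothing is feasible.
    """
    best = None
    for i in range(1, steps + 1):
        t1 = total_target * i / (steps + 1)
        targets = (t1, total_target - t1)  # per-tier targets sum to the total
        counts = [tier_servers(load, mu, t) for (mu, _), t in zip(tiers, targets)]
        if None in counts:
            continue
        cost = sum(n * c for n, (_, c) in zip(counts, tiers))
        if best is None or cost < best[0]:
            best = (cost, counts, targets)
    return best

# Same client request rate (1000 req/s) applied to both tiers.
print(two_tier_allocation(1000, [(50, 1.0), (100, 3.0)], total_target=0.2))
```

In this example the cheaper tier is given the tighter target, since adding servers there costs less than adding them to the expensive tier.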

19 Two-Tier Results
[Figure panels: Caching (both tiers); Communication (both tiers); Caching (Tier 1), Communication (Tier 2). Vertical axis: total cost of allocated servers]
- Same effect in both tiers → results similar to single-tier case are optimal
- Different effects in each tier → optimal allocation has cost intermediate between the two extremes
- Hidra adapts successfully to all these cases

20 Summary
- Presented Hidra for history-based resource allocation of server clusters
- Proposed use of freshness and confidence to update history-based model effectively
- Developed extrapolation approach for finding operating point with incomplete model
- Extended the model to multi-tier systems
- Simulation-based results show scheme is promising for both single-tier and multi-tier systems
