
1 Hidra: History-Based Dynamic Resource Allocation for Server Clusters
Jayanth Gummaraju (Stanford University, CA, USA) and Yoshio Turner (Hewlett-Packard Labs., Palo Alto, CA, USA)
ITA05, Wrexham, UK, September 2005

2 Why Dynamic Resource Allocation
- High demand variation for an Internet service
  - Daily: peak load is ~10x the average load during the day
  - Variation over longer time scales (days, weeks)
- Benefits of dynamic resource allocation
  - Reduce operating costs for a service: energy, software license fees
  - Support more services on a shared infrastructure by shifting resources between services on demand
- Practical thanks to fast server re-purposing
  - Blade server management
  - Networked storage
  - Virtual machine cloning/migration

3 Problem
Determine resource requirements for a service on the fly.
Challenges:
- Frequent service updates
- Frequent changes in the client interest set
⇒ Static a priori capacity planning won't work.

4 Approach: Hidra
Hidra: History-Based Dynamic Resource Allocation
- "Black-box" approach: continuously build and update a model of system behavior from externally visible performance attributes, without knowledge of internal operation (e.g., which resource is the bottleneck)
- Model updates: introduce freshness and confidence
- Extrapolation: determine resource requirements with only a partial model

5 Scope
- Large services requiring multiple servers
- Multi-tier: each tier is a cluster of servers. Assumptions:
  - Identical servers within a tier
  - Servers in different tiers can differ
  - Allocation granularity = one server (e.g., a blade in a blade server)
- Predictable client request rate
  - Reasonable if load varies smoothly or has only occasional discontinuities
- Service and server behavior can change over time
- Goal: find the minimum-cost resource allocation that meets the service response time requirement
  - Cost = sum of the costs of the servers allocated to each tier
  - Requirement on mean response time (may be generalized)

6 Outline
- Single-tier history-based resource allocation
  - Constructing and updating the history-based model (freshness and confidence)
  - Using the model to determine resource allocation (extrapolation)
- Multi-tier history-based resource allocation
- Summary

7 Single-Tier History-Based Model
- The model represents the average behavior of a server in a tier
- It consists of a collection of measured operating points (history) for the tier
- Each history point records at least (per-server request rate, mean response time)
- The model provides an estimate of the function F: response time = F(per-server request rate), an increasing function in the range of interest
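The history can be kept as a small map from a quantized per-server request rate to the stored mean response time. The slides describe the model only abstractly, so the class name, bucketing granularity, and overwrite policy below are illustrative assumptions:

```python
class HistoryModel:
    """Per-tier history: quantized per-server request rate -> stored
    mean response time. Bucketing lets repeated measurements at a
    similar load update the same stored point."""

    def __init__(self, bucket=5.0):
        self.bucket = bucket   # req/s per bucket (illustrative choice)
        self.points = {}       # bucketed rate -> mean response time

    def _key(self, rate):
        return round(rate / self.bucket) * self.bucket

    def record(self, rate, response_time):
        # Simplest policy (overwrite); Hidra instead blends old and
        # new values with a freshness/confidence weight.
        self.points[self._key(rate)] = response_time

    def estimate(self, rate):
        return self.points.get(self._key(rate))
```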

8 Using the History-Based Model
- Goal: find the fewest servers needed to meet a requirement for maximum mean response time
- Extrapolate the model to find λmax, the largest per-server request rate whose predicted mean response time stays within the threshold
- Given R = the tier's applied load (requests per second):
  ⇒ Resource allocation N = ⌈R / λmax⌉ servers
[Figure: response time vs. per-server request rate, with the response time threshold marking λmax]
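The sizing rule above is a one-liner; `servers_needed` is a hypothetical helper name:

```python
import math

def servers_needed(applied_load, lambda_max):
    """N = ceil(R / lambda_max): the fewest identical servers that keep
    the per-server request rate at or below the largest feasible rate."""
    return max(1, math.ceil(applied_load / lambda_max))
```

For example, an applied load of 1000 req/s with a feasible per-server rate of 60 req/s needs 17 servers.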

9 Updating the Model
- The response time function can change over time, due to changes in:
  - Service content or implementation
  - The client interest set
  - The number of allocated servers (request distribution, and non-linear performance scaling)
- Nevertheless, a history-based model is useful:
  - Gradual changes ⇒ recent history is a good approximation
  - Occasional large changes ⇒ recent history is relevant except in the moments immediately after a large change
- Periodically update the model based on current performance measurements
- Balance responsiveness and accuracy: incorporate new measurements quickly enough to model current behavior, but not so aggressively that transient glitches pollute the model

10 History Update: Freshness and Confidence
- Each history point is updated as a weighted average of the stored value and the new measurement:
  new stored value = α × old stored value + (1 − α) × new measurement
- Older history is less likely to represent current behavior
- Recent history can be obsolete after a sudden shift in behavior
- The weighting factor α combines:
  - Freshness: a value that decreases with time since the last update
  - Confidence: a value that increases with repeated confirmation of consistent behavior for the history point
- The combination is an EWMA (capturing freshness) whose decay rate slows with increasing confidence
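The update rule and one possible weighting can be sketched as follows. The slides specify only the qualitative directions (freshness decays with staleness, confidence grows with confirmation); the exponential/rational forms and constants here are assumptions:

```python
import math

def update_point(stored, measurement, alpha):
    """New stored value = alpha * old stored value + (1 - alpha) * measurement."""
    return alpha * stored + (1.0 - alpha) * measurement

def combined_alpha(time_since_update, confirmations, decay=0.1, max_alpha=0.95):
    """Illustrative combination: freshness decays with time since the
    last update; confidence rises with repeated confirmations, slowing
    the effective EWMA decay for well-confirmed points."""
    freshness = math.exp(-decay * time_since_update)
    confidence = confirmations / (confirmations + 1.0)
    return min(max_alpha, freshness * confidence)
```

A highly confirmed, recently updated point keeps most of its stored value, so a single transient glitch cannot overwrite it.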

11 Extrapolation: Determining Resource Allocation
- The model has an incomplete view of the response time function
- To find the optimal operating point, Hidra extrapolates/interpolates from a pair of history points
  - Only use points that match the general shape of a typical response time curve (positive slope)
  - Favor points with a high α value (ignore points whose α is very small)
- If only one point exists (the current operating point), adjust the allocation differently
- Limits on consecutive changes in resource allocation (a fixed limit for decreases, growing limits for increases)
[Figure: response time vs. applied load, with the threshold and candidate history points X, Y, Z]
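The two-point step can be sketched as a straight-line fit; since real response-time curves are convex, a linear fit is optimistic beyond the measured range, which is one reason for the limits on consecutive allocation changes. `feasible_rate` is a hypothetical helper name:

```python
def feasible_rate(p1, p2, threshold):
    """Linearly extrapolate/interpolate two history points
    (rate, mean response time) with positive slope to estimate the
    per-server rate at which response time reaches the threshold."""
    (r1, t1), (r2, t2) = sorted([p1, p2])
    slope = (t2 - t1) / (r2 - r1)
    if slope <= 0:
        raise ValueError("points must have positive slope")
    return r1 + (threshold - t1) / slope
```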

12 Single-Tier Evaluation: Overview
- Approach: apply Hidra to allocate resources for a simulated cluster
  - Simulation allows easy control of cluster behavior and determination of the optimal allocation
- Each server is modeled as a simple M/M/1 queue with time-varying arrival rate λ and service rate μ
  - Provides a response time function that varies over time
  - More complex models are not needed for our purposes
- Evaluated:
  - Effectiveness of freshness and confidence
  - Effectiveness for clusters with non-linear cluster performance scaling
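The M/M/1 model gives the simulated cluster a closed-form response time function, which is why no richer model is needed:

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time (queueing + service) of an M/M/1 queue:
    T = 1 / (mu - lambda), valid only while lambda < mu."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)
```

Varying μ over time (as the evaluation does) shifts this curve, letting the simulation exercise Hidra's model updates.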

13 Effectiveness of Freshness
- The service rate μ is increased steadily over time from 40 to 70 req/s
- Without freshness (red), Hidra uses obsolete information
- With freshness (green), the allocation stays close to the optimal (blue) allocation

14 Effectiveness of Confidence
- The service rate μ is held constant over time except for periodic transients
- [Figure panels: "Freshness only, no Confidence" vs. "Freshness and Confidence"]
- Using confidence, Hidra is less susceptible to short-term transients because it preserves the more commonly observed values

15 Non-Linear Cluster Scaling
- The response time function may be sensitive to the resource allocation. Examples:
  - Caching effect: the memory in each additional server adds to the total effective content cache capacity if shared effectively ⇒ throughput scales faster than N
  - Communication effect: overhead of coordination between servers ⇒ throughput scales slower than N
- Evaluate using request rates from hp.com logs for a 24-hour period
  - Caching: assume the hit ratio increases linearly with N, increasing the service rate μ
  - Communication: the service time (1/μ) increases linearly with N

16 Caching Effect Results
[Figure panels: service rate μ, resource allocation, response time]
- Wide variation in the average behavior of a server
- Each server becomes more effective as the allocation is increased
- Hidra adapts, achieving close to the optimal allocation

17 Communication Effect Results
[Figure panels: service rate μ, resource allocation, response time]
- Service rate behavior is the opposite of the caching case
- Each server becomes less effective as the allocation is increased
- Hidra handles this case as well

18 Multi-Tier Resource Allocation
- Multi-tier characteristics:
  - A request to the first tier can trigger multiple secondary requests to other tiers
  - The average response time is the sum of the average response times of the tiers
  - The cost of a resource can differ between tiers
- Multi-tier resource allocation as an extension of the single-tier case:
  - The response time for each tier is computed using the single-tier algorithm
  - The target response times for each tier are varied dynamically to minimize the total cost of the resource allocation
  - The same client request rate is used for all tiers
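One way to realize "vary the per-tier targets to minimize total cost" is a grid search over how the total response-time budget is split between two tiers. Everything below is an illustrative sketch, not the paper's actual procedure: the caller supplies each tier's fan-out (secondary requests per client request), a feasible-rate function (e.g., derived from the tier's history model), and a per-server cost; the step size and brute-force search are assumptions:

```python
import math

def multi_tier_allocation(request_rate, tiers, total_threshold, step=0.005):
    """Try every split of the response-time budget across two tiers and
    keep the split whose total server cost is lowest. Each tier is
    (fanout, feasible_rate_fn, unit_cost)."""
    best = None
    n = int(total_threshold / step)
    for i in range(1, n):
        budgets = (i * step, total_threshold - i * step)
        cost = 0.0
        for (fanout, feasible_rate, unit_cost), budget in zip(tiers, budgets):
            lam = feasible_rate(budget)     # max per-server rate meeting this budget
            if lam <= 0:                    # budget unreachable at any positive rate
                cost = float("inf")
                break
            cost += unit_cost * math.ceil(fanout * request_rate / lam)
        if best is None or cost < best[0]:
            best = (cost, budgets)
    return best                              # (total cost, per-tier budgets)
```

With M/M/1-style tiers the feasible rate for a budget T is μ − 1/T, so the search trades budget between a cheap fast tier and an expensive slow one.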

19 Two-Tier Results
[Figure: total cost of allocated servers for three configurations: caching in both tiers, communication in both tiers, and caching in tier 1 with communication in tier 2]
- When both tiers exhibit the same effect, allocations similar to the single-tier case are optimal
- When the tiers exhibit different effects, the optimal allocation has a cost intermediate between the two extremes
- Hidra adapts successfully to all of these cases

20 Summary
- Presented Hidra, history-based resource allocation for server clusters
- Proposed the use of freshness and confidence to update the history-based model effectively
- Developed an extrapolation approach for finding the operating point with an incomplete model
- Extended the model to multi-tier systems
- Simulation-based results show the scheme is promising for both single-tier and multi-tier systems