University of Minnesota Optimizing MapReduce Provisioning in the Cloud Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra


1 Optimizing MapReduce Provisioning in the Cloud
Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra
http://www.cs.umn.edu/~cardosa
Department of Computer Science, University of Minnesota
† IBM Almaden Research Center

2 MapReduce Provisioning Problem
Platform: Virtualized cloud environment, which enables virtualized MapReduce clusters
Several MapReduce jobs from different users
Goal: Optimize system-wide metrics such as throughput, energy, load distribution, and user costs
Problem: At the cloud service provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?

3 MapReduce Platform: Hadoop
Open-source implementation of the MapReduce distributed computing framework
Used widely: Yahoo, Facebook, NYT, (Google)
[Figure: Input Data]

4 Hadoop Clusters
Distributed data: replicated chunks
Distributed computation: map/reduce tasks
Traditional: dedicated physical nodes

5 Virtual Hadoop Clusters
Run Hadoop on top of VMs
E.g.: Amazon Elastic MapReduce = Hadoop + Amazon EC2
[Figure: Server Pool, VM Pool, Hadoop Processes]

6 Roadmap
Intro & Problem
Platform Overview
Spatio-Temporal Insights for Provisioning
Building Blocks for MapReduce Provisioning
Case Study: Performance optimization
Case Study: Energy optimization

7 Spatio-Temporal Insights for Provisioning
Initial focus: Energy savings
Goal: Minimize energy usage
Energy + cooling ~ 42% of total cost [Hamilton08]
Problem: How to place the VMs on available physical servers to minimize energy usage?
Minimize Cumulative Machine Uptime (CMU)
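To make the CMU objective concrete, here is a minimal sketch. It assumes that a server stays powered on until the last VM placed on it finishes and then shuts down, so each server's uptime is the longest runtime it hosts. The function name and the runtimes are hypothetical and are not taken from the slides.

```python
def cumulative_machine_uptime(placement):
    """Sum, over servers, of the longest VM runtime hosted on each server.

    placement: list of servers, each a list of VM runtimes in minutes.
    Assumption: a server shuts down as soon as its last VM finishes.
    Servers with no VMs contribute nothing.
    """
    return sum(max(runtimes) for runtimes in placement if runtimes)


# Hypothetical workload: two 20-minute VMs and two 100-minute VMs.
mixed = [[20, 100], [20, 100]]      # both servers wait for a 100-minute VM
balanced = [[20, 20], [100, 100]]   # similar runtimes share a server

print(cumulative_machine_uptime(mixed))     # 200
print(cumulative_machine_uptime(balanced))  # 120
```

Under this assumption, grouping VMs with similar runtimes lets one server shut down early, which lowers the total uptime even though the same work is done.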

8 VM Placement: Spatial Fit
Co-place complementary workloads
[Figure: Job 1, Job 2, Job 3, Job 4]

9 Which placement is better?
[Figure: placements A and B; labels: 20 min, 10 min, 100 min, 20 min; SHUTDOWN]

10 Time Balancing
[Figure: Time Balance; runtime labels 20, 25, 30, and 90]

11 Building Blocks for Provisioning
[Figure: Objective-driven resource provisioning; MapReduce Jobs; Job profiling; Cluster scaling; Migration; Cloud Execution Environment; Initial Provisioning; Continuous Optimization]

12 Building Blocks for Provisioning
Job Profiling: MapReduce job runtime estimation
  Based on number of VMs allocated to job
  Based on input data size
  Offline and online profiling
Cluster Scaling: Changing number of VMs allocated to a particular MapReduce job
  Affects runtime of job; relies on Job Profiling model
Migration: Useful for continuous optimization
  Load balancing, VM consolidation
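The slides do not give the form of the profiling model, so the following is only an illustrative sketch of offline profiling. It assumes a hypothetical model in which runtime is a fixed overhead plus a term proportional to input size per VM, and fits it by ordinary least squares. The function names, the model form, and the sample data are all assumptions.

```python
def fit_runtime_model(samples):
    """Least-squares fit of runtime = a + b * (input_gb / num_vms).

    samples: list of (num_vms, input_gb, runtime_min) tuples taken from
    offline profiling runs. Returns the pair (a, b).
    """
    xs = [input_gb / num_vms for num_vms, input_gb, _ in samples]
    ys = [runtime for _, _, runtime in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct work-per-VM values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_runtime(model, num_vms, input_gb):
    """Predict runtime in minutes for a given allocation and input size."""
    a, b = model
    return a + b * (input_gb / num_vms)


# Hypothetical profiling runs: (num_vms, input_gb, runtime_min).
profile = [(2, 10, 27.0), (4, 10, 14.5), (8, 10, 8.25), (4, 20, 27.0)]
model = fit_runtime_model(profile)
print(estimate_runtime(model, num_vms=5, input_gb=20))
```

Online profiling could refine such a model by refitting as measurements from the running job arrive; that step is not shown here.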

13 Job Profiling: Runtime Estimation Based on Number of VMs

14 Job Profiling: Runtime Estimation Based on Input Data Size

15 Job Profiling: Runtime Estimation
Online Profiling: Additional refinement

16 Cluster Scaling
Increasing allocated resources (typical):
  Add VMs to the virtualized Hadoop cluster
  Job performance increases, runtime decreases
E.g., for Time Balancing: energy reasons
E.g., for Load Balancing and Deadlines: performance
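Since cluster scaling relies on the job profiling model, one way to picture it is as a search for the smallest allocation whose estimated runtime meets a target. The sketch below assumes a runtime estimator is available as a function of VM count; `example_estimate` and its constants are hypothetical stand-ins for a profiled model.

```python
def vms_for_target_runtime(estimate, target_min, max_vms):
    """Smallest VM count whose estimated runtime meets the target.

    estimate: function mapping a VM count to an estimated runtime in minutes.
    Returns None if no allocation up to max_vms meets the target.
    """
    for num_vms in range(1, max_vms + 1):
        if estimate(num_vms) <= target_min:
            return num_vms
    return None


def example_estimate(num_vms, fixed_min=2.0, work_min=200.0):
    """Hypothetical model: fixed overhead plus perfectly divisible work."""
    return fixed_min + work_min / num_vms


print(vms_for_target_runtime(example_estimate, target_min=30, max_vms=64))  # 8
```

A linear scan is used so that the sketch does not depend on the estimator being monotonic in the number of VMs.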

17 Cluster Scaling: Time Balancing
[Figure: Time Balance; runtime labels 20, 25, 30, and 90]

18 Roadmap
Intro & Problem
Platform Overview
Spatio-Temporal Insights for Provisioning
Building Blocks for MapReduce Provisioning
Case Study: Performance optimization
Case Study: Energy optimization

19 Case Study: Performance & Deadlines
Goal: Meet deadlines for MapReduce jobs
  Determine initial allocation accurately
  Dynamically adjust allocation to meet deadline if necessary
Monitoring: Use offline profiling to estimate number of VMs needed based on past performance
Actuation: Online profiling: trigger points to invoke cluster scaling
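A trigger point for cluster scaling could look roughly like the sketch below. It is not the authors' mechanism. It assumes that progress so far has been linear in time and that the remaining work speeds up in proportion to the number of VMs; the function name and parameters are hypothetical.

```python
import math


def vms_needed_for_deadline(elapsed_min, progress, current_vms, deadline_min):
    """Return the VM count projected to finish the job by the deadline.

    progress: fraction of the job completed so far, in (0, 1].
    Assumptions: linear progress so far, and remaining time that shrinks
    in proportion to the number of VMs allocated.
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    # Project the remaining time at the current allocation.
    remaining_min = elapsed_min * (1 - progress) / progress
    time_left = deadline_min - elapsed_min
    if remaining_min <= time_left:
        return current_vms  # on track, no scaling needed
    if time_left <= 0:
        raise ValueError("deadline has already passed")
    return math.ceil(current_vms * remaining_min / time_left)


# Hypothetical check 30 minutes into a job with a 60-minute deadline.
print(vms_needed_for_deadline(30, 0.25, current_vms=4, deadline_min=60))  # 12
print(vms_needed_for_deadline(30, 0.50, current_vms=4, deadline_min=60))  # 4
```

In the first call the job is projected to need 90 more minutes with only 30 left, so the allocation triples; in the second the job is on track and the allocation is unchanged.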

20 Case Study: Energy Savings
Goal: Minimize energy consumption from the execution of a large batch of MapReduce jobs
  Energy + cooling ~ 42% of total cost [Hamilton08]
  Pass energy savings on to users
Problem: How to place the VMs on available physical servers to minimize energy usage?
Minimize Cumulative Machine Uptime (CMU)

21 Case Study: Energy Savings
Use Job Profiling to place similar-runtime VMs together for initial provisioning
Use Job Profiling to adjust number of VMs in each cluster to adjust runtimes if needed
Monitoring: Online profiling to determine when energy could be saved by using migration or cluster scaling
Actuation: Use Cluster Scaling or Migration to dynamically adjust for inaccuracies/unknowns in initial provisioning
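Placing similar-runtime VMs together can be sketched as a simple greedy heuristic: sort VMs by estimated runtime and fill servers in that order. This is an illustration only, not the placement algorithm from the talk. It assumes identical VM sizes, a fixed number of VM slots per server, and that a server shuts down when its last VM finishes; it ignores spatial fit. All names and runtimes are hypothetical.

```python
def place_by_runtime(vm_runtimes, slots_per_server):
    """Group VMs with similar estimated runtimes onto the same server.

    Sorts runtimes in descending order and fills servers consecutively.
    """
    ordered = sorted(vm_runtimes, reverse=True)
    return [ordered[i:i + slots_per_server]
            for i in range(0, len(ordered), slots_per_server)]


def cumulative_machine_uptime(placement):
    """Total uptime, assuming each server runs until its last VM finishes."""
    return sum(max(runtimes) for runtimes in placement if runtimes)


# Hypothetical estimated runtimes in minutes, in arrival order.
runtimes = [20, 100, 25, 90, 30, 95]
arrival_order = [runtimes[0:3], runtimes[3:6]]
time_balanced = place_by_runtime(runtimes, slots_per_server=3)

print(cumulative_machine_uptime(arrival_order))  # 195
print(cumulative_machine_uptime(time_balanced))  # 130
```

If the runtime estimates turn out to be wrong, the monitoring and actuation steps above would correct the placement through migration or cluster scaling.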

22 Conclusion
Framework: Building blocks (STEAMEngine) for the optimization of MapReduce provisioning from a cloud service provider perspective
Preliminary evaluations to validate usefulness of each building block
Approaches for applying building blocks to meet specific goals, e.g. performance, energy

23 Thank you! Questions?


