1
University of Minnesota
Optimizing MapReduce Provisioning in the Cloud
Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra
http://www.cs.umn.edu/~cardosa
Department of Computer Science, University of Minnesota
† IBM Almaden Research Center
2
MapReduce Provisioning Problem
Platform: a virtualized cloud environment, which enables virtualized MapReduce clusters
Several MapReduce jobs from different users
Goal: optimize system-wide metrics such as throughput, energy, load distribution, and user costs
Problem: at the cloud service provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?
3
MapReduce Platform: Hadoop
Open-source implementation of the MapReduce distributed computing framework
Widely used: Yahoo, Facebook, NYT (Google uses its own MapReduce implementation)
[Figure: Input Data]
4
Hadoop Clusters
Distributed data: replicated chunks
Distributed computation: map/reduce tasks
Traditional deployment: dedicated physical nodes
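To make "replicated chunks" concrete, here is a minimal Python sketch that splits an input file into fixed-size chunks and assigns each chunk's replicas to distinct nodes. The 64 MB chunk size (the HDFS default in Hadoop 1.x), the replication factor of 3, the node names, and the round-robin policy are illustrative assumptions; HDFS's actual replica placement is rack-aware.

```python
import math

CHUNK_MB = 64     # assumed chunk size (HDFS default block size in Hadoop 1.x)
REPLICATION = 3   # assumed replication factor (HDFS default)

def place_chunks(file_mb, nodes, chunk_mb=CHUNK_MB, replication=REPLICATION):
    """Return {chunk_index: [node, ...]} with `replication` distinct nodes per chunk.

    Round-robin placement is a simplification, not HDFS's real policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_chunks = math.ceil(file_mb / chunk_mb)
    placement = {}
    for c in range(num_chunks):
        # Start each chunk at a different node so replicas spread evenly.
        placement[c] = [nodes[(c + r) % len(nodes)] for r in range(replication)]
    return placement

# A 200 MB file becomes 4 chunks (3 full, 1 partial) on a hypothetical 4-node cluster.
layout = place_chunks(200, ["n1", "n2", "n3", "n4"])
print(layout[0])  # ['n1', 'n2', 'n3']
```

Because each chunk lives on several nodes, a map task can usually be scheduled on a node that already holds its input, and the data survives the loss of a single node.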
5
Virtual Hadoop Clusters
Run Hadoop on top of VMs
E.g., Amazon Elastic MapReduce = Hadoop + Amazon EC2
[Figure: Hadoop processes running in a VM pool hosted on a server pool]
6
Roadmap
Intro & Problem
Platform Overview
Spatio-Temporal Insights for Provisioning
Building Blocks for MapReduce Provisioning
Case Study: Performance optimization
Case Study: Energy optimization
7
Spatio-Temporal Insights for Provisioning
Initial focus: energy savings
Goal: minimize energy usage; energy and cooling account for ~42% of total cost [Hamilton08]
Problem: how should VMs be placed on the available physical servers to minimize energy usage?
Objective: minimize Cumulative Machine Uptime (CMU)
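Under two simplifying assumptions of our own (all VMs start at the same time, and a server is powered off as soon as its last VM finishes), CMU can be written as a sum over the set S of powered-on servers of the longest runtime T_v among the VMs V_s placed on each server s:

```latex
\mathrm{CMU} = \sum_{s \in S} \max_{v \in V_s} T_v
```

For example, a hypothetical placement with VMs of 20 and 10 minutes on one server and 100 and 20 minutes on another gives CMU = 20 + 100 = 120 machine-minutes.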
8
VM Placement: Spatial Fit
[Figure: VMs of Job 1, Job 2, Job 3, and Job 4 placed on servers]
Co-place complementary workloads
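One way to read "spatial fit" is as a multi-dimensional capacity check: VMs whose dominant resources differ can share a server without oversubscribing it. The sketch below assumes a hypothetical two-dimensional demand model (CPU and disk I/O, each as a fraction of one server's capacity); the demand values are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical demand model: (cpu_fraction, io_fraction) of one server.
Demand = Tuple[float, float]

def fits(vms: List[Demand], capacity: Demand = (1.0, 1.0)) -> bool:
    """True if the summed demand of `vms` stays within capacity in every dimension."""
    cpu = sum(v[0] for v in vms)
    io = sum(v[1] for v in vms)
    return cpu <= capacity[0] and io <= capacity[1]

cpu_heavy = (0.7, 0.2)  # e.g., a compute-bound map phase (illustrative)
io_heavy = (0.2, 0.7)   # e.g., a shuffle/IO-bound phase (illustrative)

print(fits([cpu_heavy, io_heavy]))   # True: the demands complement each other
print(fits([cpu_heavy, cpu_heavy]))  # False: CPU is oversubscribed
```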
9
Which placement is better?
[Figure: placements A and B of VMs with runtimes of 20 min, 10 min, 100 min, and 20 min; servers marked SHUTDOWN]
10
Time Balancing
[Figure: VM runtimes per server before (20, 25, 90) and after (20, 25, 30) time balancing]
11
Building Blocks for Provisioning
[Figure: objective-driven resource provisioning of MapReduce jobs onto the cloud execution environment; building blocks: job profiling, cluster scaling, migration; phases: initial provisioning, continuous optimization]
12
Building Blocks for Provisioning
Job Profiling: MapReduce job runtime estimation
  Based on the number of VMs allocated to the job
  Based on input data size
  Offline and online profiling
Cluster Scaling: changing the number of VMs allocated to a particular MapReduce job
  Affects the job's runtime; relies on the Job Profiling model
Migration: useful for continuous optimization
  Load balancing, VM consolidation
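As an illustration of VM consolidation through migration, the sketch below tries to empty the least-loaded server by moving its VMs to servers with free slots, so that the emptied server can be shut down. The fixed number of VM slots per server, the function name consolidate, and the greedy most-free-slots-first choice are all hypothetical simplifications.

```python
from typing import Dict, List, Tuple

def consolidate(servers: Dict[str, List[str]], slots: int) -> List[Tuple[str, str, str]]:
    """Try to empty the least-loaded active server; return (vm, src, dst) moves.

    `servers` maps a server name to the VM ids it hosts and is updated in place.
    Each server is assumed (hypothetically) to hold at most `slots` VMs.
    """
    active = [s for s in servers if servers[s]]
    if len(active) < 2:
        return []  # nothing to consolidate into
    src = min(active, key=lambda s: len(servers[s]))
    spare = {s: slots - len(servers[s]) for s in active if s != src}
    if sum(spare.values()) < len(servers[src]):
        return []  # not enough free slots elsewhere: leave placement unchanged
    moves = []
    for vm in list(servers[src]):
        dst = max(spare, key=spare.get)  # target the server with the most free slots
        spare[dst] -= 1
        servers[dst].append(vm)
        servers[src].remove(vm)
        moves.append((vm, src, dst))
    return moves  # `src` is now empty and could be powered off

cluster = {"s1": ["a", "b", "c"], "s2": ["d"], "s3": ["e", "f"]}
print(consolidate(cluster, slots=4))  # [('d', 's2', 's3')]
```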
13
Job Profiling: Runtime Estimation Based on the Number of VMs
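A simple way to turn offline profiling runs into a runtime estimate is to fit a model of runtime against VM count. The sketch below assumes a hypothetical model T(n) = a + b / n, where a is a fixed (serial) part and b is work that parallelizes across n VMs, and fits it by ordinary least squares on x = 1/n; the profiling runs are made up.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up profiling runs: (number of VMs, runtime in minutes).
runs = [(1, 90.0), (2, 50.0), (4, 30.0), (8, 20.0)]
# Fit T(n) = a + b / n by regressing runtime on 1 / n (hypothetical model).
a, b = fit_linear([1 / n for n, _ in runs], [t for _, t in runs])

def predict_runtime(n_vms):
    return a + b / n_vms

print(round(a, 2), round(b, 2), round(predict_runtime(16), 2))  # 10.0 80.0 15.0
```

The serial term a is what limits speedup: under this model, no number of VMs can push the runtime below a.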
14
Job Profiling: Runtime Estimation Based on Input Data Size
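Input size can be folded into the same kind of model. As a hypothetical example (not necessarily the authors' model), assume the parallelizable work grows linearly with the input size s, giving T(n, s) = a + b * s / n, and fit a and b by least squares on x = s / n. The profiling runs below are made up.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up runs: (number of VMs, input size in GB, runtime in minutes).
runs = [(2, 4, 22.0), (4, 4, 12.0), (4, 8, 22.0), (8, 4, 7.0)]
# Hypothetical model: T(n, s) = a + b * s / n (b: minutes per GB on one VM).
a, b = fit_linear([s / n for n, s, _ in runs], [t for _, _, t in runs])

def predict_runtime(n_vms, size_gb):
    return a + b * size_gb / n_vms

print(round(a, 2), round(b, 2), round(predict_runtime(16, 40), 2))  # 2.0 10.0 27.0
```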
15
Job Profiling: Runtime Estimation
Online profiling: additional refinement
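Online profiling can refine an offline estimate while the job runs. The sketch below is one hypothetical scheme: extrapolate the total runtime from the fraction of work completed so far, then weight the online and offline estimates by that fraction, so that the observed rate dominates as the job progresses. The linear weighting is an assumption, not the method from the slides.

```python
def refine_estimate(offline_total, elapsed, progress):
    """Blend an offline runtime estimate with an online extrapolation.

    progress: fraction of the job's work completed so far, in (0, 1].
    The online estimate assumes the remaining work proceeds at the rate
    observed so far; the linear weighting by progress is hypothetical.
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    online_total = elapsed / progress
    return (1 - progress) * offline_total + progress * online_total

# Offline profile predicted 60 min; after 20 min the job is 25% done (online: 80 min).
print(refine_estimate(offline_total=60, elapsed=20, progress=0.25))  # 65.0
```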
16
Cluster Scaling
Increasing allocated resources (the typical case): add VMs that join the virtualized Hadoop cluster
Job performance increases and runtime decreases
E.g., time balancing, for energy reasons
E.g., load balancing and deadlines, for performance
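Given a runtime model from job profiling, cluster scaling reduces to choosing a VM count that hits a target runtime. Assuming the hypothetical model T(n) = a + b / n, the smallest n with T(n) <= target is ceil(b / (target - a)); if the target does not exceed a, no amount of scaling suffices. A minimal sketch:

```python
import math

def vms_for_target(a, b, target):
    """Smallest VM count n with a + b / n <= target, or None if unreachable.

    Assumes the hypothetical runtime model T(n) = a + b / n
    (a: serial part, b: parallelizable work), e.g. fitted by offline profiling.
    """
    if target <= a:
        return None  # the serial part alone meets or exceeds the target
    return max(1, math.ceil(b / (target - a)))

print(vms_for_target(a=10, b=80, target=30))  # 4  -> T(4) = 30
print(vms_for_target(a=10, b=80, target=25))  # 6  -> T(6) is about 23.3; T(5) = 26
```

The same calculation serves both uses on the slide: picking a target equal to a co-located VM's runtime gives time balancing, and picking the time left before a deadline gives deadline-driven scaling.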
17
Cluster Scaling: Time Balancing
[Figure: VM runtimes per server before (20, 25, 90) and after (20, 25, 30) time balancing]
18
Roadmap
Intro & Problem
Platform Overview
Spatio-Temporal Insights for Provisioning
Building Blocks for MapReduce Provisioning
Case Study: Performance optimization
Case Study: Energy optimization
19
Case Study: Performance & Deadlines
Goal: meet deadlines for MapReduce jobs
  Determine the initial allocation accurately
  Dynamically adjust the allocation to meet the deadline if necessary
Monitoring: use offline profiling to estimate the number of VMs needed, based on past performance
Actuation: use online profiling, with trigger points that invoke cluster scaling
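A trigger point can be sketched as a check that projects the finish time from progress so far and scales the cluster up if the deadline would be missed. This minimal sketch assumes, hypothetically, that the remaining work speeds up linearly with the number of VMs; real MapReduce jobs scale less than linearly, so a practical version would use the fitted profiling model instead.

```python
import math

def check_deadline(elapsed, progress, deadline, vms):
    """Return the VM count to use from this trigger point on.

    progress: fraction of work done so far (in (0, 1]) with `vms` VMs.
    Assumes (hypothetically) that remaining work speeds up linearly with VMs.
    """
    remaining_time = elapsed * (1 - progress) / progress  # at the current size
    time_left = deadline - elapsed
    if time_left <= 0:
        raise ValueError("deadline already passed")
    if remaining_time <= time_left:
        return vms  # on track: keep the current allocation
    return math.ceil(vms * remaining_time / time_left)

# 25% done after 30 min with 4 VMs; 90 min of work remain but only 30 min are left.
print(check_deadline(elapsed=30, progress=0.25, deadline=60, vms=4))  # 12
```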
20
Case Study: Energy Savings
Goal: minimize the energy consumed in executing a large batch of MapReduce jobs
  Energy and cooling account for ~42% of total cost [Hamilton08]
  Pass energy savings on to users
Problem: how should VMs be placed on the available physical servers to minimize energy usage?
  Minimize Cumulative Machine Uptime (CMU)
21
Case Study: Energy Savings
Use job profiling to place VMs with similar runtimes together during initial provisioning
Use job profiling to adjust the number of VMs in each cluster, tuning runtimes if needed
Monitoring: use online profiling to determine when energy could be saved through migration or cluster scaling
Actuation: use cluster scaling or migration to adjust dynamically for inaccuracies and unknowns in the initial provisioning
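Placing similar-runtime VMs together can be illustrated with a small heuristic: sort VMs by profiled runtime and fill each server with consecutive VMs, so that VMs sharing a server finish at about the same time. This is a minimal sketch assuming equal start times, a fixed number of VM slots per server, and made-up runtimes; it is not claimed to be STEAMEngine's placement algorithm.

```python
def pack(runtimes, slots):
    """Fill servers with `slots` VMs each, in the given order."""
    return [runtimes[i:i + slots] for i in range(0, len(runtimes), slots)]

def cmu(servers):
    # Each server stays up until its longest-running VM finishes.
    return sum(max(s) for s in servers)

runtimes = [20, 90, 25, 30, 85, 35]    # hypothetical profiled runtimes (minutes)
naive = pack(runtimes, 2)              # arrival order
grouped = pack(sorted(runtimes), 2)    # similar runtimes share a server
print(cmu(naive), cmu(grouped))        # 205 150
```

Grouping lets short VMs finish alongside other short VMs, so their servers can power off early instead of idling while a long-running neighbor completes.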
22
Conclusion
Framework: building blocks (STEAMEngine) for optimizing MapReduce provisioning from a cloud service provider's perspective
Preliminary evaluations to validate the usefulness of each building block
Approaches for applying the building blocks to meet specific goals, e.g., performance and energy
23
Thank you! Questions?