University of Minnesota Optimizing MapReduce Provisioning in the Cloud Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra

Presentation transcript:

Optimizing MapReduce Provisioning in the Cloud
Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra
Department of Computer Science, University of Minnesota
† IBM Almaden Research Center

MapReduce Provisioning Problem
- Platform: a virtualized cloud environment, which enables virtualized MapReduce clusters
- Several MapReduce jobs from different users
- Goal: optimize system-wide metrics such as throughput, energy, load distribution, and user costs
- Problem: at the cloud service provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?

MapReduce Platform: Hadoop
- Open-source implementation of the MapReduce distributed computing framework
- Used widely: Yahoo, Facebook, NYT, (Google)
[Figure label: Input Data]

Hadoop Clusters
- Distributed data: replicated chunks
- Distributed computation: map/reduce tasks
- Traditional: dedicated physical nodes

Virtual Hadoop Clusters
- Run Hadoop on top of VMs
- E.g., Amazon Elastic MapReduce = Hadoop + Amazon EC2
[Figure labels: Server Pool, VM Pool, Hadoop Processes]

Roadmap
- Intro & Problem
- Platform Overview
- Spatio-Temporal Insights for Provisioning
- Building Blocks for MapReduce Provisioning
- Case Study: Performance optimization
- Case Study: Energy optimization

Spatio-Temporal Insights for Provisioning
- Initial focus: energy savings
- Goal: minimize energy usage
- Energy + cooling ~ 42% of total cost [Hamilton08]
- Problem: how to place the VMs on available physical servers to minimize energy usage?
- Minimize Cumulative Machine Uptime (CMU)
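The CMU objective on this slide can be illustrated with a small sketch. It assumes, since the slides do not spell this out, that all VMs start together and that a server can be powered off as soon as its longest-running VM finishes. The function name, server names, and runtimes are hypothetical.

```python
from typing import Dict, List


def cumulative_machine_uptime(placement: Dict[str, List[float]]) -> float:
    """Sum, over all servers, of how long each server must stay powered on.

    Assumption: every VM starts at time 0, and a server shuts down once
    its longest-running VM completes. Empty servers contribute nothing.
    """
    return sum(max(runtimes) for runtimes in placement.values() if runtimes)


# Hypothetical VM runtimes in minutes (not taken from the slides).
mixed = {"server1": [10, 100], "server2": [10, 100]}
grouped = {"server1": [10, 10], "server2": [100, 100]}

cmu_mixed = cumulative_machine_uptime(mixed)      # both servers run 100 min
cmu_grouped = cumulative_machine_uptime(grouped)  # one server stops after 10 min
print(cmu_mixed, cmu_grouped)
```

Under these assumptions, grouping VMs with similar runtimes lets one server shut down early, which is what lowers CMU.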

VM Placement: Spatial Fit
- Co-place complementary workloads
[Figure labels: Job 1, Job 2, Job 3, Job 4]

Which placement is better?
[Figure: placements A and B; labels: 20 min, 10 min, 100 min, 20 min, SHUTDOWN]

Time Balancing
[Figure label: Time Balance]

Building Blocks for Provisioning
[Figure: objective-driven resource provisioning. Labels: MapReduce Jobs; Job profiling, Cluster scaling, Migration; Cloud Execution Environment; Initial Provisioning; Continuous Optimization]

Building Blocks for Provisioning
- Job Profiling: MapReduce job runtime estimation
  - Based on number of VMs allocated to the job
  - Based on input data size
  - Offline and online profiling
- Cluster Scaling: changing the number of VMs allocated to a particular MapReduce job
  - Affects runtime of the job; relies on the Job Profiling model
- Migration: useful for continuous optimization
  - Load balancing, VM consolidation

Job Profiling: Runtime Estimation
Based on Number of VMs

Job Profiling: Runtime Estimation
Based on Input Data Size
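The slides do not give the form of the profiling model, so the following is only a sketch of how offline profiling runs could be turned into a runtime estimator. It assumes a simple model, runtime ≈ a + b · (input size / number of VMs), fitted by ordinary least squares. The model form, function names, and sample data are all assumptions made for illustration.

```python
from typing import List, Tuple


def fit_runtime_model(samples: List[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Fit runtime ~ a + b * (data_gb / num_vms) by ordinary least squares.

    Each sample is (num_vms, data_gb, runtime_min) from an offline run.
    The model form is an assumption, not taken from the slides.
    """
    xs = [data_gb / num_vms for num_vms, data_gb, _ in samples]
    ys = [runtime for _, _, runtime in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover more than one data/VM ratio")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_runtime(a: float, b: float, num_vms: int, data_gb: float) -> float:
    """Predict runtime in minutes for a given allocation and input size."""
    return a + b * data_gb / num_vms


# Hypothetical profiling runs: (num_vms, data_gb, runtime_min).
samples = [(2, 10.0, 15.0), (4, 10.0, 10.0), (5, 20.0, 13.0), (10, 20.0, 9.0)]
a, b = fit_runtime_model(samples)
print(estimate_runtime(a, b, num_vms=8, data_gb=40.0))
```

A real profile would likely need a richer model, since MapReduce runtime rarely scales perfectly with the number of VMs.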

Job Profiling: Runtime Estimation
- Online profiling: additional refinement
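One way online profiling could refine an offline estimate is sketched below. The slides do not describe the refinement method. This sketch assumes the job reports a progress fraction and blends the offline estimate with the total runtime implied by observed progress, trusting the observation more as the job advances. The function name and blending rule are hypothetical.

```python
def refine_estimate(offline_min: float, elapsed_min: float, progress: float) -> float:
    """Blend an offline runtime estimate with one extrapolated from progress.

    progress is the fraction of the job completed so far (0.0 to 1.0).
    The blending weight equals progress; this rule is an assumption.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    if progress == 0.0:
        return offline_min  # nothing observed yet, keep the offline estimate
    observed_total = elapsed_min / progress  # linear extrapolation
    return (1.0 - progress) * offline_min + progress * observed_total


# Hypothetical job: profiled at 60 min, but only 25% done after 20 min.
refined = refine_estimate(offline_min=60.0, elapsed_min=20.0, progress=0.25)
print(refined)
```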

Cluster Scaling
- Increasing allocated resources (typical): add VMs to the virtualized Hadoop cluster
  - Job performance increases, runtime decreases
- E.g., for time balancing: energy reasons
- E.g., for load balancing and deadlines: performance
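Because cluster scaling relies on the job profiling model, a scaling decision can be sketched as a search over VM counts. The example assumes a profile is available as a function from VM count to estimated runtime. The function name, the profile, and the target are hypothetical.

```python
from typing import Callable, Optional


def vms_for_target(
    estimate: Callable[[int], float], target_min: float, max_vms: int
) -> Optional[int]:
    """Return the smallest VM count whose estimated runtime meets the target.

    Returns None if no allocation up to max_vms is fast enough.
    """
    for num_vms in range(1, max_vms + 1):
        if estimate(num_vms) <= target_min:
            return num_vms
    return None


def profile(num_vms: int) -> float:
    """Hypothetical profile: 5 min fixed overhead plus 120 VM-minutes of work."""
    return 5.0 + 120.0 / num_vms


# Time balancing: shrink this job's runtime to match a 20-minute neighbour.
needed = vms_for_target(profile, target_min=20.0, max_vms=32)
print(needed)
```

A linear scan is used rather than a binary search so the sketch does not depend on the estimated runtime decreasing monotonically with VM count.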

Cluster Scaling: Time Balancing
[Figure label: Time Balance]

Roadmap
- Intro & Problem
- Platform Overview
- Spatio-Temporal Insights for Provisioning
- Building Blocks for MapReduce Provisioning
- Case Study: Performance optimization
- Case Study: Energy optimization

Case Study: Performance & Deadlines
- Goal: meet deadlines for MapReduce jobs
  - Determine initial allocation accurately
  - Dynamically adjust allocation to meet the deadline if necessary
- Monitoring: use offline profiling to estimate the number of VMs needed based on past performance
- Actuation: online profiling, with trigger points to invoke cluster scaling
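A trigger-point check of the kind this slide mentions might look like the following sketch. It assumes the job has progressed at a constant rate so far and that the remaining work speeds up in proportion to the number of VMs. Both are simplifications not stated in the slides, and the function name is hypothetical.

```python
import math


def vms_needed_at_trigger(
    current_vms: int, elapsed_min: float, progress: float, deadline_min: float
) -> int:
    """Return the VM count needed to finish by the deadline.

    Returns current_vms unchanged if the job is already on track.
    Assumes remaining time scales inversely with VM count.
    """
    if not 0.0 < progress < 1.0:
        raise ValueError("progress must be strictly between 0 and 1")
    time_left = deadline_min - elapsed_min
    if time_left <= 0:
        raise ValueError("deadline has already passed")
    # Remaining time at the current allocation, extrapolated from progress.
    remaining = elapsed_min * (1.0 - progress) / progress
    if remaining <= time_left:
        return current_vms
    return math.ceil(current_vms * remaining / time_left)


# Hypothetical job: 4 VMs, 25% done after 30 min, 90-minute deadline.
late_job = vms_needed_at_trigger(4, elapsed_min=30.0, progress=0.25, deadline_min=90.0)
on_track = vms_needed_at_trigger(4, elapsed_min=30.0, progress=0.5, deadline_min=90.0)
print(late_job, on_track)
```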

Case Study: Energy Savings
- Goal: minimize energy consumption from the execution of a large batch of MapReduce jobs
  - Energy + cooling ~ 42% of total cost [Hamilton08]
  - Pass energy savings on to users
- Problem: how to place the VMs on available physical servers to minimize energy usage?
- Minimize Cumulative Machine Uptime (CMU)

Case Study: Energy Savings
- Use Job Profiling to place similar-runtime VMs together for initial provisioning
- Use Job Profiling to adjust the number of VMs in each cluster to adjust runtimes if needed
- Monitoring: online profiling to determine when energy could be saved by using migration or cluster scaling
- Actuation: use Cluster Scaling or Migration to dynamically adjust for inaccuracies/unknowns in initial provisioning
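The initial-provisioning step, placing similar-runtime VMs together, can be sketched as a sort-and-pack heuristic. This is an illustrative assumption rather than the authors' algorithm. It treats every server as having the same number of VM slots and ignores the spatial fit of resource demands. The function names and runtimes are hypothetical.

```python
from typing import List


def place_by_runtime(runtimes: List[float], slots_per_server: int) -> List[List[float]]:
    """Group VMs with similar estimated runtimes onto the same server.

    Sorts runtimes in descending order and fills servers in that order.
    """
    ordered = sorted(runtimes, reverse=True)
    return [
        ordered[i:i + slots_per_server]
        for i in range(0, len(ordered), slots_per_server)
    ]


def uptime(placement: List[List[float]]) -> float:
    """Cumulative machine uptime: each server runs until its longest VM ends."""
    return sum(max(server) for server in placement)


# Hypothetical estimated VM runtimes in minutes, three VM slots per server.
estimated = [20, 100, 10, 95, 15, 90]
placement = place_by_runtime(estimated, slots_per_server=3)
print(placement, uptime(placement))
```

Placing the same VMs in arrival order instead would keep both servers up for 100 and 95 minutes, so the grouped placement reduces uptime in this example.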

Conclusion
- Framework: building blocks (STEAMEngine) for optimizing MapReduce provisioning from a cloud service provider perspective
- Preliminary evaluations to validate the usefulness of each building block
- Approaches for applying building blocks to meet specific goals, e.g. performance, energy

Thank you! Questions?
