Improving MapReduce Performance Using Smart Speculative Execution Strategy. Qi Chen, Cheng Liu, and Zhen Xiao. Oct 2013. To appear in IEEE Transactions on Computers.

Outline  1. Introduction  2. Background  3. Previous work  4. Pitfalls  5. Our Design  6. Evaluation  7. Conclusion

1. Introduction

Introduction  The new era of Big Data is coming!  – 20 PB per day (2008)  – 30 TB per day (2009)  – 60 TB per day (2010)  –petabytes per day  What does big data mean?  Important user information  significant business value

MapReduce  What is MapReduce?  most popular parallel computing model proposed by Google database operation Search engine Machine learning Cryptanalysi s Scientific computation Applications … Select, Join, Group Page rank, Inverted index, Log analysis Clustering, machine translation, Recommendation

Straggler  What is straggler in MapReduce?  Nodes on which tasks take an unusually long time to finish  It will:  Delay the job execution time  Degrade the cluster throughput  How to solve it?  Speculative execution  Slow task is backed up on an alternative machine with the hope that the backup one can finish faster

2. Background

Architecture Split 1 Split 2 … Split M Map Part 2 Part 1 Map Part 2 Part 1 Map Part 2 Part 1 Reduc e Output2 Input files Map Stage Reduce Stage Output files Output1 Master … Assig n

Programming model  Input: (key, value) pairs  Output: (key*, value*) pairs  Map stage (Map and Combine phases): List(K1, V1) → List(K2, V2) → List(K2, List(V2))  Reduce stage (Copy, Sort, and Reduce phases): List(K2, List(V2)) → Ordered(K2, List(V2)) → List(K3, V3)
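To make the model concrete, here is a minimal word-count sketch in Python (illustrative only: Hadoop expresses this in Java, and the toy driver below stands in for the framework's shuffle and sort; all names are invented for the example):

```python
from itertools import groupby
from operator import itemgetter

# Map phase: one input record -> list of intermediate (K2, V2) pairs.
def map_fn(key, value):
    # key: document name (unused here), value: line of text
    return [(word, 1) for word in value.split()]

# Reduce phase: (K2, List(V2)) -> list of output (K3, V3) pairs.
def reduce_fn(key, values):
    return [(key, sum(values))]

# A toy sequential driver that mimics the framework's shuffle/sort step.
def run(records):
    intermediate = [kv for k, v in records for kv in map_fn(k, v)]
    intermediate.sort(key=itemgetter(0))            # Sort phase
    return [out
            for k, group in groupby(intermediate, key=itemgetter(0))
            for out in reduce_fn(k, [v for _, v in group])]

print(run([("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]))
```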

Causes of Stragglers  Internal factors: the resource capacity of worker nodes is heterogeneous; resource competition due to other MapReduce tasks running on the same worker node  External factors: resource competition due to co-hosted applications; input data skew; a remote input or output source that is too slow; faulty hardware

3. Previous work

Previous work  Google and Dryad  When a stage is close to completion, back up an arbitrary set of the remaining tasks  Original Hadoop  Back up tasks whose progress falls behind the average by a fixed gap  LATE (OSDI '08)  Back up the task with 1) the longest remaining time and 2) a progress rate below a threshold  Identify a worker as slow when its performance score falls below a threshold  Mantri (OSDI '10)  Saves cluster computing resources  Backs up outliers as soon as they show up  Kill-restart when the cluster is busy, lazy duplication when the cluster is idle

4. Pitfalls

Pitfalls in Selecting Slow Tasks  Using the average progress rate to identify slow tasks and estimate task remaining time  Hadoop and LATE assume that:  Tasks of the same type process almost the same amount of input data  The progress rate is either stable or accelerating during a task's lifetime  There are scenarios in which these assumptions break down

Input data skew  Sort benchmark on 10 GB of input data following a Zipf distribution (σ = 1.0)

Phase percentage varies  Different jobs have different phase duration ratios  The same job in different environments has different phase duration ratios  Speed varies across phases

Reduce Tasks Start Asynchronously  Tasks in different phases cannot be compared directly

Taking a Long Time to Identify Stragglers  Existing strategies cannot identify stragglers in time

Pitfalls in Selecting Backup Node  Identifying slow worker nodes  LATE: sum of the progress of all completed and running tasks on the node  Hadoop: average progress rate of all completed tasks on the node  Some worker nodes may do more time-consuming tasks and unfairly get a lower performance score  e.g., doing more tasks with larger amounts of data to process, or more non-local map tasks  Choosing a backup worker node  LATE and Hadoop: ignore data locality  Our observation: a data-local map task can be over three times faster than a non-local map task

5. Our Design

Selecting Backup Candidates  Using Per-Phase Process Speed  Divide each task into multiple phases: a map task into Map and Combine, a reduce task into Copy, Sort, and Reduce  Use the phase process speed to identify slow tasks and estimate task remaining time  Compare tasks only against tasks in the same phase

Selecting Backup Candidates  Using EWMA to Predict Process Speed
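The prediction formula itself was an image in the original slide; the standard EWMA update it presumably showed is, assuming a smoothing factor α:

```latex
\mathrm{speed}_{\mathrm{est}}(t) \;=\; \alpha \cdot \mathrm{speed}_{\mathrm{obs}}(t) \;+\; (1-\alpha)\cdot \mathrm{speed}_{\mathrm{est}}(t-1), \qquad 0 < \alpha \le 1
```

A larger α reacts faster to speed changes; a smaller α smooths out transient fluctuations.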

Selecting Backup Candidates  Estimating Task Remaining Time and Backup Time  Use the phase average process speed to estimate the remaining time of a phase  Because the process speed in the copy phase can be fast at the beginning and drop later, the remaining time of the copy phase is estimated separately
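A minimal sketch of this estimation, assuming the per-phase bookkeeping below (the field names are invented, and the slide's special copy-phase correction, shown as an image in the original, is not reproduced here):

```python
from dataclasses import dataclass

@dataclass
class PhaseStats:
    bytes_left: float     # data still to process in this phase
    ewma_speed: float     # EWMA-predicted process speed (bytes/s)
    avg_duration: float   # average phase duration over finished tasks (s)

def task_remaining_time(current: str, phases: list[str],
                        stats: dict[str, PhaseStats]) -> float:
    # Current phase: data left divided by the predicted speed; phases
    # not yet started: the average duration observed on finished tasks.
    cur = stats[current]
    t = cur.bytes_left / cur.ewma_speed
    for p in phases[phases.index(current) + 1:]:
        t += stats[p].avg_duration
    return t

# Example: a reduce task currently in its copy phase.
stats = {"copy": PhaseStats(2e8, 5e6, 50.0),
         "sort": PhaseStats(0.0, 1.0, 10.0),
         "reduce": PhaseStats(0.0, 1.0, 30.0)}
print(task_remaining_time("copy", ["copy", "sort", "reduce"], stats))  # 40 + 10 + 30 = 80.0 s
```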

Selecting Backup Candidates  Maximizing Cost Performance  Cost: the computing resources occupied by backup tasks  Performance: the shortening of job execution time and the increase in cluster throughput  We want:  When the cluster is idle, the cost of speculative execution to be less of a concern  When the cluster is busy, the cost to be an important consideration
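One way to encode this trade-off (purely illustrative; the load-dependent weighting below is an assumption, not the paper's formula):

```python
def backup_profit(remaining_time: float, backup_time: float,
                  cluster_load: float) -> float:
    """Benefit of a backup minus a cost term that grows with cluster
    load (0.0 = idle, 1.0 = fully busy)."""
    time_saved = remaining_time - backup_time   # performance gain
    resource_cost = backup_time                 # slot-time the backup occupies
    return time_saved - cluster_load * resource_cost

# An idle cluster tolerates speculative backups that a busy one should skip.
print(backup_profit(100, 60, cluster_load=0.1) > 0)  # True: back it up
print(backup_profit(100, 90, cluster_load=0.9) > 0)  # False: not worth a busy slot
```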

Selecting Proper Backup Nodes  Assign backup tasks to fast nodes  How to measure the performance of a node?  Use the predicted process bandwidth of the data-local map tasks completed on the node to represent its performance  Consider data locality  Note: the process speed of a data-local map task can be 3 times that of a non-local map task  Therefore, we keep process speed statistics for data-local, rack-local, and non-local map tasks on each node  For nodes that have not processed any map task at a specific locality level, we use the average process speed of all nodes at this level as an estimate  Launch a backup on node i only if the task's remaining time > its predicted backup time on node i
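A sketch of that selection rule under the slide's assumptions (per-node, per-locality-level speed statistics with a cluster-average fallback; the helper names are invented):

```python
def node_speed(node_stats, cluster_avg, node, level):
    # Fall back to the cluster-wide average when the node has no
    # history at this locality level.
    return node_stats.get((node, level)) or cluster_avg[level]

def best_backup_node(task_bytes, remaining_time, candidates,
                     locality_of, node_stats, cluster_avg):
    """Return the node giving the fastest backup that still beats the
    task's estimated remaining time, or None if no backup is worthwhile."""
    best, best_time = None, remaining_time
    for node in candidates:
        speed = node_speed(node_stats, cluster_avg, node, locality_of[node])
        backup_time = task_bytes / speed
        if backup_time < best_time:
            best, best_time = node, backup_time
    return best

cluster_avg = {"data_local": 30e6, "rack_local": 15e6, "non_local": 10e6}
node_stats = {("n1", "data_local"): 40e6}   # bytes/s observed on n1
print(best_backup_node(6e8, 60.0, ["n1", "n2"],
                       {"n1": "data_local", "n2": "rack_local"},
                       node_stats, cluster_avg))  # "n1": a 15 s backup beats 40 s
```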

Summary  A task will be backed up when it meets the following conditions:  It has executed for a certain amount of time (the speculative lag)  Both the progress rate and the process bandwidth in the current phase of the task are sufficiently low  The profit of doing the backup outweighs that of not doing it  Its estimated remaining time is longer than the predicted time to finish on a backup node  It has the longest remaining time among all the tasks satisfying the conditions above
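The conditions above combine into a single predicate; a sketch (field and threshold names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    elapsed: float           # time since the task started (s)
    progress_rate: float     # progress per second in the current phase
    phase_bandwidth: float   # process bandwidth in the current phase (bytes/s)
    backup_profit: float     # estimated profit of backing the task up
    no_backup_profit: float  # estimated profit of leaving it alone
    remaining_time: float    # estimated time to finish in place (s)
    backup_time: float       # predicted time to finish on the backup node (s)

def should_backup(t: TaskState, speculative_lag: float,
                  slow_rate: float, slow_bw: float) -> bool:
    # The first four conditions from the summary, checked in order; the
    # fifth (longest remaining time) is applied over the passing set.
    return (t.elapsed >= speculative_lag
            and t.progress_rate < slow_rate
            and t.phase_bandwidth < slow_bw
            and t.backup_profit > t.no_backup_profit
            and t.remaining_time > t.backup_time)

# Among the tasks that pass, back up the one with the longest remaining time:
# next_backup = max(passing_tasks, key=lambda t: t.remaining_time)
```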

6. Evaluation

Experiment Environment  Two scales:  Small: 30 virtual machines on 15 physical machines  Large: 100 virtual machines on 30 physical machines  Each physical machine:  dual 2.4 GHz Intel(R) Xeon(R) E5620 processors (16 logical cores), 24 GB of RAM, and two 150 GB disks  Organized in three racks connected by 1 Gbps Ethernet  Each virtual machine:  2 virtual cores, 4 GB of RAM, and 40 GB of disk space  Benchmarks:  Sort, WordCount, Grep, Gridmix

Scheduling in Heterogeneous Environments  Load of each host in heterogeneous environments:  1 VM/host: 3 hosts, 3 VMs  2 VMs/host: 11 hosts, 22 VMs  5 VMs/host: 1 host, 5 VMs  Total: 15 hosts, 30 VMs

Scheduling in Heterogeneous Environments  Working with different workloads (job completion time / cluster throughput improvement):  WordCount: 10% / 5%  Sort: 19% / 15%  Grep: 39% / 38%  Gridmix: 13% / 15%

Scheduling in Heterogeneous Environments  Analysis (using WordCount and Grep):  Hadoop-LATE: precision 37.6% (map) / 3% (reduce); recall 100%; average find time 70 s (map) / 66 s (reduce)  Hadoop-MCP: precision 45.2% (map) / 93.3% (reduce); recall 87.1% (map) / 100% (reduce); average find time 56 s (map) / 32 s (reduce)  Improvement (+accurate prediction / +cost performance / all): execution speed 27% / 31% / 39%; cluster throughput 29% / 32% / 38%

Scheduling in Heterogeneous Environments  Handling data skew (Sort):  Execution speed +37% and cluster throughput +44% in one setting; execution speed +17% and cluster throughput +19% in the other

Scheduling in Heterogeneous Environments  Competing with other applications  Run I/O-intensive processes on some servers: a dd process that creates large files in a loop, writing random data on some physical machines  MCP runs 36% faster than Hadoop-LATE and increases the cluster throughput by 34%

Large Scale Experiment  Load distribution:  3 VMs/host: 27 hosts, 81 VMs  5 VMs/host: 4 hosts, 20 VMs  Total: 31 hosts, 101 VMs  MCP finishes jobs 21% faster than Hadoop-LATE and improves the cluster throughput by 16%

Scheduling in Homogeneous Environments  Small-scale cluster with each host running 2 VMs  There is no straggler node in the cluster  MCP finishes jobs 6% faster than Hadoop-LATE and 2% faster than Hadoop-None  Hadoop-LATE performs worse than Hadoop-None due to too many unnecessary reduce backups  MCP improves reduce backup precision by 40%  MCP achieves better data locality for map tasks

Scheduling Cost  We measure the average time that MCP and Hadoop-LATE spend on speculative scheduling in a job with 350 map tasks and 110 reduce tasks  MCP spends about 0.54 ms, with O(n) complexity  LATE spends about 0.74 ms, with O(n log n) complexity

7. Conclusion

Conclusion  We provide an analysis of the pitfalls of current speculative execution strategies in MapReduce  Scenarios: data skew, tasks that start asynchronously, improper configuration of phase percentages, etc.  We develop a new strategy, MCP, to handle these scenarios:  Accurate slow-task prediction and remaining time estimation  Takes the cost performance of computing resources into account  Takes both data locality and data skew into consideration when choosing proper worker nodes  MCP fits well in both heterogeneous and homogeneous environments  It handles data skew well, is quite scalable, and has low overhead

Thank You!