MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads. School of Computer Engineering, Nanyang Technological University, 30th Aug 2013.

Presentation transcript:

MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads
Shanjiang Tang, Bu-Sung Lee, Bingsheng He
School of Computer Engineering, Nanyang Technological University
30th Aug 2013

Outline
- Background & Motivations
- MROrder
- Evaluation
- Conclusion

MapReduce Computation Model
[Figure: input data passes through the map-phase computation, where map tasks produce intermediate results, and then through the reduce-phase computation, where reduce tasks produce output results that make up the final result.]

Hadoop Execution Model
- Hadoop is an open-source implementation of the MapReduce model.
- The cluster's computation resources are divided into map slots and reduce slots, which the Hadoop administrator configures in advance.
- A MapReduce job generally consists of map tasks and reduce tasks. Map tasks must be allocated map slots, and reduce tasks must be allocated reduce slots.

Hadoop Execution Model (continued)
[Figure: map slots and reduce slots.]
- Map tasks start before reduce tasks.
- Map tasks can run only on map slots; reduce tasks can run only on reduce slots.

Job Order vs. Performance
[Figure: map-phase and reduce-phase timelines for two different job orders.]
Implication: different job orders have a significant impact on performance.
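To make the effect of job order concrete, here is a minimal Python sketch under a deliberately simplified model. The model is an assumption for illustration, not the one used in the slides: each job is reduced to one aggregate map-phase time and one aggregate reduce-phase time, jobs run in FIFO order, map phases run back to back, and a job's reduce phase starts only after its own map phase and the previous job's reduce phase have finished. The job names and durations are made up.

```python
from typing import Dict, List, Tuple

# (name, map_phase_time, reduce_phase_time); all values are illustrative.
Job = Tuple[str, float, float]


def completion_times(order: List[Job]) -> Dict[str, float]:
    """Return each job's finish time when jobs run in the given FIFO order."""
    map_done = 0.0     # time at which the map slots become free
    reduce_done = 0.0  # time at which the reduce slots become free
    finished: Dict[str, float] = {}
    for name, map_time, reduce_time in order:
        map_done += map_time
        # The reduce phase waits for this job's maps and the previous reduces.
        reduce_done = max(map_done, reduce_done) + reduce_time
        finished[name] = reduce_done
    return finished


def makespan(order: List[Job]) -> float:
    """Finish time of the last job in the batch."""
    return max(completion_times(order).values())


def total_completion_time(order: List[Job]) -> float:
    """Sum of the finish times of all jobs in the batch."""
    return sum(completion_times(order).values())


a: Job = ("A", 4.0, 1.0)  # map-heavy job
b: Job = ("B", 1.0, 4.0)  # reduce-heavy job

for order in ([a, b], [b, a]):
    names = "".join(job[0] for job in order)
    print(names, makespan(order), total_completion_time(order))
```

Under this toy model, running B before A gives a makespan of 6 instead of 9 and a total completion time of 11 instead of 14, because A's long map phase overlaps with B's long reduce phase.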

Our Goals
- Job ordering optimization is a non-trivial approach to improving the performance of MapReduce workloads (i.e., batches of MapReduce jobs).
- Our work focuses on job ordering optimization for online MapReduce workloads under the FIFO scheduler, where jobs arrive over time.
- Different performance metrics are considered, e.g., makespan and total completion time.

Architecture Overview of MROrder

Policy Module
Determines when and how to perform job ordering optimization for MapReduce jobs. We provide two alternative solutions for deciding when to perform it:
- PNJ-Dominated Solution: performs job ordering when the number of jobs in the queue reaches a threshold.
- TP-Dominated Solution: invokes job ordering periodically, after each time interval.
Notes: PNJ stands for the policy based on the number of jobs; TP stands for the time-based policy.
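The two triggers can be sketched in Python as follows. This is only an illustration of the idea: the class names, the `should_reorder` method, and the threshold and interval values are hypothetical and do not come from the MROrder source code.

```python
from dataclasses import dataclass


@dataclass
class PNJPolicy:
    """Count-based trigger: fire when the queue holds enough jobs."""
    threshold: int  # hypothetical queue-length threshold

    def should_reorder(self, queued_jobs: int, now: float) -> bool:
        return queued_jobs >= self.threshold


@dataclass
class TPPolicy:
    """Time-based trigger: fire once every `interval` time units."""
    interval: float
    last_run: float = 0.0

    def should_reorder(self, queued_jobs: int, now: float) -> bool:
        if now - self.last_run >= self.interval:
            self.last_run = now  # remember when ordering last ran
            return True
        return False


# Made-up observations of the job queue: (time, number of queued jobs).
events = [(2.0, 1), (5.0, 3), (11.0, 2), (14.0, 4), (21.0, 1)]

pnj = PNJPolicy(threshold=3)
tp = TPPolicy(interval=10.0)

pnj_fired = [t for t, n in events if pnj.should_reorder(n, t)]
tp_fired = [t for t, n in events if tp.should_reorder(n, t)]

print("PNJ fired at:", pnj_fired)
print("TP fired at:", tp_fired)
```

With these sample events, the count-based policy fires whenever at least three jobs are waiting, while the time-based policy fires at the first observation that is at least ten time units after its previous run, regardless of queue length.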

Policy Module: TP-Dominated Solution
- TP-Dominated Solution with Fixed Time Interval (TP-FTI): performs job ordering periodically at a fixed time interval.
- TP-Dominated Solution with Adaptive Time Interval (TP-ATI): performs job ordering dynamically at an adaptive time interval, based on the estimated running time of the workloads.
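The difference between the two variants can be sketched as two ways of choosing the next interval. The transcript does not give the adaptive formula, so the rule below (use the summed estimated running time of the queued jobs, clamped to a range) is purely a hypothetical placeholder, as are the function and parameter names.

```python
from typing import Sequence


def fixed_interval(base_interval: float) -> float:
    """TP-FTI style: the interval never changes."""
    return base_interval


def adaptive_interval(
    estimated_job_times: Sequence[float],
    min_interval: float,
    max_interval: float,
) -> float:
    """TP-ATI style (hypothetical rule): derive the next interval from the
    estimated running time of the queued jobs, clamped to a sane range."""
    estimate = sum(estimated_job_times)
    return min(max(estimate, min_interval), max_interval)


# Made-up estimates of per-job running time for three queue states.
light_queue = [2.0, 3.0]
medium_queue = [20.0, 15.0]
heavy_queue = [50.0, 40.0]

for queue in (light_queue, medium_queue, heavy_queue):
    print(fixed_interval(30.0), adaptive_interval(queue, 10.0, 60.0))
```

The point of the sketch is only that a fixed interval ignores the queue, while an adaptive one reacts to the estimated amount of pending work.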

TP-FTI

TP-ATI

Ordering Engine
Responsible for performing job ordering optimization. Two types of job ordering approaches:
- Simulation-based Ordering Approach (SIM): we developed a Hadoop simulator, HSim, to search for optimal results. It is a brute-force method.
- Algorithm-based Ordering Approach (ALG): we provide efficient heuristic job ordering algorithms for different performance metrics, e.g., makespan and total completion time.

ALG for Makespan

ALG for Total Completion Time
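The algorithm on this slide is likewise missing from the transcript. As an illustrative stand-in only, the sketch below orders jobs shortest-first by their combined map and reduce time, a common heuristic for reducing total completion time. It is not presented as MROrder's algorithm, and the job data are made up.

```python
from typing import List, Sequence, Tuple

# (name, map_phase_time, reduce_phase_time); all values are illustrative.
Job = Tuple[str, float, float]


def shortest_first(jobs: Sequence[Job]) -> List[Job]:
    """Order jobs by ascending combined map and reduce time."""
    return sorted(jobs, key=lambda j: j[1] + j[2])


def total_completion_time(order: Sequence[Job]) -> float:
    """Sum of job finish times under a simplified two-stage model."""
    map_done = 0.0
    reduce_done = 0.0
    total = 0.0
    for _, map_time, reduce_time in order:
        map_done += map_time
        reduce_done = max(map_done, reduce_done) + reduce_time
        total += reduce_done  # this job finishes when its reduces finish
    return total


arrival: List[Job] = [("big", 10.0, 8.0), ("small1", 1.0, 1.0),
                      ("small2", 2.0, 1.0)]
ordered = shortest_first(arrival)

print([name for name, _, _ in ordered])
print(total_completion_time(arrival), total_completion_time(ordered))
```

In this example, moving the two small jobs ahead of the large one cuts the total completion time from 57 to 27 under the simplified model, since the small jobs no longer wait behind the large job.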

Experiment Setup
Environment
- A Hadoop cluster of 10 nodes, each with two Intel X5675 CPUs, 24 GB of memory, and 56 GB hard disks.
Workloads
- Synthetic Facebook Workload: generated based on previous related work. Most of its jobs are small, so we use it to evaluate the total completion time.
- Tested Workload: most of its jobs are large, so we use it to evaluate the makespan.

TP-FTI vs. TP-ATI
TP-ATI is smarter and works better than TP-FTI.
Δt: the suitable threshold of the time period for the time-based policy.
PITCT: performance improvement of total completion time.

ALG vs. SIM
SIM performs better than ALG but takes more time, especially when the number of jobs is large.

Performance Improvement by MROrder (Simulation Result)
Total completion time is sensitive to workloads dominated by small jobs.

Performance Improvement by MROrder (Real Experiment Result)
Makespan is sensitive to workloads dominated by large jobs.

Conclusion
- Job ordering optimization is a non-trivial method for improving slot resource utilization and the performance of MapReduce workloads.
- MROrder is a prototype system for online MapReduce workloads that is flexible with respect to various performance metrics.
- Experimental results show that MROrder improves the performance of MapReduce workloads significantly.
- The source code of MROrder is available at:

Ongoing and Future Work
- Integrating MROrder into the Hadoop system.
- Considering the performance improvement for other schedulers, e.g., the Hadoop Fair Scheduler and the Capacity Scheduler.
- Exploring alternative approaches to improving the cluster utilization and performance of MapReduce workloads.

Acknowledgement
This work is supported by the "User and Domain driven data analytics as a Service framework" project under the A*STAR Thematic Strategic Research Programme (SERC Grant No ).

Accuracy Evaluation of HSim

Impact of Inaccuracy in Estimated Map/Reduce Task Times