Using Map-reduce to Support MPMD Peng

Job Scheduling in Hadoop
The default job scheduler in Hadoop keeps a first-in-first-out (FIFO) queue of jobs for each priority level. It always assigns task slots to the first job in the highest-priority queue that still needs tasks. This makes it difficult to share a MapReduce cluster between users: a large job starves the jobs queued behind it, yet giving large jobs a lower priority would let a stream of higher-priority jobs starve them instead. One workaround is to create separate MapReduce clusters for different user groups with Hadoop On-Demand, but this hurts system utilization because a group's cluster may sit mostly idle for long periods.
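The starvation problem described above can be reproduced with a small model. This is a minimal sketch, not Hadoop's actual scheduler code: the class name `FifoPriorityScheduler`, its methods, and the convention that a larger integer means a higher priority are all assumptions made for illustration.

```python
from collections import deque


class FifoPriorityScheduler:
    """Toy model: one FIFO queue of jobs per priority level (hypothetical names)."""

    def __init__(self):
        # priority -> deque of [job_name, pending_tasks]; larger number = higher priority
        self.queues = {}

    def submit(self, name, priority, tasks):
        self.queues.setdefault(priority, deque()).append([name, tasks])

    def assign_slot(self):
        """Give one free slot to the first job in the highest-priority non-empty queue."""
        for priority in sorted(self.queues, reverse=True):
            queue = self.queues[priority]
            # Drop jobs at the head that have no tasks left to schedule.
            while queue and queue[0][1] == 0:
                queue.popleft()
            if queue:
                job = queue[0]
                job[1] -= 1
                return job[0]
        return None


sched = FifoPriorityScheduler()
sched.submit("large_job", priority=1, tasks=1000)
sched.submit("small_job", priority=1, tasks=2)

# Every free slot goes to large_job; small_job waits behind it.
first_ten = [sched.assign_slot() for _ in range(10)]

# A higher-priority job jumps ahead of both as soon as it arrives.
sched.submit("urgent_job", priority=2, tasks=1)
after_urgent = sched.assign_slot()
```

In this model `small_job` receives nothing until `large_job` has been handed all 1000 of its tasks, which is the sharing problem the slide points out.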

Facebook Fair Scheduler
Jobs are placed into named “pools”. Each pool can have a “guaranteed capacity”, specified through a config file, which gives a minimum number of map slots and reduce slots to allocate to the pool. When the pool has pending jobs, it gets at least this many slots; when it has no jobs, the slots can be used by other pools. Excess capacity that is not going toward a pool’s minimum is allocated between jobs using fair sharing.
– Fair sharing splits compute time proportionally between the jobs that have been submitted, emulating an "ideal" scheduler that gives each of N jobs 1/Nth of the available capacity.
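The two-step allocation above (guaranteed minimums first, then fair sharing of the excess) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Fair Scheduler's real algorithm: the function `allocate_slots` and the `min_slots`/`jobs` field names are hypothetical, slot counts are allowed to be fractional, and per-job demand is ignored.

```python
def allocate_slots(total_slots, pools):
    """Return slots per pool: guaranteed minimum for active pools, then
    the excess split equally across all pending jobs (simplified model)."""
    active = {name: p for name, p in pools.items() if p["jobs"]}
    allocation = {name: 0.0 for name in pools}

    # Step 1: a pool with pending jobs gets at least its guaranteed capacity.
    for name, pool in active.items():
        allocation[name] = float(pool["min_slots"])

    # Step 2: whatever is left is shared fairly, 1/N per job.
    excess = total_slots - sum(allocation.values())
    n_jobs = sum(len(pool["jobs"]) for pool in active.values())
    if n_jobs and excess > 0:
        per_job = excess / n_jobs
        for name, pool in active.items():
            allocation[name] += per_job * len(pool["jobs"])
    return allocation


pools = {
    "ads": {"min_slots": 40, "jobs": ["j1", "j2"]},
    "research": {"min_slots": 20, "jobs": ["j3"]},
    "idle": {"min_slots": 30, "jobs": []},  # no jobs, so its minimum is released
}
shares = allocate_slots(120, pools)
```

Here the idle pool's 30 guaranteed slots are released, the 60 excess slots are divided among the three pending jobs (20 each), and the pools end up with 80, 40, and 0 slots.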

Yahoo Capacity Scheduler
Define a number of named queues. Each queue has a configurable number of map and reduce slots. The scheduler gives each queue its capacity when it contains jobs and shares any unused capacity between the queues. Within each queue, FIFO scheduling with priorities is used, with one exception: you can limit the percentage of running tasks per user, so that users share the cluster equally.
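The within-queue rule (FIFO with priorities, plus a per-user cap) might look like the sketch below. It assumes the limit is a percentage of the queue's slot capacity, which the slide does not spell out. The function `pick_next_job` and the job fields are hypothetical and are not the Capacity Scheduler's API.

```python
import math


def pick_next_job(jobs, running_per_user, queue_capacity, user_limit_percent):
    """Pick the job that receives the next free slot in one queue.

    Jobs are ordered by priority (higher first), then by submission order.
    A job is skipped if its user already runs the maximum allowed tasks.
    """
    # Assumption: the limit is a percentage of the queue's slot capacity.
    max_per_user = math.floor(queue_capacity * user_limit_percent / 100)
    ordered = sorted(jobs, key=lambda j: (-j["priority"], j["submitted"]))
    for job in ordered:
        under_limit = running_per_user.get(job["user"], 0) < max_per_user
        if job["pending"] > 0 and under_limit:
            return job["name"]
    return None


jobs = [
    {"name": "alice_etl", "user": "alice", "priority": 1, "submitted": 0, "pending": 50},
    {"name": "bob_report", "user": "bob", "priority": 1, "submitted": 1, "pending": 5},
]

# Alice already holds 5 of the queue's 10 slots (the 50% cap), so Bob goes next.
next_job = pick_next_job(jobs, {"alice": 5}, queue_capacity=10, user_limit_percent=50)
```

Without the cap, plain FIFO would keep handing slots to `alice_etl` until it finished.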

Our solution
Turning Hadoop into MPMD (computation resource sharing):
– Different users can submit multiple tasks, which are assigned to different mappers/reducers and run simultaneously.
– Load balancing is achieved by keeping the computing nodes busy with tasks.

Two categories of MIMD
– Single Program Multiple Data (SPMD) [1]: multiple autonomous processors simultaneously execute the same program on different data (at independent points, rather than in the lockstep that SIMD imposes).
– Multiple Program Multiple Data (MPMD) [1]: multiple autonomous processors simultaneously run at least two independent programs.

Traditional Map-reduce follows SPMD: Single Program Multiple Data

Using the traditional Map-reduce to support MPMD
[Diagram: Data 1, Data 2, Data 3, ..., Data n -> executers (same execution environment, multiple programs, Program Lookup Server) -> Output 1, Output 2, ..., Output n -> Output]
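One way to read the diagram is that every mapper runs the same generic "executer", which asks a lookup service which program applies to each record. The sketch below illustrates that idea in plain Python, without Hadoop. It is an assumption about the design, not the authors' implementation: the `PROGRAM_LOOKUP` dictionary stands in for the Program Lookup Server, and `char_count` is a made-up second program used only for illustration.

```python
from collections import Counter


def word_count(data):
    """Program 1: count how often each word occurs."""
    return dict(Counter(data.split()))


def char_count(data):
    """Program 2 (hypothetical stand-in for a second user's program)."""
    return len(data)


# Stand-in for the Program Lookup Server: program id -> callable.
PROGRAM_LOOKUP = {
    "wordcount": word_count,
    "charcount": char_count,
}


def executer(record):
    """The single program every mapper runs (SPMD at the framework level).

    It looks up the program named in the record and applies it to the data,
    so different records run different programs (MPMD at the user level).
    """
    program_id, data = record
    program = PROGRAM_LOOKUP[program_id]
    return program_id, program(data)


records = [
    ("wordcount", "a b a"),
    ("charcount", "hello"),
]
outputs = [executer(record) for record in records]
```

The framework still ships one program to every node, which is why traditional Map-reduce can host this scheme without modification to its execution model.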

Deliverable
We are going to run several map-reduce jobs in parallel:
– WordCount
– Hadoop BLAST

Schedule
1 week
– Discuss how to overcome the challenges
2 weeks
– Develop the MPMD Hadoop environment
– Adapt WordCount and Hadoop BLAST to MPMD
1 week
– Flexible time

References [1] [2] [3] [4]

Roles of team members
Peng – Implementing the framework
Yuan – Adapting WordCount and Hadoop BLAST to our framework

Q&A Thanks!