UC Berkeley Job Scheduling with the Fair and Capacity Schedulers Matei Zaharia Wednesday, June 10, 2009 Santa Clara Marriott.

Slides:



Advertisements
Similar presentations
The map and reduce functions in MapReduce are easy to test in isolation, which is a consequence of their functional style. For known inputs, they produce.
Advertisements

SDN + Storage.
© 2007 IBM Corporation IBM Global Engineering Solutions IBM Blue Gene/P Job Submission.
© Hortonworks Inc Daniel Dai Thejas Nair Page 1 Making Pig Fly Optimizing Data Processing on Hadoop.
O’Reilly – Hadoop: The Definitive Guide Ch.6 How MapReduce Works 16 July 2010 Taewhi Lee.
UC Berkeley Job Scheduling for MapReduce Matei Zaharia, Dhruba Borthakur *, Joydeep Sen Sarma *, Scott Shenker, Ion Stoica 1 RAD Lab, * Facebook Inc.
Cloud Computing Resource provisioning Keke Chen. Outline  For Web applications statistical Learning and automatic control for datacenters  For data.
O’Reilly – Hadoop: The Definitive Guide Ch.5 Developing a MapReduce Application 2 July 2010 Taewhi Lee.
Wei-Chiu Chuang 10/17/2013 Permission to copy/distribute/adapt the work except the figures which are copyrighted by ACM.
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica Spark Fast, Interactive,
Resource Management with YARN: YARN Past, Present and Future
Hadoop: The Definitive Guide Chap. 2 MapReduce
Quincy: Fair Scheduling for Distributed Computing Clusters Microsoft Research Silicon Valley SOSP’09 Presented at the Big Data Reading Group by Babu Pillai.
GridFlow: Workflow Management for Grid Computing Kavita Shinde.
A Batch Job Queuing System on Clouds with Hadoop and Hbase Presents By Niharika Potharam.
UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.
Matei Zaharia, Dhruba Borthakur *, Joydeep Sen Sarma *, Khaled Elmeleegy +, Scott Shenker, Ion Stoica UC Berkeley, * Facebook Inc, + Yahoo! Research Delay.
BOINC The Year in Review David P. Anderson Space Sciences Laboratory U.C. Berkeley 22 Oct 2009.
5: CPU-Scheduling1 Jerry Breecher OPERATING SYSTEMS SCHEDULING.
UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.
UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.
CPS216: Advanced Database Systems (Data-intensive Computing Systems) How MapReduce Works (in Hadoop) Shivnath Babu.
Distributed Low-Latency Scheduling
Hadoop: The Definitive Guide Chap. 8 MapReduce Features
Distributed and Parallel Processing Technology Chapter6
MobSched: An Optimizable Scheduler for Mobile Cloud Computing S. SindiaS. GaoB. Black A.LimV. D. AgrawalP. Agrawal Auburn University, Auburn, AL 45 th.
A Dynamic MapReduce Scheduler for Heterogeneous Workloads Chao Tian, Haojie Zhou, Yongqiang He,Li Zha 簡報人:碩資工一甲 董耀文.
Copyright 2007, Information Builders. Slide 1 Performance and Tuning Tips Mark Nesson/Vashti Ragoonath October 2008.
Copyright © 2006 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Technology Education Lecture 5 Operating Systems.
OPERATING SYSTEMS CPU SCHEDULING.  Introduction to CPU scheduling Introduction to CPU scheduling  Dispatcher Dispatcher  Terms used in CPU scheduling.
Resource Management and Accounting Working Group Working Group Scope and Components Progress made Current issues being worked Next steps Discussions involving.
EXPOSE GOOGLE APP ENGINE AS TASKTRACKER NODES AND DATA NODES.
Introduction to Hadoop and HDFS
Jockey Guaranteed Job Latency in Data Parallel Clusters Andrew Ferguson, Peter Bodik, Srikanth Kandula, Eric Boutin, and Rodrigo Fonseca.
Mesos A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony Joseph, Randy.
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion.
Congestion Control in CSMA-Based Networks with Inconsistent Channel State V. Gambiroza and E. Knightly Rice Networks Group
Testing the dynamic per-query scheduling (with a FIFO queue) Jan Iwaszkiewicz.
Using Map-reduce to Support MPMD Peng
Matei Zaharia, Dhruba Borthakur *, Joydeep Sen Sarma *, Khaled Elmeleegy +, Scott Shenker, Ion Stoica UC Berkeley, * Facebook Inc, + Yahoo! Research Fair.
Sparrow Distributed Low-Latency Spark Scheduling Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica.
LINUX SCHEDULING Evolution in the 2.6 Kernel Kevin Lambert Maulik Mistry Cesar Davila Jeremy Taylor.
Matchmaking: A New MapReduce Scheduling Technique
MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads School of Computer Engineering Nanyang Technological University 30 th Aug 2013.
Dynamic Slot Allocation Technique for MapReduce Clusters School of Computer Engineering Nanyang Technological University 25 th Sept 2013 Shanjiang Tang,
Virtualization and Databases Ashraf Aboulnaga University of Waterloo.
Using Map-reduce to Support MPMD Peng
Software-defined network(SDN)
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion.
PACMan: Coordinated Memory Caching for Parallel Jobs Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker,
Dynamic Resource Allocation for Shared Data Centers Using Online Measurements By- Abhishek Chandra, Weibo Gong and Prashant Shenoy.
Hadoop Architecture Mr. Sriram
Spark and YARN: Better Together
Tao Zhu1,2, Chengchun Shu1, Haiyan Yu1
Chapter 10 Data Analytics for IoT
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Managing Data Transfer in Computer Clusters with Orchestra
CS 425 / ECE 428 Distributed Systems Fall 2016 Nov 10, 2016
Process Design and Analysis
CS 425 / ECE 428 Distributed Systems Fall 2017 Nov 16, 2017
PA an Coordinated Memory Caching for Parallel Jobs
The Scheduling Strategy and Experience of IHEP HTCondor Cluster
Latest June Cloudera CCA175 Exam Questions - CCA175 Exam Dumps PDF
Get CCA-500 Dumps PDF - CCA-500 Exam Dumps Study Material Dumps4download.us
2018 Valid Cloudera CCA175 Dumps Questions - CCA175 Braindumps Dumpsprofessor.com
湖南大学-信息科学与工程学院-计算机与科学系
Apollo Weize Sun Feb.17th, 2017.
Introduction to Apache
CPU SCHEDULING.
Presentation transcript:

UC Berkeley Job Scheduling with the Fair and Capacity Schedulers Matei Zaharia Wednesday, June 10, 2009 Santa Clara Marriott

Motivation » Provide fast response times to small jobs in a shared Hadoop cluster » Improve utilization and data locality over separate clusters and Hadoop on Demand

Hadoop at Facebook » 600-node cluster running Hive » 3200 jobs/day » 50+ users » Apps: statistical reports, spam detection, ad optimization, …

Facebook Job Types » Production jobs: data import, hourly reports, etc » Small ad-hoc jobs: Hive queries, sampling » Long experimental jobs: machine learning, etc  GOAL: fast response times for small jobs, guaranteed service levels for production jobs

Outline » Fair scheduler basics » Configuring the fair scheduler » Capacity scheduler » Useful links

FIFO Scheduling Job Queue

FIFO Scheduling Job Queue

FIFO Scheduling Job Queue

Fair Scheduling Job Queue

Fair Scheduling Job Queue

Fair Scheduler Basics » Group jobs into “ pools” » Assign each pool a guaranteed minimum share » Divide excess capacity evenly between pools

Pools » Determined from a configurable job property › Default in 0.20: user.name (one pool per user) » Pools have properties: › Minimum map slots › Minimum reduce slots › Limit on # of running jobs

Example Pool Allocations entire cluster 100 slots matei jeff ads min share = 40 ads min share = 40 tom min share = 30 tom min share = 30 job 2 15 slots job 2 15 slots job 3 15 slots job 3 15 slots job 1 30 slots job 1 30 slots job 4 40 slots job 4 40 slots

Scheduling Algorithm » Split each pool’s min share among its jobs » Split each pool’s total share among its jobs » When a slot needs to be assigned: › If there is any job below its min share, schedule it › Else schedule the job that we’ve been most unfair to (based on “deficit”)

Scheduler Dashboard

Change priority Change pool FIFO mode (for testing)

Additional Features » Weights for unequal sharing: › Job weights based on priority (each level = 2x) › Job weights based on size › Pool weights » Limits for # of running jobs: › Per user › Per pool

Installing the Fair Scheduler » Build it: ›ant package » Place it on the classpath: ›cp build/contrib/fairscheduler/*.jar lib

Configuration Files » Hadoop config (conf/mapred-site.xml) › Contains scheduler options, pointer to pools file » Pools file (pools.xml) › Contains min share allocations and limits on pools › Reloaded every 15 seconds at runtime

Minimal hadoop-site.xml mapred.jobtracker.taskScheduler org.apache.hadoop.mapred.FairScheduler mapred.fairscheduler.allocation.file /path/to/pools.xml

Minimal pools.xml

Configuring a Pool 10 5

Setting Running Job Limits

Default Per-User Running Job Limit

Other Parameters mapred.fairscheduler.assignmultiple: » Assign a map and a reduce on each heartbeat; improves ramp-up speed and throughput; recommendation: set to true

Other Parameters mapred.fairscheduler.poolnameproperty: » Which JobConf property sets what pool a job is in - Default: user.name (one pool per user) - Can make up your own, e.g. “pool.name”, and pass in JobConf with conf.set(“pool.name”, “mypool”)

Useful Setting mapred.fairscheduler.poolnameproperty pool.name pool.name ${user.name} Make pool.name default to user.name

Future Plans » Preemption (killing tasks) if a job is starved of its min or fair share for some time (HADOOP-4665) » Global scheduling optimization (HADOOP-4667) » FIFO pools (HADOOP-4803, HADOOP-5186)

Capacity Scheduler » Organizes jobs into queues » Queue shares as %’s of cluster » FIFO scheduling within each queue » Supports preemption » city_scheduler.html city_scheduler.html

Thanks! » Fair scheduler included in Hadoop and in Cloudera’s Distribution for Hadoop » Fair scheduler for Hadoop 0.17 and 0.18: » Capacity scheduler included in Hadoop » Docs: » My