Quincy: Fair Scheduling for Distributed Computing Clusters Microsoft Research Silicon Valley SOSP’09 Presented at the Big Data Reading Group by Babu Pillai.


Motivation Big clusters are used for jobs of widely varying sizes and durations. Workload data comes from Microsoft's production search clusters; Google reports an average MapReduce job time of 395 s.

Goal Fine-grained sharing of resources, not semi-static allocations. Comply with locality constraints: move computation to the data. Ensure a notion of fairness: a large job should not monopolize the cluster at the expense of many small jobs.

Job Model Dryad jobs: a root process plus a set of worker processes related by a DAG; the workers are loosely coupled, independently executing processes. The model is compatible with MapReduce, Hadoop, and Condor. It will not work for MPI, and will NOT work for SLIPstream jobs; it may work for Dicer.

Locality The cluster is modeled as a two-level hierarchy of racks and machines. Each worker can have a rack or machine preference based on the fraction of its input data stored on that rack or machine. Preferences are late-bound: computed when the process is ready to run.
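As a rough illustration of how such preferences could be derived from block placement, here is a minimal sketch. The 0.5 threshold and the `preferences` helper are illustrative assumptions, not from the paper (Quincy actually derives continuous data-transfer costs rather than a hard cutoff):

```python
def preferences(block_locations, frac=0.5):
    """block_locations: one (machine, rack) pair per input block of a task.
    Returns the machines and racks that hold at least `frac` of the data.
    Hypothetical helper; the threshold is illustrative only."""
    n = len(block_locations)
    m_cnt, r_cnt = {}, {}
    for m, r in block_locations:
        m_cnt[m] = m_cnt.get(m, 0) + 1   # blocks per machine
        r_cnt[r] = r_cnt.get(r, 0) + 1   # blocks per rack
    return ([m for m, c in m_cnt.items() if c / n >= frac],
            [r for r, c in r_cnt.items() if c / n >= frac])

# A task with 4 input blocks: half on m0, three-quarters on rack r0.
machines, racks = preferences(
    [('m0', 'r0'), ('m0', 'r0'), ('m1', 'r0'), ('m2', 'r1')])
```

Because preferences are late-bound, this computation would run only when the task becomes ready, using the block locations known at that moment.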

Fairness If a job takes t time running alone and J jobs are running, it should take no more than J·t. Admission control limits the cluster to K concurrent jobs; the choice of K trades fairness with locality against leaving resources idle. Remaining jobs are queued and admitted in FIFO order as running jobs complete. This gives fine-grained sharing of the whole cluster, assuming only one process per machine.
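The J·t yardstick can be stated as a tiny helper (illustrative only; the helper names are not from the paper):

```python
def fair_bound(t_alone, num_jobs):
    """With J concurrent jobs, a job that takes t seconds running alone
    should finish within J * t under this notion of fairness."""
    return num_jobs * t_alone

def is_fair(t_alone, num_jobs, observed):
    """True if the observed completion time respects the J*t bound."""
    return observed <= fair_bound(t_alone, num_jobs)
```

For example, a job that takes 100 s alone and shares the cluster with three others (J = 4) should finish within 400 s.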

Baseline: Queue-based scheduling Queues for each machine, each rack, and the system as a whole. Greedy: dispatch from the most local queue to keep all machines occupied. Greedy Fair: "block" jobs that hold lots of resources and give priority to unblocked ones. Greedy Fair Preemptive: shoot down processes of jobs with more than their fair share, shortest-lived first. Hysteresis can be added to GF and GFP to avoid the sticky-slot issue.
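A minimal sketch of the greedy baseline, assuming a task is enqueued on every queue it has a preference for and lazily removed once dispatched (the class and field names are hypothetical):

```python
from collections import deque

class GreedyScheduler:
    """Baseline greedy dispatch: when a machine frees up, serve the
    most local non-empty queue first (machine, then rack, then cluster)."""

    def __init__(self, rack_of):
        self.rack_of = rack_of                                  # machine -> rack
        self.machine_q = {m: deque() for m in rack_of}
        self.rack_q = {r: deque() for r in set(rack_of.values())}
        self.cluster_q = deque()

    def submit(self, task, machines=(), racks=()):
        # The same task object sits on every queue it prefers,
        # plus the cluster-wide queue as a fallback.
        for m in machines:
            self.machine_q[m].append(task)
        for r in racks:
            self.rack_q[r].append(task)
        self.cluster_q.append(task)

    def dispatch(self, machine):
        # Scan from most local to least local; skip tasks already taken.
        for q in (self.machine_q[machine],
                  self.rack_q[self.rack_of[machine]],
                  self.cluster_q):
            while q:
                task = q.popleft()
                if not task.get('dispatched'):
                    task['dispatched'] = True
                    return task['name']
        return None

sched = GreedyScheduler({'m0': 'r0', 'm1': 'r0', 'm2': 'r1'})
sched.submit({'name': 'a'}, machines=['m0'], racks=['r0'])
sched.submit({'name': 'b'}, racks=['r1'])
```

Here `dispatch('m0')` returns task `a` (machine-local) and `dispatch('m2')` returns `b` (rack-local on r1). The GF and GFP variants would add blocking and preemption on top of this dispatch loop.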

Big Idea: Graph-based scheduling Represent the cluster and the set of processes as a graph. Add edge weights to represent preferences and costs. Use a min-cost flow solver to convert this into a set of scheduling assignments that satisfy global criteria.

Quincy Fair Scheduling Encode the cluster structure and each job's tasks as a flow graph; edge costs encode the scheduling policy. Costs of edges from tasks to particular machines and racks are based on data-transfer costs. The cost of a task's edge into the unscheduled node U reflects the time cost of not running it, and the costs out of U control relative fairness among jobs. Costs vary over time: once a task is scheduled, the cost for other tasks coming into that compute node goes up, and a waiting task's cost to U grows over time. The policy is parameterized by three variables: the cost of waiting in the queue, and the costs of data transfer across the rack switch and the core switch.
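A toy version of this reduction can be sketched end to end: build the flow graph (source → tasks → {machines, unscheduled node U} → sink) and solve it with min-cost max-flow. The cost values (0 for data-local, 2 for same-rack, 5 for staying queued) and the one-job, two-machine topology are invented for illustration; Quincy's real graph also has rack aggregator nodes and time-varying costs.

```python
from collections import defaultdict

class MinCostFlow:
    """Successive-shortest-paths min-cost max-flow (Bellman-Ford relaxation).
    Edges are stored as [to, capacity, cost, flow]; edge i and i^1 are a
    forward/residual pair."""

    def __init__(self):
        self.adj = defaultdict(list)
        self.edges = []

    def add_edge(self, u, v, cap, cost):
        self.adj[u].append(len(self.edges)); self.edges.append([v, cap, cost, 0])
        self.adj[v].append(len(self.edges)); self.edges.append([u, 0, -cost, 0])

    def solve(self, s, t):
        total = 0
        while True:
            dist, parent = {s: 0}, {}
            changed = True
            while changed:                      # relax to a fixpoint
                changed = False
                for u in list(dist):
                    for ei in self.adj[u]:
                        v, cap, cost, flow = self.edges[ei]
                        if cap - flow > 0 and dist[u] + cost < dist.get(v, float('inf')):
                            dist[v], parent[v] = dist[u] + cost, ei
                            changed = True
            if t not in dist:                   # no augmenting path left
                return total
            v = t                               # push one unit along the path
            while v != s:
                ei = parent[v]
                self.edges[ei][3] += 1
                self.edges[ei ^ 1][3] -= 1
                v = self.edges[ei ^ 1][0]
            total += dist[t]

g = MinCostFlow()
S, T, U = 'S', 'T', 'U'
tasks = ['t0', 't1', 't2']
data_home = {'m0': 't0', 'm1': 't1'}            # machine -> task whose data it stores
for tk in tasks:
    g.add_edge(S, tk, 1, 0)
    g.add_edge(tk, U, 1, 5)                     # cost of leaving the task queued
    for m, local_task in data_home.items():
        g.add_edge(tk, m, 1, 0 if local_task == tk else 2)
for m in data_home:
    g.add_edge(m, T, 1, 0)                      # one running process per machine
g.add_edge(U, T, len(tasks), 0)

cost = g.solve(S, T)
placed = {}                                     # read the assignment off the flow
for tk in tasks:
    for ei in g.adj[tk]:
        v, cap, c, flow = g.edges[ei]
        if cap > 0 and flow == 1:
            placed[tk] = v
```

With two machines and three tasks, the solver places the two data-local tasks (t0 on m0, t1 on m1) and parks t2 on U at total cost 5, exactly the kind of globally optimal trade-off between locality and waiting that per-queue greedy dispatch cannot make.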

Results Unrestricted network case

Results Restricted network: much lower bandwidth through the core switch

Issues / Discussion Tasks are assigned independently; this is not possible for MPI or for many VM-placement problems. Capacities are single-dimensional and not easily extended to multiple dimensions. Can support multiple tasks per computer. Can adapt K to improve cluster utilization. Cannot express constraints such as "t1 and t2 should execute on the same rack".