Omega: flexible, scalable schedulers for large compute clusters

Presentation transcript:

Omega: flexible, scalable schedulers for large compute clusters
Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes
Presented by Muhammed Uluyol (EECS 582 – W16)
*Some graphics from John Wilkes' CMU presentation

Problem

Problem
- Heterogeneous workloads: both long-running services and batch jobs
- Growing clusters: sharing one large cluster avoids resource fragmentation
- More jobs
- Many constraints

Workload Heterogeneity
[Figure slides: distributions of job count, task count, CPU requested, and memory requested]

Scheduling Complexity
Sophisticated placement decisions can take 60+ seconds.

Scheduling Requirements
- Heterogeneous workloads
- High utilization
- Scalability
- Fault tolerance
- User constraints
- Availability

Approach: Monolithic
- Complex
- Head-of-line blocking
- Scalability issues

Approach: Static Partitioning
- Simple
- Bad utilization
- Restricted placement

Approach: Two-Level
- Flexibility: multiple schedulers
- Pessimistic: resources are "locked" while on offer to one scheduler (see the sketch below)
- No conflicts
- Limited placement
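
A minimal sketch of the pessimistic, offer-based model (all class and method names here are hypothetical, not the real Mesos API): while free resources are on offer to one scheduler, no other scheduler can see or claim them, so a slow scheduler blocks everyone behind it.

```python
# Hypothetical sketch of a two-level (offer-based) model, Mesos-style.
import time

class Master:
    def __init__(self, cpus):
        self.free = cpus                       # unallocated CPUs in the cluster

    def offer(self, scheduler):
        offered, self.free = self.free, 0      # resources are "locked" while on offer
        taken = scheduler.consider(offered)    # other schedulers must wait here
        self.free += offered - taken           # declined resources are returned
        return taken

class SlowServiceScheduler:
    def consider(self, offered):
        time.sleep(0.1)                        # stands in for a 60+ second placement search
        return min(offered, 4)

class FastBatchScheduler:
    def consider(self, offered):
        return min(offered, 2)                 # decides instantly, but had to wait its turn

master = Master(cpus=16)
print(master.offer(SlowServiceScheduler()))    # 4; batch work is blocked meanwhile
print(master.offer(FastBatchScheduler()))      # 2
```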

Approach: Shared-State
- Flexibility: multiple schedulers
- Optimistic: no locking
- Can have conflicts
- Free placement

Omega
- Schedulers download copies of the shared cell state
- Atomic commits (incremental or all-or-nothing); see the sketch below
- No central resource manager*
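
A minimal sketch of the shared-state idea (hypothetical names, not Omega's actual implementation): each scheduler places tasks against its private copy of the cell state, then commits its claims atomically. With incremental commits only the conflicting claims are retried; all-or-nothing commits reject the whole transaction on any conflict.

```python
class CellState:
    def __init__(self, machines):
        self.owner = {m: None for m in machines}      # machine -> task or None

    def snapshot(self):
        return dict(self.owner)                       # a scheduler's private copy

    def commit(self, claims, snapshot, incremental=True):
        # A claim conflicts if its machine changed since the copy was taken.
        conflicts = {m for m in claims if self.owner[m] != snapshot[m]}
        if conflicts and not incremental:
            return set(claims)                        # all-or-nothing: reject everything
        for m, task in claims.items():
            if m not in conflicts:
                self.owner[m] = task                  # non-conflicting claims succeed
        return conflicts                              # caller retries only these

cell = CellState(["m1", "m2", "m3"])
copy_a = cell.snapshot()                              # scheduler A's local cell state
copy_b = cell.snapshot()                              # scheduler B's local cell state
cell.commit({"m1": "B/task0"}, copy_b)                # B wins the race for m1
failed = cell.commit({"m1": "A/task0", "m2": "A/task1"}, copy_a)
print(failed)                                         # {'m1'}: A reschedules just that task
```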

Evaluation (Simulation)

                          Lightweight            High-Fidelity
  Machines                Homogeneous            Actual
  Resource request size   Sampled                Actual
  Initial cell state      Sampled                Actual
  Tasks/job               Sampled                Actual
  Job inter-arrival time  Sampled                Actual
  Task duration           Sampled                Actual
  Scheduling constraints  Ignored                Obeyed
  Scheduling algorithm    Randomized first-fit   Google's
  Runtime                 24 h in ~5 min         24 h in ~2 h

Shared simplifications: no preemption; no failures; resource requests = actual usage; allocations = initially requested size.
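
To make the "Sampled" column concrete, here is a made-up illustration of how a lightweight simulator can draw workload parameters from distributions instead of replaying a trace; the distributions below are stand-ins, not the paper's empirical ones.

```python
import random

def sample_job(rng):
    return {
        "tasks": max(1, int(rng.lognormvariate(2.0, 1.5))),  # tasks/job
        "cpu_per_task": rng.choice([0.25, 0.5, 1.0, 2.0]),   # request size
        "task_duration_s": rng.expovariate(1 / 600),         # mean 10 minutes
    }

rng = random.Random(42)
t = 0.0
for _ in range(3):
    t += rng.expovariate(1 / 30)            # job inter-arrival time, mean 30 s
    print(f"t={t:7.1f}s  {sample_job(rng)}")
```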

Lightweight simulator: Monolithic
[Figure: red = SLO exceeded]

Lightweight simulator: Multi-Path Monolithic

Lightweight Simulator: Mesos

Why are there so many failures?
"Mesos achieves fairness by alternately offering all available cluster resources to different schedulers."
Mesos assumes that scheduling decisions are quick and that resource churn is high. Service jobs take a long time to schedule, so batch jobs are blocked while the service scheduler holds the offer.

Lightweight Simulator: Omega

Scalability

Lightweight Simulator: Summary
- Similar to multi-path monolithic, but without head-of-line blocking
- Does not suffer from heterogeneity issues
- Scales, but not linearly

High-Fidelity Simulator

Application-Specific Scheduler: MapReduce
- Engineers are not good at picking worker counts
- Add an MR-specific scheduler
- Assume linear speedup; add more workers if possible (see the sketch below)
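
A sketch of the scheduler's reasoning (the names and exact policy are assumptions): under linear speedup, runtime = work / workers, so opportunistically claiming idle resources to run extra workers directly shortens the job.

```python
def pick_worker_count(configured, max_useful, idle_slots):
    # Never exceed the point where extra workers stop helping
    # (e.g., the number of remaining map/reduce tasks).
    return min(max_useful, configured + idle_slots)

def predicted_runtime_s(total_work_s, workers):
    return total_work_s / workers            # the linear-speedup assumption

w = pick_worker_count(configured=10, max_useful=200, idle_slots=90)
print(w, predicted_runtime_s(6000, 10), predicted_runtime_s(6000, w))
# 100 workers: 600 s with the user's setting, 60 s opportunistically
```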

Application-Specific Scheduler: MapReduce

Wrap-Up
- Optimistic concurrency: overhead outweighed by benefits
- Scalability: no head-of-line blocking
- Simplicity: facilitates specialized schedulers

Discussion
- When do users care about fairness?
- How well does Omega scale down?
- Can Mesos be fixed to handle long scheduling times?