Cloud Computing Large-scale Resource Management

Presentation transcript:

Cloud Computing: Large-scale Resource Management
Eva Kalyvianaki, ek264@cam.ac.uk
(Also look at the exam questions.)

Contents
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, by B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, University of California, Berkeley, NSDI 2011.
Omega: flexible, scalable schedulers for large compute clusters, by Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes, EuroSys 2013.

Background
Data centers are built from clusters of commodity hardware.
These clusters run a diverse set of applications, e.g., MapReduce, data streaming, Storm, S4, etc.
Each application has its own execution framework.
Multiplexing a cluster between frameworks improves resource utilisation and so reduces costs.

Problem Statement
It is very difficult for a single, application-specific framework to efficiently manage the resources of a cluster.
The aim of the Mesos work is to run multiple frameworks in a single cluster in order to:
maximise utilisation
share data between frameworks

Common solutions for sharing clusters
Statically partition the cluster and run one framework (or VMs) per partition. Disadvantages:
Low utilisation
Rigid partitioning that might not match changing application demands at run time
Mismatch between the allocation granularity of static partitioning and that of current application frameworks such as Hadoop

Mesos Challenges and Goals
High utilisation.
Support for diverse frameworks: each framework has different scheduling needs.
Scalability: the scheduling system must scale to clusters of 1,000s of nodes running 100s of jobs with 1,000,000s of tasks.
Reliability: because all applications would depend on Mesos, the system must be fault-tolerant and highly available.
Mesos is a thin resource-sharing layer that enables fine-grained sharing across diverse cluster computing frameworks by giving frameworks a common interface for accessing cluster resources.

Design Elements
Fine-grained sharing: allocation at the level of tasks within a job; improves utilisation, latency and data locality.
Resource offers: offer available resources to frameworks; a simple, scalable, application-controlled scheduling mechanism.

Design Element 1, Fine-Grained Sharing

Design Element 2, Resource Offers
Mesos proposes and uses resource offers: offer available resources to frameworks and let them pick which resources to use and which tasks to launch.
Advantage: keeps Mesos simple and extensible to future frameworks.
Disadvantage: decentralised solutions are not optimal.

Mesos Architecture

Mesos Architecture, Terminology
Mesos master: enables fine-grained sharing across frameworks.
Mesos slave: runs on each node.
Frameworks: run tasks on slave nodes.
Framework schedulers: "While the Mesos master determines how many resources to offer to each framework, the frameworks' schedulers select which of the offered resources to use."
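
The division of labour in that quote can be sketched in a few lines of Python. The names Offer, FrameworkScheduler and resource_offer are illustrative, not the Mesos API: the master decides how many resources to offer each framework, and the framework's own scheduler decides which offered resources to use and which tasks to launch on them.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    slave: str
    cpus: int
    mem_gb: int

class FrameworkScheduler:
    """Framework-side logic: decide which tasks to launch on an offer."""
    def __init__(self, name, task_cpus, task_mem_gb, pending_tasks):
        self.name = name
        self.task_cpus = task_cpus
        self.task_mem_gb = task_mem_gb
        self.pending_tasks = pending_tasks

    def resource_offer(self, offer):
        # Launch as many pending tasks as fit within this single offer;
        # any leftover resources are implicitly declined.
        launched = []
        cpus, mem = offer.cpus, offer.mem_gb
        while (self.pending_tasks and cpus >= self.task_cpus
               and mem >= self.task_mem_gb):
            launched.append((self.name, self.pending_tasks.pop(0), offer.slave))
            cpus -= self.task_cpus
            mem -= self.task_mem_gb
        return launched

# Master side: decide how much to offer each framework (here, simply one
# free slave per framework), then let each framework's scheduler pick
# which tasks to launch within its offer.
free_slaves = [Offer("slave1", cpus=4, mem_gb=8), Offer("slave2", cpus=4, mem_gb=8)]
frameworks = [
    FrameworkScheduler("hadoop", task_cpus=2, task_mem_gb=2,
                       pending_tasks=["map1", "map2", "map3"]),
    FrameworkScheduler("mpi", task_cpus=4, task_mem_gb=8, pending_tasks=["rank0"]),
]
for fw, offer in zip(frameworks, free_slaves):
    for launch in fw.resource_offer(offer):
        print("launch", launch)
```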

Mesos Resource Offer Example
Frameworks can also reject offers, so that Mesos is kept simple.

Mesos Resource Offers
Frameworks can reject offers that do not satisfy their constraints in order to wait for ones that do.
But a framework might then wait a long time before it receives an offer that satisfies its constraints, and Mesos might send these offers to many frameworks, which is time-consuming.
So Mesos frameworks set their own filters: they specify offers that will always be rejected.
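
A minimal sketch of how such filters might look (hypothetical names, not the Mesos API): the framework registers cheap predicates, and the master evaluates them before sending an offer, skipping offers that would always be rejected.

```python
# A filter is a cheap predicate over an offer, e.g. "only nodes from list L"
# or "only nodes with at least R resources free".
def only_nodes(allowed):
    return lambda offer: offer["slave"] in allowed

def min_free(cpus, mem_gb):
    return lambda offer: offer["cpus"] >= cpus and offer["mem_gb"] >= mem_gb

framework_filters = {
    "hadoop": [only_nodes({"slave1", "slave3"})],   # e.g. for data locality
    "mpi":    [min_free(cpus=8, mem_gb=16)],        # e.g. needs large slots
}

def should_offer(framework, offer):
    """Master-side check: skip offers the framework would always reject."""
    return all(f(offer) for f in framework_filters.get(framework, []))

offer = {"slave": "slave2", "cpus": 4, "mem_gb": 8}
print(should_offer("hadoop", offer))  # False: slave2 is not on hadoop's node list
print(should_offer("mpi", offer))     # False: fewer resources than mpi requires
```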

Mesos Architecture Details
Resource Allocation Module:
Fair sharing based on a generalisation of max-min fairness for multiple resources
Strict priorities
Mesos can also revoke (kill) tasks when a greedy framework uses up many resources for a long time; the framework is first allowed a grace period.
Isolation is achieved using Linux Containers.
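
The "generalisation of max-min fairness for multiple resources" is Dominant Resource Fairness (DRF). A rough sketch of the idea, using the classic 9-CPU / 18-GB example (the code and numbers are illustrative, not the Mesos allocation-module API): repeatedly give the next task slot to the framework with the smallest dominant share.

```python
# Sketch of dominant-resource fairness (DRF): max-min fairness applied to
# each framework's dominant share, i.e. its largest share of any resource.
total = {"cpus": 9.0, "mem_gb": 18.0}
demand = {                        # per-task demand of each framework
    "A": {"cpus": 1.0, "mem_gb": 4.0},
    "B": {"cpus": 3.0, "mem_gb": 1.0},
}
allocated = {fw: {"cpus": 0.0, "mem_gb": 0.0} for fw in demand}
free = dict(total)

def dominant_share(fw):
    # A framework's dominant share is its largest share of any single resource.
    return max(allocated[fw][r] / total[r] for r in total)

while True:
    # Offer the next task slot to the framework with the smallest dominant share.
    fw = min(demand, key=dominant_share)
    d = demand[fw]
    if any(free[r] < d[r] for r in total):
        break                     # that framework's next task no longer fits
    for r in total:
        free[r] -= d[r]
        allocated[fw][r] += d[r]

print(allocated)  # A runs 3 tasks, B runs 2: both end with a dominant share of 2/3
```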

Mesos Architecture Details (cont'd)
Scalable and Robust Resource Offers:
Use of filters: "only offer nodes from list L", "only offer nodes with at least R resources free"; these can be evaluated quickly.
Mesos counts resources offered to a framework towards its allocation of the cluster.
If a framework takes too long to respond, Mesos withdraws the offer and offers the resources to another framework.
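
Both robustness mechanisms can be sketched compactly (hypothetical names and timeout value, not Mesos code): outstanding offers are charged against the framework's allocation, and unanswered offers are withdrawn after a timeout.

```python
import time

OFFER_TIMEOUT_S = 5.0                      # illustrative value, not a Mesos default
outstanding = {}                           # offer_id -> (framework, resources, time_sent)
held = {"hadoop": 10, "mpi": 4}            # resources each framework currently holds

def effective_allocation(framework):
    # Resources offered but not yet answered still count against the framework,
    # so a slow framework cannot hoard the cluster through pending offers.
    pending = sum(res for fw, res, _ in outstanding.values() if fw == framework)
    return held[framework] + pending

def rescind_stale_offers(now=None):
    # Withdraw offers the framework has not answered in time, freeing the
    # resources to be offered to another framework.
    now = time.time() if now is None else now
    for offer_id, (fw, res, sent) in list(outstanding.items()):
        if now - sent > OFFER_TIMEOUT_S:
            del outstanding[offer_id]
            print(f"rescinded offer {offer_id} ({res} units) from {fw}")

outstanding["o1"] = ("hadoop", 6, time.time() - 10)   # an offer sent 10 s ago
print(effective_allocation("hadoop"))                 # 16 = 10 held + 6 outstanding
rescind_stale_offers()
print(effective_allocation("hadoop"))                 # back to 10
```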

Mesos Evaluation
Amazon EC2 with 92 Mesos nodes.
A mix of frameworks:
Hadoop running a mix of small and large jobs based on a Facebook workload
A Hadoop instance running a set of large batch jobs
Spark running machine learning jobs
Torque running MPI jobs
Mesos should achieve higher utilisation, and all jobs should finish at least as fast as under static partitioning.

Mesos Performance for All Frameworks
At time 350, when both Spark and the Facebook Hadoop instance have no running jobs, the large-job Hadoop framework scales up considerably.

Mesos Performance vs Static Partitioning

Omega: flexible, scalable schedulers for large compute clusters
Challenges:
Google maintains different data centers around the world
Clusters and workloads are growing in size
Diverse workloads
Growing job arrival rates
Need for a scalable scheduler to tackle these challenges

Existing Approaches

Existing Approach, Mesos revisited
Disadvantages:
Pessimistic concurrency control: Mesos avoids conflicts by offering a given resource to one framework at a time. Mesos chooses the order and the sizes of the offers. One framework holds the "lock" on the offered resources for the duration of its decision, which is slow.
A framework does not have access to the full cluster state, so it cannot support preemption or acquire all resources.
Mesos works well when tasks are short-lived and relinquish resources frequently, and when job sizes are small compared to the size of the cluster. But Google cluster workloads do not have these properties, so this approach does not work well for them.

Omega: Shared State, Overview
Cell state: a resilient master copy of the resource allocations in the cluster.
There is no centralised scheduler; each framework maintains its own scheduler.
Each scheduler has its own private, local, frequently-updated copy of the cell state, which it uses to make scheduling decisions.

Omega Shared-state Scheduling, Steps
A framework's scheduler makes a scheduling decision.
It updates the shared cell state copy in an atomic commit.
At most one such conflicting commit will succeed.
The scheduler resyncs its local copy and, if needed, re-schedules.
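
A minimal sketch of this commit cycle under optimistic concurrency control (the class names and a single CPU-count resource are simplifying assumptions, not Omega's actual interfaces):

```python
class CellState:
    """Master copy of the cluster's allocations; changed only by atomic,
    version-checked commits (optimistic concurrency control)."""
    def __init__(self, free_cpus):
        self.version = 0
        self.free_cpus = free_cpus

    def try_commit(self, base_version, cpus_wanted):
        # The commit succeeds only if no other scheduler has changed the cell
        # state since this scheduler's last sync and the resources are still free.
        if base_version != self.version or cpus_wanted > self.free_cpus:
            return False
        self.free_cpus -= cpus_wanted
        self.version += 1
        return True

class Scheduler:
    def __init__(self, name, cell):
        self.name, self.cell = name, cell

    def schedule(self, cpus_wanted):
        while True:
            # 1. Sync a private local copy of the cell state.
            local_version = self.cell.version
            local_free = self.cell.free_cpus
            if cpus_wanted > local_free:
                return False                 # give up: not enough free resources
            # 2. Decide a placement against the local copy, then
            # 3. try to commit it atomically; on conflict, resync and retry.
            if self.cell.try_commit(local_version, cpus_wanted):
                print(f"{self.name} committed {cpus_wanted} cpus "
                      f"(cell version {self.cell.version})")
                return True

cell = CellState(free_cpus=10)
Scheduler("batch", cell).schedule(6)
Scheduler("service", cell).schedule(4)
```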

Omega Scheduling, Remarks
Omega schedulers operate in parallel.
Schedulers typically perform incremental transactions to avoid starvation.
Schedulers agree on a common scale for expressing the relative importance of jobs.
The performance viability of the shared-state approach is determined by how frequently transactions fail and by the cost of those failures.
"Our performance evaluation of the Omega model using both lightweight simulations with synthetic workloads, and high-fidelity, trace-based simulations of production workloads at Google, shows that optimistic concurrency over shared state is a viable, attractive approach to cluster scheduling."

Material
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, by B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, University of California, Berkeley, NSDI 2011. Sections 1-3.5 and 6-6.1.2.
Omega: flexible, scalable schedulers for large compute clusters, by Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes, EuroSys 2013. Sections 1-3 and 8.