Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Slides:



Advertisements
Similar presentations
MapReduce: Simplified Data Processing on Large Cluster Jeffrey Dean and Sanjay Ghemawat OSDI 2004 Presented by Long Kai and Philbert Lin.
Advertisements

The Datacenter Needs an Operating System Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Anthony Joseph, Randy Katz, Scott Shenker, Ion Stoica.
UC Berkeley a Spark in the cloud iterative and interactive cluster computing Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, Ion Stoica.
Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng.
UC Berkeley Job Scheduling for MapReduce Matei Zaharia, Dhruba Borthakur *, Joydeep Sen Sarma *, Scott Shenker, Ion Stoica 1 RAD Lab, * Facebook Inc.
THE DATACENTER NEEDS AN OPERATING SYSTEM MATEI ZAHARIA, BENJAMIN HINDMAN, ANDY KONWINSKI, ALI GHODSI, ANTHONY JOSEPH, RANDY KATZ, SCOTT SHENKER, ION STOICA.
Resource Management with YARN: YARN Past, Present and Future
The "Almost Perfect" Paper -Christopher Francis Hyma Chilukiri.
Why static is bad! Hadoop Pregel MPI Shared cluster Today: static partitioningWant dynamic sharing.
UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.
Mesos A Platform for Fine-Grained Resource Sharing in Data Centers Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy.
Matei Zaharia, Dhruba Borthakur *, Joydeep Sen Sarma *, Khaled Elmeleegy +, Scott Shenker, Ion Stoica UC Berkeley, * Facebook Inc, + Yahoo! Research Delay.
Mesos A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony Joseph, Randy.
Mesos A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony Joseph, Randy.
UC Berkeley Monitoring Hadoop through Tracing Andy Konwinski and Matei Zaharia.
UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.
UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.
Cluster Scheduler Reference: Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center NSDI’2011 Multi-agent Cluster Scheduling for Scalability.
Outline | Motivation| Design | Results| Status| Future
Distributed Low-Latency Scheduling
Next Generation of Apache Hadoop MapReduce Arun C. Murthy - Hortonworks Founder and Architect Formerly Architect, MapReduce.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
A Platform for Fine-Grained Resource Sharing in the Data Center
CS162 Operating Systems and Systems Programming Lecture 24 Capstone: Cloud Computing April 29, 2013 Anthony D. Joseph
Benjamin Hindman Apache Mesos Design Decisions
1 CS : Project Suggestions Ion Stoica ( September 14, 2011.
Map Reduce for data-intensive computing (Some of the content is adapted from the original authors’ talk at OSDI 04)
Cloud Computing Ali Ghodsi UC Berkeley, AMPLab
CS162 Operating Systems and Systems Programming Lecture 24 Capstone: Cloud Computing December 2, 2013 Anthony D. Joseph and John Canny
EXPOSE GOOGLE APP ENGINE AS TASKTRACKER NODES AND DATA NODES.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
Tachyon: memory-speed data sharing Haoyuan (HY) Li, Ali Ghodsi, Matei Zaharia, Scott Shenker, Ion Stoica Good morning everyone. My name is Haoyuan,
(Private) Cloud Computing with Mesos at Twitter Benjamin
A Proposal of Application Failure Detection and Recovery in the Grid Marian Bubak 1,2, Tomasz Szepieniec 2, Marcin Radecki 2 1 Institute of Computer Science,
Mesos A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony Joseph, Randy.
The Limitation of MapReduce: A Probing Case and a Lightweight Solution Zhiqiang Ma Lin Gu Department of Computer Science and Engineering The Hong Kong.
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
Matei Zaharia, Dhruba Borthakur *, Joydeep Sen Sarma *, Khaled Elmeleegy +, Scott Shenker, Ion Stoica UC Berkeley, * Facebook Inc, + Yahoo! Research Fair.
Resilient Distributed Datasets: A Fault- Tolerant Abstraction for In-Memory Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave,
A Platform for Fine-Grained Resource Sharing in the Data Center
Next Generation of Apache Hadoop MapReduce Owen
Presented by Qifan Pu With many slides from Ali’s NSDI talk Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica.
Apache Mesos What is it ? Beyond Hadoop Resource Sharing Mesos Intentions Architecture Users
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion.
Part III BigData Analysis Tools (YARN) Yuan Xue
PACMan: Coordinated Memory Caching for Parallel Jobs Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker,
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center NSDI 11’ Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D.
Resilient Distributed Datasets A Fault-Tolerant Abstraction for In-Memory Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave,
COS 518: Advanced Computer Systems Lecture 13 Michael Freedman
Mesos and Borg (Lecture 17, cs262a)
Scalable containers with Apache Mesos and DC/OS
COS 418: Distributed Systems Lecture 23 Michael Freedman
Introduction to Distributed Platforms
Managing Data Transfer in Computer Clusters with Orchestra
Apache Hadoop YARN: Yet Another Resource Manager
PA an Coordinated Memory Caching for Parallel Jobs
INDIGO – DataCloud PaaS
Software Engineering Introduction to Apache Hadoop Map Reduce
Sajitha Naduvil-vadukootu
Distributed Systems CS
EECS 582 Final Review Mosharaf Chowdhury EECS 582 – F16.
Omega: flexible, scalable schedulers for large compute clusters
Mesos and Borg and Kubernetes Lecture 13, cs262a
COS 518: Advanced Computer Systems Lecture 14 Michael Freedman
Distributed Systems CS
Cloud Computing Large-scale Resource Management
Cloud Computing MapReduce in Heterogeneous Environments
COS 518: Distributed Systems Lecture 11 Mike Freedman
Presentation transcript:

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica Presented by Muhammed Uluyol EECS 582 – W16

Problem Many frameworks How to share resources? Better utilization Low operational cost EECS 582 – W16

Status Quo Spark Hadoop 1.x Hadoop 2.x EECS 582 – W16

Status Quo Underutilized (resource fragmentation) Suboptimal placement Locality Failure tolerance Spark Hadoop 1.x Hadoop 2.x EECS 582 – W16

Goal Hadoop 1.x Hadoop 2.x Spark EECS 582 – W16

Goal Mesos Hadoop 1.x Hadoop 2.x Spark EECS 582 – W16

Design Goals High Utilization Framework agnostic Scalability (10,000s of nodes) Fault-tolerant EECS 582 – W16

Scheduling Potential Solution: Global Scheduler Frameworks specify requirements Pros: Obvious solution Can make good placement decisions across all tasks Cons: Complex How to scale? How to address future needs? EECS 582 – W16

Resource Offers Present available resources to framework schedulers Scheduler picks which resources to use for a job Pros: Minimal work done in Mesos Cons: Schedulers unaware of each other’s jobs leads to suboptimal decisions EECS 582 – W16

Architecture Mesos Slave Hadoop Scheduler Mesos Master MPI Scheduler Hadoop Executor Task MPI Executor Hadoop Scheduler Mesos Master MPI Scheduler Jobs Allocation Module Mesos Slave Hadoop Executor Task Spark Executor Spark Scheduler EECS 582 – W16

Allocation Modules Pluggable Wrote two: Expects tasks to be short Fair Sharing Priority Expects tasks to be short Long tasks can be killed EECS 582 – W16

Isolation Use OS-provided mechanisms E.g. containers on Linux EECS 582 – W16

Resource Offers Count towards usage until a selection is made Offer are taken back after timeouts EECS 582 – W16

Fault-Tolerance Only need Store in ZooKeeper Active Slaves Active Frameworks Running Tasks Store in ZooKeeper Framework schedulers need to provide fault-tolerance EECS 582 – W16

Evaluation: Sharing EECS 582 – W16

Evaluation: Utilization EECS 582 – W16

Evaluation: Mesos vs Static Partitioning Framework Speedup on Mesos Facebook Hadoop Mix 1.14 Large Hadoop Mix 2.10 Spark 1.26 Torque/MPI 0.96 EECS 582 – W16

Evaluation: Scalability EECS 582 – W16

Wrap Up Mesos shares machines across different cluster compute frameworks Fine-grained sharing enables better locality and utilization Resource offers are a scalable, decentralized approach for application-controlled schedling EECS 582 – W16

Discussion What happens if tasks take a long time to execute? How can application schedulers be made fault-tolerant? Can Mesos be used for long-running (web) services? EECS 582 – W16