Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center (NSDI '11). Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica. Presented by Youngmoon Lee, EECS 582 – W16, 3/14/16
Agenda: 1. Introduction 2. Problem statement 3. Design 4. Results 5. Discussion 6. Real-world Mesos
Mesos: a cluster manager that provides resource sharing and isolation across cluster frameworks such as Hadoop, Spark, and MPI. Even shorter: a datacenter OS.
Background: [2009] A Berkeley View of Cloud Computing; [2009] Nexus: Common substrate for Cloud; [2011] Mesos: Fine-grained resource sharing; [2011] DRF: Dominant Resource Fairness; [2013] YARN; [2013] Omega; [2015] Borg
Introduction: We want to run different frameworks in a single cluster. Static partitioning: no sharing.
Problem: Each framework requires a different amount of resources, so static assignment leads to under-utilization. [Figure: resource use over time under static assignment]
Solution [figure only]
Objective: Dynamic sharing and management of resources, for utilization and scalability.
Design: A micro-kernel that pushes scheduling logic to the frameworks: a "minimal resource multiplexer (two-level)".
Objectives: Utilization! Scalability! Design: task-level fine-grained sharing; resource offers ("let the framework pick").
Concerns: Performance isolation? Fairness? Implementation: containers; max-min fairness [DRF]; resource revocation.
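The design slide names DRF (Dominant Resource Fairness) as the fairness policy. The following is a minimal sketch of DRF-style progressive filling, not Mesos code: the function name `drf_allocate` and the list/dict layout are assumptions made for illustration, and dropping a framework once its next task no longer fits is a simplification. The sample numbers are the two-framework example from the DRF paper (9 CPUs and 18 GB in total).

```python
from typing import Dict, List


def drf_allocate(capacity: List[float],
                 demands: Dict[str, List[float]]) -> Dict[str, int]:
    """Return how many tasks each framework gets under progressive filling.

    capacity: total amount of each resource, e.g. [cpus, memory_gb].
    demands:  per-task demand vector of each framework (hypothetical layout).
    """
    used = [0.0] * len(capacity)
    tasks = {name: 0 for name in demands}
    active = set(demands)

    def dominant_share(name: str) -> float:
        # Largest fraction of any single resource this framework holds.
        return max(tasks[name] * d / c
                   for d, c in zip(demands[name], capacity))

    while active:
        # Serve the framework with the smallest dominant share next
        # (sorted() only makes tie-breaking deterministic).
        name = min(sorted(active), key=dominant_share)
        demand = demands[name]
        if all(u + d <= c for u, d, c in zip(used, demand, capacity)):
            used = [u + d for u, d in zip(used, demand)]
            tasks[name] += 1
        else:
            # Simplification: stop considering a framework whose next
            # task no longer fits.
            active.remove(name)
    return tasks


if __name__ == "__main__":
    print(drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]}))
```

With these inputs framework A (memory-heavy) ends up with 3 tasks and framework B (CPU-heavy) with 2, so both hold a dominant share of 2/3.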
Resource offer [figure sequence; surviving callouts: "Free!", "Suit yourself", "So harmonious, yet…"]
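The resource-offer cycle shown on these slides can be sketched roughly as follows. This is a toy model under stated assumptions, not the Mesos API: `Offer`, `Framework`, `Master`, `on_offer`, and `offer_round` are hypothetical names, and only CPUs are tracked. The master (first level) decides whom to offer free resources to; each framework (second level) decides how much of an offer to accept.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Offer:
    node: str
    cpus: int


@dataclass
class Framework:
    name: str
    cpus_per_task: int
    pending_tasks: int
    launched: List[str] = field(default_factory=list)

    def on_offer(self, offer: Offer) -> int:
        """Second level: the framework picks what suits it.

        Returns the number of CPUs accepted; the rest is declined.
        """
        n = min(self.pending_tasks, offer.cpus // self.cpus_per_task)
        self.pending_tasks -= n
        self.launched += [offer.node] * n
        return n * self.cpus_per_task


class Master:
    def __init__(self, free_cpus: Dict[str, int]):
        self.free_cpus = dict(free_cpus)

    def offer_round(self, frameworks: List[Framework]) -> None:
        """First level: offer each node's free CPUs to frameworks in turn."""
        for node in self.free_cpus:
            for fw in frameworks:
                if self.free_cpus[node] == 0:
                    break
                accepted = fw.on_offer(Offer(node, self.free_cpus[node]))
                self.free_cpus[node] -= accepted


if __name__ == "__main__":
    master = Master({"n1": 4, "n2": 4})
    hadoop = Framework("hadoop", cpus_per_task=1, pending_tasks=3)
    mpi = Framework("mpi", cpus_per_task=2, pending_tasks=2)
    master.offer_round([hadoop, mpi])
    print(hadoop.launched, mpi.launched, master.free_cpus)
```

The master never needs to know what a framework's tasks require; it only tracks what is free and what was accepted, which is the point of keeping the first level minimal.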
Hypothesis: Tasks are short-lived and return resources frequently. (If all tasks are long-running and no resources are returned, is there no sharing?) Job sizes are small compared to the size of the cluster. (Most of the time, 0 CPUs remain.)
Results: comparison with static resource sharing
CPU/Memory Utilization: CPU utilization improves by 10%, memory utilization by 17%.
Resource offer: Hadoop achieves better data locality with resource offers.
Scalability: Simple inter-framework scheduling in the micro-kernel. [Figure: y-axis 0% to 10%, x-axis number of nodes. Note: 10 s tasks]
Discussion: Max-min fairness allocation? Framework starvation? Malicious frameworks? Lottery/stride scheduling? Omega?
Discussion: MPI becomes slower; how should resource contention be handled? Does MPI interdependency affect performance? [Figure annotation: ∆ = 112 s]
Discussion: An overhead of 8% for 50K nodes, for a 10% utilization gain? 10 s tasks take 11 s? Does start time increase with resource requirements? [Figure: y-axis 0% to 10%. Note: 10 s tasks]
Real-world Mesos: enterprise, consulting, long-running services, fault-tolerant cron-like system, PaaS
“Datacentre OS”
Original motivation: Mesos was originally built to run different versions of Hadoop (v1, v2, v3). If it is useful for that, it can also be useful for many other things.
Thank you
Resource offers [figure only]