1
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Benjamin Hindman, Andrew Konwinski, Matei Zaharia, Ali Ghodsi, Anthony Joseph, Randy Katz, Scott Shenker, Ion Stoica.
Presented by Arka Bhattacharya. Slides adapted from the NSDI 2011 talk.
2
Background
- Rapid innovation in cluster computing frameworks: Dryad, Pregel, Pig, MPI, Percolator (an incremental web-indexing system).
3
Problem
- Rapid innovation in cluster computing frameworks, but no single framework suits all applications.
- We want to run multiple frameworks in a single cluster:
  …to maximize utilization
  …to share data between frameworks (e.g. have them work over the same dataset).
4
This paper is basically about:
- Today: static partitioning of the cluster, with frameworks such as Hadoop, Pregel, and MPI each confined to their own nodes.
- Mesos: dynamic sharing of one cluster among all frameworks.
- The frameworks' demands peak at different points in time, so a shared cluster gives good statistical multiplexing: jobs finish faster because they can scale up, and you can run more workload on the nodes you already have.
5
Other Benefits of Mesos
- Run multiple instances of the same framework: isolate production and experimental jobs, or run multiple versions of the framework concurrently (e.g. to deploy upgrades).
- Build specialized frameworks to experiment with other programming models.
- Simplify cluster management.
6
Solution
- Mesos is a common resource-sharing layer over which diverse frameworks can run.
- [Diagram: frameworks such as Hadoop and Pregel running on top of Mesos, which spans all the nodes of the cluster.]
7
Mesos Goals
- High utilization of resources
- Support for diverse frameworks
- Scalability to 10,000's of nodes
- Reliability in the face of failures
Resulting design: a small, simple, microkernel-like core that pushes the majority of scheduling logic to the frameworks.
8
Design Elements
- Fine-grained sharing: allocation at the level of tasks within a job; improves utilization, latency, and data locality.
- Resource offers: a simple, scalable, application-controlled scheduling mechanism.
9
Element 1: Fine-Grained Sharing
- Coarse-grained sharing (HPC): each framework holds a static block of whole nodes.
- Fine-grained sharing (Mesos): frameworks share nodes in time and space, with tasks from different frameworks interleaved on the same nodes above a common storage system (e.g. HDFS).
- [Diagram: the same node grid under coarse-grained vs. fine-grained sharing.]
+ Improved utilization, responsiveness, data locality
10
Element 2: Resource Offers
Option: Global scheduler
- Frameworks express their needs in a specification language; a global scheduler matches them to resources.
+ Can (potentially) make optimal decisions if it has all the information.
– Complex: the language must support all framework needs.
– Difficult to make robust.
– Future frameworks may have unanticipated needs.
11
Element 2: Resource Offers
Mesos: Resource offers
- Offer available resources to frameworks; let them pick which resources to use and which tasks to launch.
+ Keeps Mesos simple and makes it possible to support future frameworks.
– Decentralized decisions might not be optimal.
- A simple everyday analogy for a resource offer: airplane seat reservation — you are shown the available seats and choose among them.
12
Mesos Architecture
[Diagram: an MPI job with its MPI scheduler and a Hadoop job with its Hadoop scheduler sit above the Mesos master, whose allocation module picks which framework to offer resources to; resource offers flow from the master to the schedulers, and Mesos slaves below run the frameworks' executors and tasks.]
13
Mesos Architecture
[Same diagram.] A resource offer is a list of (node, availableResources) pairs, e.g. { (node1, <2 CPUs, 4 GB>), (node2, <3 CPUs, 2 GB>) } (see the sketch below).
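To make the offer format concrete, here is a minimal Python sketch of the data structure; the names Offer and Resources are illustrative, not the actual Mesos API.

```python
from dataclasses import dataclass

@dataclass
class Resources:
    cpus: float    # CPU cores currently unused on the node
    mem_gb: float  # memory currently unused, in GB

@dataclass
class Offer:
    node: str
    available: Resources

# The example from the slide: two nodes and their available resources.
offer = [
    Offer("node1", Resources(cpus=2, mem_gb=4)),
    Offer("node2", Resources(cpus=3, mem_gb=2)),
]
```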
14
Mesos Architecture
[Same diagram.] The framework schedulers do framework-specific (intra-framework) task scheduling over the offers they receive, while the master's allocation module does inter-framework scheduling by picking which framework to offer resources to; the Mesos slaves launch and isolate the executors. (How does a framework know when to accept and when to reject offers? Wait a little!) A sketch of the framework side of this split follows.
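The sketch below (illustrative Python, not the real Mesos scheduler API) shows that framework side: given a list of offers like the one above, the framework's scheduler greedily packs its pending tasks onto the offered resources and returns the launches; anything it does not use is implicitly declined and can be offered to another framework.

```python
# Illustrative sketch of a framework scheduler's offer callback.
# Offers are (node, cpus, mem_gb) tuples; tasks are (task_id, cpus, mem_gb).

class FrameworkScheduler:
    def __init__(self, pending_tasks):
        self.pending_tasks = list(pending_tasks)

    def resource_offers(self, offers):
        """Intra-framework scheduling: decide which offered resources to use
        and which tasks to launch on them."""
        launches = []
        for node, cpus, mem in offers:
            remaining = []
            for task_id, t_cpus, t_mem in self.pending_tasks:
                if t_cpus <= cpus and t_mem <= mem:
                    launches.append((node, task_id))   # run this task here
                    cpus -= t_cpus
                    mem -= t_mem
                else:
                    remaining.append((task_id, t_cpus, t_mem))
            self.pending_tasks = remaining
        return launches  # sent back to the master, which asks the slaves to run them

# Example: two pending tasks scheduled against the offer from the previous slide.
sched = FrameworkScheduler([("map-1", 1, 2), ("map-2", 2, 2)])
print(sched.resource_offers([("node1", 2, 4), ("node2", 3, 2)]))
```

The point of the split is that the master needs no framework-specific logic; it only tracks which offered resources each framework accepted.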
15
Optimization: Filters
- Let frameworks short-circuit rejection by providing a predicate on the resources to be offered, e.g. "nodes from list L" or "nodes with > 8 GB RAM" (sketched below).
- Could generalize to other hints as well.
- The ability to reject offers still ensures correctness when needs cannot be expressed using filters.
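A filter is essentially a predicate the framework hands to the master so that obviously useless offers are never sent in the first place. A minimal sketch, reusing the tuple representation from the scheduler sketch above (hypothetical names, not the Mesos API):

```python
# Hypothetical filters matching the two examples from the slide.
preferred_nodes = {"node1", "node3"}

def nodes_from_list(node, cpus, mem_gb):
    return node in preferred_nodes        # "nodes from list L"

def big_memory_nodes(node, cpus, mem_gb):
    return mem_gb > 8                     # "nodes with > 8 GB RAM"

def apply_filters(offers, filters):
    """The master would only forward offers that pass every registered filter;
    the framework can still reject anything the filters let through."""
    return [o for o in offers if all(f(*o) for f in filters)]

print(apply_filters([("node1", 2, 4), ("node2", 3, 16)],
                    [big_memory_nodes]))  # -> [('node2', 3, 16)]
```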
16
Framework Isolation
- Mesos uses OS isolation mechanisms, such as Linux containers and Solaris projects.
- Containers currently support CPU, memory, IO, and network bandwidth isolation.
- Not perfect, but much better than no isolation. (A rough sketch of the underlying Linux mechanism follows.)
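On Linux, container-style isolation of an executor ultimately rests on mechanisms such as cgroups. The sketch below illustrates the idea using cgroup-v1 control files; it is simplified, requires root, and is not Mesos's actual containerizer code.

```python
import os

def isolate_executor(name, pid, cpu_shares, mem_limit_bytes):
    """Put one executor process into its own CPU and memory cgroup (cgroup v1)."""
    cpu_dir = f"/sys/fs/cgroup/cpu/{name}"
    mem_dir = f"/sys/fs/cgroup/memory/{name}"
    os.makedirs(cpu_dir, exist_ok=True)
    os.makedirs(mem_dir, exist_ok=True)

    # Relative CPU weight and a hard memory cap for this executor.
    with open(f"{cpu_dir}/cpu.shares", "w") as f:
        f.write(str(cpu_shares))
    with open(f"{mem_dir}/memory.limit_in_bytes", "w") as f:
        f.write(str(mem_limit_bytes))

    # Move the executor's process into both cgroups.
    for d in (cpu_dir, mem_dir):
        with open(f"{d}/tasks", "w") as f:
            f.write(str(pid))
```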
17
Results
- Utilization and performance vs. static partitioning
- Framework placement goals: data locality
- Scalability
- Fault recovery
18
Dynamic Resource Sharing
[Graph: share of the cluster held by each framework over time under Mesos.] Also note how quickly the frameworks ramp up.
19
Mesos vs. Static Partitioning
Compared performance with a statically partitioned cluster where each framework gets 25% of the nodes:

Framework             Speedup on Mesos
Facebook Hadoop Mix   1.14×
Large Hadoop Mix      2.10×
Spark                 1.26×
Torque / MPI          0.96×
20
Data Locality with Resource Offers
- Ran 16 instances of Hadoop on a shared HDFS cluster.
- Used delay scheduling [EuroSys '10] in Hadoop to get locality (wait a short time to acquire data-local nodes), yielding a 1.7× improvement (sketched below).
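Delay scheduling itself is simple enough to sketch: if the next offer is not data-local for a pending task, decline it and hold out for a local node, but only for a bounded time. This is illustrative Python, and the wait threshold is a made-up parameter.

```python
import time

class DelayScheduler:
    """Sketch of delay scheduling on top of resource offers."""

    def __init__(self, max_wait_seconds=5.0):
        self.max_wait_seconds = max_wait_seconds  # how long to hold out for locality
        self.first_skip_time = None               # when we first declined an offer

    def should_accept(self, offered_node, nodes_with_input_data):
        if offered_node in nodes_with_input_data:  # data-local: always accept
            self.first_skip_time = None
            return True
        if self.first_skip_time is None:           # start the clock on the first skip
            self.first_skip_time = time.monotonic()
        waited = time.monotonic() - self.first_skip_time
        return waited >= self.max_wait_seconds     # give up on locality after waiting
```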
21
Scalability
- Mesos only performs inter-framework scheduling (e.g. fair sharing), which is easier than intra-framework scheduling (see the sketch below).
- Result: scaled to 50,000 emulated slaves, 200 frameworks, and 100,000 tasks (30-second task length).
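The master can stay this small because inter-framework scheduling can be as simple as "offer newly free resources to whichever framework is furthest below its fair share". A minimal sketch of such an allocation module (illustrative, not the actual Mesos code):

```python
class FairAllocator:
    """Offer free resources to the framework with the smallest current share."""

    def __init__(self, frameworks, cluster_cpus):
        self.usage = {fw: 0.0 for fw in frameworks}  # CPUs in use per framework
        self.cluster_cpus = cluster_cpus

    def pick_framework(self):
        return min(self.usage, key=lambda fw: self.usage[fw] / self.cluster_cpus)

    def offer(self, free_resources):
        # free_resources: whatever just became available on some slaves.
        fw = self.pick_framework()
        return fw, free_resources  # the chosen framework replies with task launches

    def tasks_launched(self, fw, cpus):
        self.usage[fw] += cpus

    def task_finished(self, fw, cpus):
        self.usage[fw] -= cpus

# Example: after Hadoop launches some work, MPI is next in line for offers.
alloc = FairAllocator(["hadoop", "mpi"], cluster_cpus=100)
alloc.tasks_launched("hadoop", 10)
print(alloc.pick_framework())  # -> 'mpi'
```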
22
Mesos Now
23
Thoughts
- Resource offers vs. resource requests: is the offer mechanism an unnecessary complication? Was it driven by the desire to change existing frameworks minimally?
- Fragmentation and fairness: the DRF paper shows that task sizes can be large, hence large wastage; the fairness reasoning gets slightly messy when a framework rejects resources.
- How different are boolean "filters" from resource requests?
- Is this two-level scheduling used by current data centers? Mesosphere does not seem to need it for most of its use cases.