Hawk: Hybrid Datacenter Scheduling
Pamela Delgado, Florin Dinu, Anne-Marie Kermarrec, Willy Zwaenepoel
July 10th, 2015
We present a new kind of scheduler.
Introduction: datacenter scheduling

Let us take a look at the scheduling problem. In a datacenter we have a cluster composed of many nodes (typically tens of thousands), and a set of jobs, each normally divided into tasks so that it can run in parallel. Between the two sits the scheduler (or resource manager). The goal of the scheduler is to assign job tasks to cluster nodes efficiently, and this can be done in different ways.
Introduction: Centralized scheduling

We can have one centralized scheduler in charge of scheduling all jobs across the whole cluster. Since everything goes through this one component, it has perfect visibility of what is running, where, and when. As a consequence it can place tasks in the best possible way. However, there is a catch.
Introduction: Centralized scheduling

If there are too many incoming jobs, the centralized scheduler can get overwhelmed: jobs have to wait in a queue and suffer from head-of-line blocking.
Introduction: Centralized scheduling

Good: placement. Not so good: scheduling latency.
Introduction: Distributed scheduling

What if we schedule in a distributed way instead? We can get better scheduling latency, at best with one scheduler per job. However, distributed schedulers typically have outdated information about the cluster status, or even no information at all.
Introduction: Distributed scheduling

Good: scheduling latency. Not so good: placement.
Outline

1) Introduction
2) Hawk hybrid scheduling: rationale, design
3) Evaluation: simulation, real cluster
4) Conclusion
Hybrid scheduling

Can we get the best of both worlds? The answer is yes. The previous talk also introduced a hybrid scheduling approach; that work was done in parallel, and we were not aware of each other's work.
Hawk: Hybrid scheduling

Long jobs: centralized. Short jobs: distributed. All long jobs are scheduled by one centralized scheduler, while short jobs are scheduled by distributed schedulers. So we talk about long and short jobs; how do we distinguish them?
Hawk: Hybrid scheduling

Long/short classification: a job's estimated execution time is compared against a cutoff. Why classify at all? Because of the heterogeneity of jobs: they are different in nature, like having mice and elephants in the same cluster.
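The classification rule on this slide can be sketched as a one-line predicate. This is illustrative only: the function name and the cutoff value are assumptions, not taken from the Hawk implementation.

```python
# Sketch of the long/short classification: a job whose estimated execution
# time exceeds the cutoff is treated as long (centralized path); otherwise
# it is short (distributed path). The cutoff below is an assumed example
# value, not Hawk's actual setting.

CUTOFF_SECONDS = 90.0  # assumed example cutoff

def classify(estimated_runtime_s: float) -> str:
    """Return 'long' (centralized scheduler) or 'short' (distributed)."""
    return "long" if estimated_runtime_s > CUTOFF_SECONDS else "short"

print(classify(600.0))  # a 10-minute job -> long
print(classify(5.0))    # a 5-second job -> short
```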
Rationale for Hawk

Typical production workloads: long jobs are few, but they take most of the resources; short jobs are many, but they require few resources.
Rationale for Hawk (continued)

Percentage of long jobs vs. percentage of task-seconds taken by long jobs (task-seconds: the number of tasks multiplied by their durations, a measure of resource occupancy).
Source: Design Insights for MapReduce from Diverse Production Workloads, Chen et al., 2012
Rationale for Hawk (continued)

Long jobs are a minority of jobs, but they take up most of the resources (most of the task-seconds).
Hawk: hybrid scheduling

Long jobs, centralized: they take the bulk of the resources, so good placement matters; since they are few jobs, the scheduling latency stays reasonable. Short jobs, distributed: they take few resources, so we can trade not-so-good placement for fast scheduling, which matters because they are latency-sensitive.
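The division of labor above can be sketched as a simple dispatch rule. The data structures and names here are illustrative assumptions, not Hawk's actual interfaces: long jobs all queue at the single centralized scheduler, while each short job is handed to one of many lightweight distributed schedulers.

```python
import random

# Illustrative dispatch sketch for Hawk's hybrid split (names and the
# cutoff are assumptions): long jobs go to the one centralized scheduler;
# each short job is assigned to a randomly chosen distributed scheduler.

def dispatch(jobs, num_distributed, cutoff_s, rng=random):
    """jobs: list of (name, estimated_runtime_s) pairs."""
    centralized, distributed = [], [[] for _ in range(num_distributed)]
    for name, est_runtime_s in jobs:
        if est_runtime_s > cutoff_s:      # long: centralized, good placement
            centralized.append(name)
        else:                             # short: distributed, low latency
            distributed[rng.randrange(num_distributed)].append(name)
    return centralized, distributed

jobs = [("analytics", 3600.0), ("query-a", 2.0), ("query-b", 5.0)]
central, dist = dispatch(jobs, num_distributed=2, cutoff_s=90.0)
print(central)  # -> ['analytics']
```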
Hawk: hybrid scheduling

Best of both worlds. Good: scheduling latency for short jobs. Good: placement for long jobs. Next: how Hawk does distributed scheduling.
Hawk: Distributed scheduling

For distributed scheduling, Hawk uses the probing technique introduced by Sparrow, and adds work-stealing on top.
Hawk: Distributed scheduling

First, let us look at how Sparrow's probing technique works.
Sparrow

The distributed scheduler sends reservations to randomly chosen nodes, following the power-of-two-choices technique introduced by Sparrow (SOSP 2013): for each task, a small number of random nodes are probed, and the task runs on whichever probed node becomes available first.
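As a minimal sketch of the sampling idea behind this slide: the real Sparrow protocol places reservations and uses late binding (the task runs wherever a reservation reaches the front of a queue first), so the code below shows only the core power-of-two-choices step, with worker queue lengths simulated as a plain list.

```python
import random

# Power-of-two-choices sketch: probe d random workers and enqueue the
# task at the least-loaded one. This simplifies Sparrow (no late binding);
# the function name and data layout are assumptions for illustration.

def probe_and_place(queue_lengths, d=2, rng=random):
    """Probe d random workers; enqueue at the shortest queue; return its index."""
    candidates = rng.sample(range(len(queue_lengths)), d)
    best = min(candidates, key=lambda w: queue_lengths[w])
    queue_lengths[best] += 1
    return best

queues = [0, 3, 1, 5]
chosen = probe_and_place(queues, d=2)
print(chosen, queues)
```

Sampling two queues instead of one dramatically reduces the chance of landing on a long queue, which is why Sparrow gets good placement without any global state.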
Hawk: Distributed scheduling

On top of Sparrow's probing, Hawk adds work-stealing.
Sparrow and high load

Under high load, random placement has a low likelihood of finding a free node, so Sparrow by itself is not good for our goals.
Sparrow and high load

High load plus job heterogeneity leads to head-of-line blocking: short tasks get stuck queued behind long ones.
Hawk work-stealing

Work-stealing is triggered when a node becomes free.
Hawk work-stealing

1. Free node: contact a random node and ask for probes.
2. Random node: send over the short-job reservations waiting in its queue.
Hawk work-stealing

Under high load there is a high probability that the contacted random node has a backlog, so the free node is likely to find short-job reservations to steal.
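The two steps above can be sketched as follows. This is an illustration under assumptions, not Hawk's implementation: queues are plain lists, "L"/"S" mark long-task and short-job reservations, and we steal every short reservation from the victim, whereas Hawk's actual policy targets short probes queued behind a long task.

```python
import random

# Work-stealing sketch: a newly free node contacts one random node and
# takes over the short-job reservations ("S") waiting in that node's
# queue, leaving long work ("L") in place. Names and data structures are
# assumptions for illustration.

def steal_short_reservations(node_queues, free_node, rng=random):
    """Move short reservations from a random victim to free_node; return count."""
    victim = rng.randrange(len(node_queues))
    if victim == free_node:
        return 0  # contacted itself; nothing to steal
    stolen = [r for r in node_queues[victim] if r == "S"]
    node_queues[victim] = [r for r in node_queues[victim] if r != "S"]
    node_queues[free_node].extend(stolen)
    return len(stolen)

qs = [["L", "S", "S"], []]
count = steal_short_reservations(qs, free_node=1)
print(count, qs)
```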
Hawk cluster partitioning

There is no coordination between the centralized and the distributed schedulers. Challenge: without coordination, there may be no free nodes for the mice (the short jobs).
Hawk cluster partitioning

Solution: reserve a small partition of the cluster for short jobs only.
Hawk cluster partitioning

Short jobs can be scheduled anywhere; long jobs only on the non-reserved nodes.
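The placement rule on this slide can be sketched as an eligibility function. The node numbering and the reserved fraction below are illustrative assumptions, not values from the paper.

```python
# Partitioning sketch: nodes [0, reserved) form a small short-only
# partition. Short jobs may run on any node; long jobs may only be
# placed on the remaining, non-reserved nodes. The 10% reserved
# fraction is an assumed example value.

def eligible_nodes(job_kind, num_nodes, reserved_fraction=0.1):
    """Return the range of node ids on which a job of this kind may run."""
    reserved = int(num_nodes * reserved_fraction)
    if job_kind == "short":
        return range(num_nodes)           # short jobs: anywhere
    return range(reserved, num_nodes)     # long jobs: non-reserved only

print(list(eligible_nodes("long", 10)))        # -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(list(eligible_nodes("short", 10))))  # -> 10
```

This keeps a pool of nodes that long tasks can never occupy, so under load the mice always have somewhere to land without any scheduler-to-scheduler coordination.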
Hawk design summary

Hybrid scheduler: long jobs centralized, short jobs distributed
Work-stealing
Cluster partitioning
Evaluation: 1. Simulation

Sparrow simulator, Google trace. We vary the number of nodes to vary cluster utilization. Measure: job running time. We report the 50th and 90th percentiles for short and long jobs, normalized to Sparrow.
Simulated results: short jobs

Lower is better; 1 is Sparrow. Hawk is better across the board. Note that this is low waiting time for short jobs, to be distinguished from scheduling latency. We do well with respect to Sparrow.
Simulated results: long jobs

Lower is better. Hawk is better except under high load.
Simulated results: long jobs

At very high utilization, long jobs suffer because of partitioning: part of the cluster is reserved for short jobs only.
Decomposing Hawk

We compare Hawk minus centralized, Hawk minus stealing, and Hawk minus partitioning (each normalized to Hawk).
Decomposing Hawk: no centralized

Without the centralized scheduler, long-job running times go up because tasks from different jobs queue behind one another. Short jobs do better: since long-job performance decreases, fewer short tasks encounter queueing.
Decomposing Hawk: no stealing

Without stealing (19.6x normalized to Hawk), short jobs are greatly penalized: their tasks queue behind long tasks. Long jobs are slightly penalized because they share queueing with more short tasks.
Decomposing Hawk: no partitioning

Without partitioning (11.9x normalized to Hawk), short jobs do badly, stuck behind long tasks on any node. Long jobs do slightly better because they can be scheduled on more nodes.
Decomposing Hawk summary

The absence of any single component reduces Hawk's performance (19.6x without stealing, 11.9x without partitioning, normalized to Hawk).
Sensitivity analysis

Incorrect runtime estimates
Long/short cutoff
Details of stealing
Size of the small partition
Sensitivity analysis

Bottom line: Hawk is relatively stable to variations in all of these. See the paper for details.
Evaluation: 2. Implementation

Components: the Hawk scheduler, plus a Hawk daemon on each node.
Experiment

100-node cluster, subset of the Google trace. We vary the inter-arrival time to vary cluster utilization. Measure: job running time. We report the 50th and 90th percentiles for short and long jobs, normalized to Sparrow.
Short jobs

Lower is better; x-axis: inter-arrival time / mean task run time. The 90th percentile is not predicted as well: fewer jobs are measured there (corner cases).
Long jobs

Lower is better; x-axis: inter-arrival time / mean task run time. The 90th percentile is not predicted as well: fewer jobs are measured there (corner cases).
Implementation

1. Hawk works well in a real cluster.
2. Good correspondence between implementation and simulation.
Related work

Centralized: Hadoop (EuroSys'10), Quincy (SOSP'09)
Two-level: YARN (SoCC'13), Mesos (NSDI'11)
Distributed: Omega (EuroSys'13), Sparrow (SOSP'13)
Hybrid: Mercury

There is a lot of work in this area; these are a few examples along the tradeoff.
Conclusion

Hawk: a hybrid scheduler. Long jobs: centralized; short jobs: distributed. Work-stealing. Cluster partitioning. Hawk provides good results for short and long jobs, even under high cluster utilization.