Yousi Zheng Dept. of ECE, The Ohio State University

Name: Yousi Zheng Dept. of ECE, The Ohio State University
Uploaded: 2018-01-13T13:14:47+00:00
Duration: PTM28S46
Channel: Annis Harvey
Description: Yousi Zheng Dept. of ECE, The Ohio State University

A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers
Yousi Zheng Dept. of ECE, The Ohio State University Joint work with Ness B. Shroff Prasun Sinha

MapReduce Data Centers MapReduce
Facility containing a very large number of machines They process very large datasets: MapReduce MapReduce Framework for processing highly parallel problems Developed (popularized) by Google Nearly ubiquitous (Google, IBM, Facebook, Microsoft, Yahoo, Amazon, eBay, twitter…) Used in a variety of different applications Distributed Grep, distributed sort, AI, scientific computation, Large scale pdf generation, Geographical data, Image processing…

MapReduce Map phase: Reduce phase
Takes an input job and divides into many small sub- problems Operations can run in parallel on potentially different machines Reduce phase Combines the output of Map Occurs after the Map phase is completed Runs on parallel machines… Goal: Schedule these Map and Reduce tasks in order to minimize the total flow time in the system

Model Time is slotted N “Machines”: Each machine runs 1 unit of workload per slot Scheduling Constraint: Reduce job starts after all tasks in the Map job are completed. Ri (k) Each Reduce task can have multiple units of workload Ri (k) units of workload for task k for job i Each job i consists of Map tasks and Reduce tasks units of workload for Reduce job i Each Map task has 1 unit of workload Total Mi tasks for Map job i … Data Center with N machines n jobs over T slots Reduce job i 1 Map Mi

Types of Schedulers: Treatment of Reduce Tasks
Machines Preemptive Reduce tasks can be interrupted at the end of a slot Remaining workload in the task can be executed on any machine(s) Reasonable if overhead of data-migration is small Non-preemptive R2 Machine C Machine B R1 R1 Machine A Time

Machines Preemptive Reduce tasks can be interrupted at the end of a slot Remaining workload in the task can be executed on any machine(s) Reasonable if overhead of data-migration is small Non-preemptive Once a Reduce task is started, it can’t be interrupted till the end of the task An individual Reduce task cannot be completed on different machines But different tasks from same job can be assigned to different machines Note: Since Map tasks have unit workload, they cannot be interrupted. In practice Map tasks may be > 1 unit, but small R2 R2 Machine C Machine B R1 R1 Machine A Time

Flow-Time Minimization Problem: Preemptive Scenario
Minimize flow-time Total # machines is N Total map workload is Mi for job i Total reduce workload is Ri for job i Scheduling Constraint

Flow-Time Minimization Problem Non-Preemptive Scenario
Minimize flow-time Total # machines is N Total map workload is Mi for job i Total reduce workload is R(k)i for task k in job i Non-preemptive Both Problems are NP-hard in the strong sense

Competitive Ratio : set of all possible schedulers (including non-causal) : Total Flow time of scheduler (sum of FT of all jobs) Define Competitive ratio: A given online policy S is said to have a competitive ratio of c if for any arrival pattern Theorem For any constant c0 and any online scheduler S, there are sequences of arrivals and workloads in the non-preemptive case, such that the competitive ratio c is greater than c0  No policy gives a finite competitive ratio

Efficiency Ratio Analysis
Efficiency ratio: A given online policy S is said to have an efficiency ratio of  if (over some probability space of arrivals) Theorem If the workload of Reduce tasks of each job is light-tailed distributed, then Any work-conserving scheduler has a constant efficiency ratio, for both preemptive and non-preemptive scenarios.

So Far Preemptive scenario: Yes! (ASRPT)
No scheduler has a finite competitive ratio in non- preemptive scenario An evaluation metric efficiency ratio is introduced All work conserving schedulers have a finite efficiency ratio (g) for both preemptive & non- preemptive scenarios But the performance of these schedulers could still be quite poor (g could be very large) Key Question: Can we design a simple scheduler with a provably small efficiency ratio in the MapReduce paradigm? Preemptive scenario: Yes! (ASRPT)

ASRPT: Available Shortest Remaining Processing Time
SRPT: An infeasible scheduler Priority given to jobs with smallest remaining workload In SRPT, Map and Reduce for a given job can be scheduled in the same time-slot (infeasible in reality) SRPT results in a lower flow time than any feasible (including non-causal) scheduler ASRPT Map tasks are scheduled according to SRPT (or sooner) By running SRPT in a virtual manner Reduce tasks greedily fill up the remaining machines In the order determined by the shortest remaining processing time

ASRPT: Available Shortest Remaining Processing Time
4 4 Theorem: If the job arrivals and workload are i.i.d., then ASRPT has an efficiency ratio of 2 in the preemptive scenario. R1 R1 R1 Number of machines, N Number of machines, N 3 3 R1 2 2 M1 R2 M1 1 1 M2 R1 M2 R2 1 2 3 4 1 2 3 4 Time Time SRPT : Total Flow-time 4 units ASRPT : Total Flow-time 5 units

Simulation N = 100 machines, total time slots T = 800
# Reduce tasks in each job is 10 Job arrival process: Poisson; λ = 2 jobs per time slot. Preemptive Scenario Schedulers ASRPT Fair FIFO ASRPT has smaller efficiency ratio than Fair and FIFO.

Conclusion Investigated the flow time minimization problem in MapReduce schedulers No online policy has a finite competitive ratio in non-preemptive scenarios All work-conserving schedulers have constant efficiency ratios Preemptive and non-preemptive scenarios. A specific algorithm (ASRPT) can guarantee an efficiency ratio of 2 in the preemptive scenario. Is efficiency ratio of ASRPT similarly small in non-pre-emptive case? Efficiency Ratio based analysis provides worse case (w.p.1) guarantees (compared to competitive ratio), but gives more design flexibility. Has the potential to be applied to other problems/domains

Future Work Consider cost of data migration Consider Shuffle Phase
Introduce more information (e.g., dependency Graph) to the scheduler So that all reduce tasks don’t wait until Map tasks are completed Consider cost of data migration Combine preemptive and non-preemptive scenarios Consider multiple phases With phase precedence (for complex processing) Develop Green Energy solutions Combine with sleep-wake scheduling Consider renewable energy fluctuations

Thank You!

Backup Slides

Main Results Introduced a metric Efficiency Ratio to evaluate the performance of schedulers in MapReduce Proved that no online scheduler has a finite competitive ratio Guarantee with probability 1 May have greater modeling implications for analysis of algorithms Proved that Efficiency Ratio is bounded by a constant for any work-conserving scheduler For both bounded and light-tailed workload for Reduce phase For both preemptive and non-preemptive scenarios Developed a specific work-conserving algorithm (ASRPT) that has an efficiency ratio of 2 for preemptive scenario. The contributions of this paper are as follows: • For the problem of minimizing the total delay (flow time) in the MapReduce framework, we show that no online algorithm can achieve a constant competitive ratio. • To directly analyze the total delay in the system, we propose a new metric to measure the performance of schedulers, which we call the efficiency ratio. • Under some weak assumptions, we then show a surprising property that for the flow-time problem any work-conserving scheduler has a constant efficiency ratio in both preemptive and non-preemptive scenarios. • We present an online scheduling algorithm called ASRPT (Available Shortest Remaining Processing Time) with a very small (less than 2) efficiency ratio, and show that it outperforms state-of-the-art schedulers through simulations.

Efficiency Ratio: Preemptive & Non-preemptive
For any work-conserving scheduling policy: Preemptive: Non-preemptive: Similar result holds but with different constants ( ) where p0 = Prob (a slot has no job arrivals) Any work-conserving scheduler has finite efficiency ratio in preemptive scenario. Using hint and SLLN to get the result. We use LRPT and FIFO to show the existence of the bound. LRPT = Longest Remaining Processing Time. Intuitively, LRPT is a bad scheduler.

MapReduce: FIFO scheduler
FCFS scheduler with 4 machines, 2 jobs Map must finish before Reduce can start M2 R2 4 R1 Number of machines 3 Flow-time of a job: time spent by job in the system FT of Job 1: 3 time slots FT of Job 2: 3 time slots Total FT : 6 time slots 2 M1 M2 Let’s look FCFS: Although there are idle machines in time slot 3, R2 cannot be processed in time slot 3, because its corresponding Map function M2 is not finished yet at the beginning of this time slot. The quick example is in the preemptive scenario (which is defined later). Also, this one can be viewed as 1 R1 R2 1 2 3 4 Time

MapReduce: Smarter Scheduler
4 R1 Number of machines 3 Flow-time: time in the system FT of Job 1: 3 time slots FT of Job 2: 2 time slots Total FT: time slots R2 2 M1 R1 R2 can be processed in time slot 3, because its corresponding Map function M2 is finished at the beginning of this time slot. 1 M2 1 2 3 4 Time Goal: Minimize total flow-time

Future Work Consider cost of data migration Consider Shuffle Phase
Combine preemptive and non-preemptive scenarios Consider multiple phases With phase precedence (for complex processing) Develop Green Energy solutions Combine with sleep-wake scheduling Consider renewable energy fluctuations Consider Shuffle Phase Introduce more information (e.g., dependency Graph) to the scheduler So that all reduce tasks don’t wait until Map tasks are completed

Analysis Outline Analysis applies to:
Divide up time into repeating intervals Interval corresponds to the consecutive times slots until a time-slot occurs with an idle machine(s) Bound the moments of these intervals with constants Bound the efficiency ratio using the moments Analysis applies to: Preemptive and non-preemptive scenarios Allows jobs to have infinite but light-tailed support

Preemptive Scenario: An Example
Complete time-slot (all machines used) Incomplete time-slot (all machines not used) Kj: Length of j-th interval (ends in incomplete slot) Observations for a job arriving in interval j (work-conserving) Map task finishes in interval j (otherwise last time-slot would be complete) Reduce tasks finish in interval j or interval j+1  Flow time per job is bounded, wp1, if the interval sizes (moments) are bounded  g is bounded wp1. Example, show intervals K_1, K_2 and K_3. K_1 = time slot 1 K_2 = time slot 2 3 K_3 = time slot 4 5 6

Non-preemptive Scenario: An Example
Observation for reduce tasks: Unlike preemptive scenario, in the non- preemptive scenario, each job arriving in interval j will start (not finish) all its reduce tasks in interval j or interval j+1 Reduce tasks must be finished by the end of interval j+1 plus time Ri-1. Hence, if (all moments) of the interval are bounded then the flow times (per job) are bounded wp1  g is bounded wp1. Show the hint: e.g., job $i$, Phase 1 is finished in $K_j$, Phase 2 is started in $K_j+K_{j+1}$, total job is finished in $K_j+K_{j+1}+R_i-1$.

Bounds on the Moments Lemma: If the scheduler is work-conserving, then for any given number , there exists a constant , such that , for all j. where The length of interval has finite moments for both Pre-emptive and Non-Preemptive schedulers We only use the bounds for the first and the second moment. Rough idea: Since the traffic intensity ρ<1, and the workload in each slot is light-tailed, the number of consecutive complete time slots will decay exponentially (Chernoff’s Theorem)  The moments of interval K are bounded

Many unanswered questions
Competitive Ratio Efficiency Ratio Preemptive ? Constant for any work-conserving scheduler 2 for ASRPT Non-preemptive No constant for any scheduler ? for ASRPT

Main schedulers proposed for MapReduce framework
FIFO scheduler: Default in Hadoop Suffers from the well known head-of-line blocking problem Fair scheduler [Zaharia’09] Mitigate head-of-line blocking problem of FIFO Jobs stick to the machines on which they are initially scheduled Coupling scheduler [Tan’12] Mitigate the problem that jobs stick to machines Map Reduce is Inspired by the Map and Reduce phases commonly used in functional programming although their purpose is not the same as their original forms. Coupling scheduler is based on Fair scheduler. With speed-up, when speed-up -> 0, the competitive ratio -> $\infty$. Minimizing total completion time = minimizing total flow time. Competitive ratio of “minimizing total completion time” != competitive ratio of “minimizing total flow time”.

Main schedulers proposed for MapReduce framework
Consider machines with speed-up [Moseley’11] O(1/ ε5) competitive algorithm with (1 + ε) speed for the online case, where 0 < ε ≤ 1 Speed-up of machines is necessary in the algorithm (if ε decreases to 0, the competitive ratio will increase to infinity correspondingly). Consider minimizing total completion time [Chang’11, Chen’12] Minimizing total completion time = minimizing total flow time. Competitive ratio of “minimizing total completion time” ≠ competitive ratio of “minimizing total flow time”. Minimizing total completion time = minimizing total flow time. Competitive ratio of “minimizing total completion time” != competitive ratio of “minimizing total flow time”.

ASRPT ASRPT runs SRPT in a virtual fashion and keeps track of how SRPT would have scheduled jobs in any given slot. ASRPT assigns machines to the tasks in the following priority order (from high to low): (Only in the non-preemptive scenario) The previously scheduled Reduce tasks which are not yet finished, The Map tasks which are scheduled in the SRPT, The available Reduce tasks, The available Map tasks. For each priority group, the algorithm greedily assign machines to the corresponding tasks through the sorted available workload.

Light-tailed Distributed Workload
The expression of B_1’ and B_2’. where

Simulations Preemptive Non-preemptive
Choose uniform distribution and exponential distribution as examples of bounded workload and light-tailed distributed workload, respectively. For short, we use Exp(μ) to represent an exponential distribution with mean μ, and use U[a, b] to represent a uniform distribution on {a, a + 1, ..., b − 1, b}. In the simulations, the efficiency ratio of a scheduler is obtained by the total flow time of the scheduler over the lower bound of the total flow time in T time slots. Thus, the real efficiency ratio should be smaller than the efficiency ratio given in the simulations. However, the proportion of all schedulers would remain the same.

Main Results Competitive Ratio: Deterministic guarantee under adversarial arrivals Unbounded for all non-preemptive scheduler Efficiency Ratio Asymptotic Guarantee with probability 1 May have greater modeling implications for analysis of algorithms Efficiency Ratio is bounded by a constant for any work-conserving scheduler For bounded workload for Reduce phase (Ri < Rmax) For light-tailed workload for Reduce phase An Algorithm (ASRPT) with efficiency ratio of 2 for preemptive scenario. Competitive Ratio (worst case) Efficiency Ratio (wp1) (some people talk about similar definitions, e.g., Probabilistic Competitive Analysis of a Dial-a-Ride Problem on Trees under High Load, 2005, by Benjamin Hiller, High Load and Benjamin Hiller).

Non-preemptive One task can only be allocated to one machine (preserves data locality) Once started cannot be interrupted till the end of the task One job contains several tasks. In non-preemptive scenario, each task cannot interrupted, and cannot run in parallel. In the light blue and dark blue reduce tasks, they are the same task so they have to be run on the same machine and cannot be scheduled in time-slot 2 even though there are empty machines. In the yellow and red case, there are multiple tasks in each job, so each task of a job can be scheduled on a different machine E.g., Job 3 (yellow color) may contain 4 tasks. In time slot 3, Job 3 arrives. In time slot 4, the four tasks are allocated, and 2 of them are finished. In time slot 5, the remaining 2 tasks are finished. The Y-axis is the number of allocated machines, not represent the position of machines. E.g., the Reduce tasks in time slot 5 and 6 (Red color) are allocated in the same machines.

Preemptive Every task can be interrupted at the end of a slot Remaining workload can be executed on any machine Preemptive or Non-preemptive depends on applications. E.g., if the overhead of data migration is too large, it can be viewed as non-preemptive; otherwise, preemptive.

More in Practice Shuffle Phase
1 Shuffle Phase can let some Reduce tasks begin to work, even if not all Map tasks are finished 2 Shuffle Phase can be viewed as a part, who knows the Dependency Graph between Map and Reduce tasks 3 Shuffle Phase can mitigate network traffic burst in some scenarios 4 If the Dependency Graph is a complete graph, then the model remains the same Shuffle Phase

ASRPT: Available Shortest Remaining Processing Time [Example]
4 4 R1 R1 R1 Number of machines, N Number of machines, N 3 3 R2 2 2 M1 M1 R1 Example of SRPT and ASRPT. 1 1 M2 R2 M2 1 2 3 4 1 2 3 4 Time Time SRPT : Total Flow-time 4 units ASRPT : Total Flow-time 5 units

ASRPT: Available Shortest Remaining Processing Time [Example]
4 4 R1 R1 Number of machines, N Number of machines, N 3 3 R2 2 2 M1 M2 M1 R1 Example of FIFO and ASRPT. 1 1 R1 R2 M2 1 2 3 4 1 2 3 4 Time Time FIFO : Total Flow-time 6 units ASRPT : Total Flow-time 5 units

Bounds on the Moments Lemma: If the scheduler is work-conserving, then for any given number , there exists a constant , such that , for all j. where The length of interval has finite moments for both Pre-emptive and Non-Premptive schedulers We only use the bounds for the first and the second moment.

Yousi Zheng Dept. of ECE, The Ohio State University

Similar presentations

Presentation on theme: "Yousi Zheng Dept. of ECE, The Ohio State University "— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Yousi Zheng Dept. of ECE, The Ohio State University

Similar presentations

Presentation on theme: "Yousi Zheng Dept. of ECE, The Ohio State University "— Presentation transcript:

Similar presentations

About project

Feedback