Download presentation
Presentation is loading. Please wait.
Published byHeaven Hinckley Modified over 10 years ago
1
GRASS: Trimming Stragglers in Approximation Analytics Ganesh Ananthanarayanan, Michael Hung, Xiaoqi Ren, Ion Stoica, Adam Wierman, Minlan Yu
2
Approximation Jobs Data deluge timely answers from part of dataset are often good enough Deadline-bound Jobs: Maximize accuracy within a deadline – E.g., get me the best ad to show within 1s Error-bound Jobs: Minimize time taken to achieve desired accuracy – E.g., get me the #cars sold to the nearest 1000
3
Prioritize Tasks Spawn jobs on large dataset; subset of tasks need to complete – Accuracy α (Subset of completed tasks) Prioritize tasks – Jobs run only part of their tasks at a time (multi- waved) NP-Hard but many known heuristics…
4
Stragglers Occur despite prevention mechanisms Speculation: Spawn a duplicate, earliest wins Challenge: dynamically prioritize while choosing between speculative/unscheduled tasks
5
Speculative copies consume extra resources T3 Opportunity Cost T2 time Slot 1 Slot 2 Slot 3 T1 5 0 10 9 Is speculation worth the payoff? T1
6
Roadmap Two extreme approaches Switching between the two approaches Evaluation
7
Greedy Scheduling (GS) Schedule the task that greedily improves accuracy, i.e., finish earliest T1 T2 T3 time Slot 1 Slot 2 1 03 T4 T5 T6 6 T7 Task IDT1T2T3T4T5T6T7T8T9 t rem 5-------- t new 2---1111113 T1 Deadline = 6 Accuracy = 7/9
8
Resource Aware Scheduling (RAS) Speculate only if (c)t rem > (c+1)t new time T1 T2 T1 Slot 1 Slot 2 1 0 T6 T3 T4 T5 3 6 T7 T8 Task IDT1T2T3T4T5T6T7T8T9 t rem 5-------- t new 2---1111113 T1 Deadline = 6 Accuracy = 8/9
9
GS vs. RAS T1 T2 T3 time Slot 1 Slot 2 1 03 T4 T5 T6 6 T7 T1 Deadline = 6 Accuracy = 7/9 time T1 T2 T1 Slot 1 Slot 2 T6 T3 T4 T5 T7 T8 T1 Deadline = 6 Accuracy = 8/9 Deadline = 3 Accuracy = 3/9 T1 Accuracy = 2/9 GS RAS
10
How to pick between GS and RAS? When to consider opportunity cost? Analytical model [TBD]
11
Guideline Be “conservative” early in the job, and “aggressive” near its completion For jobs with >2 waves of tasks use RAS, otherwise use GS
12
GRASS = GS + RAS Start with RAS and switch to GS when 2 waves of tasks remain How to estimate two remaining waves? – Wave boundaries are not strict – Task durations are uncertain – Wave-width is not constant
13
Learning switching point in GRASS GRASS starts with RAS and switches to GS close to the deadline/error-bound Learn the switching point from completed jobs – Generate GS-only and RAS-only samples Switch when the GS-only samples are better for the remaining deadline
14
GRASS Scheduling GS Greedy Scheduling RAS Opportunity cost aware Scheduling Switch from RAS to GS close to deadline/error- bound Learn switching point
15
Implementation Details Implemented inside Hadoop 0.20.2 and Spark 0.7.3 – Modified Fair Scheduler Task Estimators – t rem is extrapolated from remaining data; progress reports at 5% intervals – t new is estimated from completed tasks
16
Evaluating GRASS Workload from Facebook and Bing traces – 1000’s of nodes, Hadoop and Dryad jobs – How was it generated? Baselines: LATE or Mantri 200 node EC2 deployment; trace-driven sim.
17
Deadline-bound Jobs
18
Error-bound Jobs
19
Value of Switching
20
Related Work
21
Conclusion
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.