Jockey: Guaranteed Job Latency in Data Parallel Clusters
Andrew Ferguson, Peter Bodik, Srikanth Kandula, Eric Boutin, and Rodrigo Fonseca
DATA PARALLEL CLUSTERS
Predictability
Deadline
VARIABLE LATENCY
Job latency varied by up to 4.3x across runs.
Why does latency vary?
1. Pipeline complexity
2. Noisy execution environment
16
Cosmos 16 MICROSOFT’S DATA PARALLEL CLUSTERS
17
Cosmos 17 MICROSOFT’S DATA PARALLEL CLUSTERS CosmosStore
18
Cosmos 18 MICROSOFT’S DATA PARALLEL CLUSTERS CosmosStore Dryad
19
Cosmos 19 MICROSOFT’S DATA PARALLEL CLUSTERS CosmosStore Dryad SCOPE
20
Cosmos 20 MICROSOFT’S DATA PARALLEL CLUSTERS CosmosStore Dryad SCOPE
DRYAD'S DAG WORKFLOW
A pipeline of jobs runs on the Cosmos cluster and must meet a deadline.
Each job is composed of stages, and each stage is composed of tasks.
EXPRESSING PERFORMANCE TARGETS
Priorities? Not expressive enough.
Weights? Difficult for users to set.
Utility curves? Capture deadline & penalty.
OUR GOAL
Maximize utility while minimizing resources by dynamically adjusting the allocation.
37
Jockey 37
38
Jockey 38 Large clusters
39
Jockey 39 Large clusters Many users
40
Jockey 40 Large clusters Many users Prior execution
JOCKEY – MODEL
f(job state, allocation) -> remaining run time
JOCKEY – CONTROL LOOP
JOCKEY – MODEL
f(job state, allocation) -> remaining run time, simplified to:
f(progress, allocation) -> remaining run time
JOCKEY – PROGRESS INDICATOR
For each stage, track total running time + total queuing time, and # complete / total tasks.
Sum the per-stage terms (Stage 1 + Stage 2 + Stage 3 + ...) to get a single progress value.
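The per-stage totals above can be combined into one scalar. A minimal sketch, assuming the weighting multiplies each stage's running-plus-queuing time by its fraction of completed tasks (the `Stage` fields and the exact combination are illustrative, not Jockey's exact formula):

```python
from dataclasses import dataclass

@dataclass
class Stage:
    total_running: float  # summed run time of this stage's tasks (seconds)
    total_queuing: float  # summed queuing time of this stage's tasks (seconds)
    complete: int         # number of finished tasks
    total_tasks: int      # number of tasks in the stage

def progress(stages):
    """Sum per-stage (running + queuing) time, weighted by the
    fraction of the stage's tasks that have completed."""
    return sum((s.total_running + s.total_queuing) * s.complete / s.total_tasks
               for s in stages)

# Example: three stages at different points of completion.
stages = [Stage(600, 120, 10, 10),  # Stage 1 finished
          Stage(300, 200, 5, 20),   # Stage 2 a quarter done
          Stage(0, 50, 0, 15)]      # Stage 3 still queued
print(progress(stages))  # 845.0
```

A single scalar like this lets the control loop index a precomputed model without reasoning about the full DAG state.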
JOCKEY – CONTROL LOOP
Job model: predicted remaining run time, by completion and allocation.

              10 nodes   20 nodes   30 nodes
1% complete   60 min     40 min     25 min
2% complete   59 min     39 min     24 min
3% complete   58 min     37 min     22 min
4% complete   56 min     36 min     21 min
5% complete   54 min     34 min     20 min

Deadline 50 min at 1% complete -> 20 nodes suffice (40 min).
Deadline 40 min at 3% complete -> 20 nodes suffice (37 min).
Deadline 30 min at 5% complete -> 30 nodes needed (20 min).
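The table lookup the control loop performs can be sketched as follows. The table values mirror the slide; picking the smallest allocation whose predicted remaining time meets the deadline is the assumed policy:

```python
# Predicted remaining run time (minutes), indexed by completion and node count.
MODEL = {
    0.01: {10: 60, 20: 40, 30: 25},
    0.02: {10: 59, 20: 39, 30: 24},
    0.03: {10: 58, 20: 37, 30: 22},
    0.04: {10: 56, 20: 36, 30: 21},
    0.05: {10: 54, 20: 34, 30: 20},
}

def allocate(completion, minutes_to_deadline):
    """Return the smallest allocation predicted to finish in time,
    or the largest available if none is predicted to make it."""
    row = MODEL[completion]
    for nodes in sorted(row):
        if row[nodes] <= minutes_to_deadline:
            return nodes
    return max(row)

print(allocate(0.01, 50))  # 20 nodes: 40 min <= 50 min
print(allocate(0.03, 40))  # 20 nodes: 37 min <= 40 min
print(allocate(0.05, 30))  # 30 nodes: 20 min <= 30 min
```

Because the model is precomputed, each control-loop iteration is a cheap lookup rather than a fresh simulation.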
JOCKEY – MODEL
f(progress, allocation) -> remaining run time
Analytic model? Machine learning? -> Use a simulator.
JOCKEY
Problem               Solution
Pipeline complexity   Use a simulator
Noisy environment     Dynamic control
Jockey in Action
Real job on a production cluster, CPU load ~80%.
Initial deadline: 140 minutes. New deadline: 70 minutes.
Jockey releases resources due to excess pessimism.
"Oracle" allocation: total allocation-hours over the deadline.
At times the available parallelism is less than the allocation; the allocation stays above the oracle.
Evaluation
Production cluster, 21 jobs. Was the SLO met? What was the cluster impact?
Jobs which met the SLO: missed 1 of 94 deadlines, though Jockey allocated too many resources (1.4x).
The simulator made good predictions: 80% finish before the deadline.
The control loop is stable and successful.
Conclusion
Data parallel jobs are complex, yet users demand deadlines. Jobs run in shared, noisy clusters, making simple models inaccurate.
Jockey combines a simulator with a control loop to meet job deadlines.
Questions?
Andrew Ferguson, adf@cs.brown.edu
Co-authors: Peter Bodík (Microsoft Research), Srikanth Kandula (Microsoft Research), Eric Boutin (Microsoft), Rodrigo Fonseca (Brown)
127
Backup Slides 127
Utility Curves
A utility curve encodes the deadline. For single jobs, the curve's scale doesn't matter; for multiple jobs, use financial penalties.
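A utility curve that captures a deadline and a financial penalty can be sketched as a step function (the reward and penalty values here are illustrative assumptions):

```python
def utility(finish_min, deadline_min, reward=100.0, penalty=-50.0):
    """Full reward if the job finishes by its deadline,
    a financial penalty once the deadline is missed."""
    return reward if finish_min <= deadline_min else penalty

print(utility(45, 50))  # 100.0 (met the deadline)
print(utility(55, 50))  # -50.0 (missed it)
```

With a single job only the shape matters (any positive reward and negative penalty give the same decisions); with multiple jobs, the relative penalty sizes let the scheduler trade one job's deadline against another's.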
Jockey: Resource Allocation Control Loop
Techniques from control theory: 1. Slack 2. Hysteresis 3. Dead zone
Prediction -> Run Time -> Utility
Resource Sharing in Cosmos
- Resources are allocated with a form of fair sharing across business groups and their jobs (like Hadoop FairScheduler or CapacityScheduler).
- Each job is guaranteed a number of tokens as dictated by cluster policy; each running or initializing task uses one token, released on task completion.
- A token is a guaranteed share of CPU and memory.
- To increase efficiency, unused tokens are re-allocated to jobs with available work.
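The token re-allocation described above might be sketched like this (the function shape and the round-robin policy for spare tokens are assumptions, not Cosmos's actual scheduler):

```python
def reallocate(guaranteed, demand, total_tokens):
    """Give each job min(guarantee, demand) tokens, then hand spare
    tokens to jobs that still have runnable work (round-robin here)."""
    alloc = {j: min(guaranteed[j], demand[j]) for j in guaranteed}
    spare = total_tokens - sum(alloc.values())
    needy = [j for j in alloc if demand[j] > alloc[j]]
    while spare > 0 and needy:
        for j in list(needy):
            if spare == 0:
                break
            alloc[j] += 1      # one token = one running/initializing task
            spare -= 1
            if alloc[j] == demand[j]:
                needy.remove(j)
    return alloc

# Job A only has 4 runnable tasks, so its unused guarantee flows to B.
print(reallocate({"A": 10, "B": 10}, {"A": 4, "B": 14}, 20))
```

The guarantee keeps each job's share predictable while the spare-token pass keeps the cluster busy.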
Jockey: Progress Indicator
- Many features of the job could be used to build a progress indicator; earlier work (ParaTimer) concentrated on the fraction of tasks completed.
- Our indicator is very simple, but we found it performs best for Jockey's needs. It uses: total vertex initialization time, total vertex run time, and the fraction of completed vertices.
Comparison with ARIA
- ARIA uses analytic models, designed for 3 stages: Map, Shuffle, Reduce.
- Jockey's control loop is robust due to control-theory improvements.
- ARIA was tested on a small (66-node) cluster without a network bottleneck.
- We believe Jockey is a better match for production DAG frameworks such as Hive, Pig, etc.
Jockey: Latency Prediction C(p, a)
Event-based simulator:
- Same scheduling logic as the actual Job Manager
- Captures important features of job progress
- Does not model input size variation or speculative re-execution of stragglers
- Inputs: job algebra, distributions of task timings, probabilities of failures, allocation
Analytic model:
- Inspired by Amdahl's Law: T = S + P/N
- S is remaining work on the critical path, P is all remaining work, N is the number of machines
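The Amdahl-inspired estimate above, T = S + P/N, is straightforward to compute (a sketch; the work values are in arbitrary time units):

```python
def remaining_time(critical_path, total_work, machines):
    """Amdahl-style estimate: the critical path cannot be parallelized,
    while the remaining work spreads evenly over the machines."""
    return critical_path + total_work / machines

# 10 units of critical-path work, 600 units of total remaining work:
print(remaining_time(10, 600, 30))  # 30.0
print(remaining_time(10, 600, 60))  # 20.0
```

Note the diminishing returns: doubling the machines from 30 to 60 only shaves 10 units, because the critical path puts a floor under the job's latency no matter the allocation.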
Jockey: Resource Allocation Control Loop
- Executes in Dryad's Job Manager
- Inputs: fraction of completed tasks in each stage, time the job has spent running, utility function, precomputed values (for speedup)
- Output: number of tokens to allocate
- Improved with techniques from control theory
Jockey Architecture
Offline: the job profile feeds the simulator, which produces latency predictions.
During job runtime: the running job emits job stats; together with the latency predictions and the utility function, these drive the resource allocation control loop, which adjusts the running job's allocation.