1 1/22 Optimization of Google Cloud Task Processing with Checkpoint-Restart Mechanism
Speaker: Sheng Di
Coauthors: Yves Robert, Frédéric Vivien, Derrick Kondo, Franck Cappello

2 2/22 Outline
- Background of Google Cloud Task Processing
- System Overview
- Research Formulation
- Optimization of Fault-tolerance
  - Optimization of the Number of Checkpoints
  - Adaptive Optimization of Fault Tolerance
  - Local disk vs. Shared disk
- Performance Evaluation
- Conclusion and Future Work

3 3/22 Background
- Google trace (released in November 2011): 670,000 jobs, 2,500,000 tasks, 12,000 nodes
- One-month period (29 days)
- Records various events, resource requests and allocations, job and task lengths, various attributes, etc.
- Two types of jobs in the Google trace: sequential-task jobs and Bag-of-Tasks jobs
- 4000 application types, such as map-reduce
- Failure events occur often for some tasks!
- Most task lengths are short (a few minutes to dozens of minutes), so task execution is sensitive to checkpointing cost.

4 4/22 System Overview
- User Interface: receive tasks
- Task Scheduling: coordinate resource competition among hosts
- Resource Allocation: coordinate resource usage within a particular host

5 5/22 System Overview (Cont'd)
[Figure: Task Processing Procedure]

6 6/22 Research Formulation
- Analysis of the Google trace: task failure intervals, task length, job structure
- Equidistant checkpointing model: the checkpointing interval for a particular task is fixed
- Task execution model (suppose k failures and x checkpointing intervals):
  $T_w(\text{task}) = T_e(\text{task}) + C(x-1) + \sum_k \{\text{roll-back loss}\} + \sum_k \{\text{restart cost}\}$
- Objective: minimize $E(T_w(\text{task}))$
- Random variable: K (number of task failure events)
- Goal: compute the optimal number of checkpoints for a Google task
[Figure: timeline from task entry to task exit; the task's wall-clock time is split into productive time, checkpoint cost, roll-back loss, and restart cost]
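The execution model on this slide can be explored numerically. The following is a minimal sketch, not the authors' code. It assumes that each failure loses, on average, half a checkpointing interval of work, and that each restart has a fixed cost. Both assumptions, all function names, and the sample values are illustrative.

```python
def expected_wallclock(t_e, c, x, expected_failures, restart_cost=0.0):
    """Expected wall-clock time with x equidistant checkpointing intervals.

    Assumption (not stated on the slide): a failure strikes, on average,
    halfway through an interval, so the expected roll-back loss per
    failure is t_e / (2 * x).
    """
    checkpoint_overhead = c * (x - 1)  # x intervals need x - 1 checkpoints
    rollback_loss = expected_failures * t_e / (2 * x)
    restart_total = expected_failures * restart_cost
    return t_e + checkpoint_overhead + rollback_loss + restart_total


def best_interval_count(t_e, c, expected_failures, restart_cost=0.0, max_x=1000):
    """Brute-force search for the integer x that minimizes the expected time."""
    return min(
        range(1, max_x + 1),
        key=lambda x: expected_wallclock(t_e, c, x, expected_failures, restart_cost),
    )


if __name__ == "__main__":
    # Hypothetical task: 18 s of productive work, 2 s per checkpoint,
    # 2 expected failures, free restarts.
    x = best_interval_count(18.0, 2.0, 2.0)
    print(x, expected_wallclock(18.0, 2.0, x, 2.0))
```

Under these assumptions the restart cost shifts the expected time by a constant, so it does not affect which x is best; only the trade-off between checkpoint overhead and roll-back loss does.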

7 7/22 Optimization of the Number of Checkpoints: New formula
Theorem 1, Formula (3): $x^* = \sqrt{\dfrac{T_e \, E(Y)}{2C}}$
- $x^*$: the optimal number of checkpointing intervals
- $T_e$: task execution length (productive length)
- $E(Y)$: the task's expected number of failures (characterized by MNOF)
- $C$: checkpoint cost (time increment per checkpoint)
Example: a task's productive length is 18 seconds, C = 2 seconds, and the expected number of failures during its execution is 2.
- Optimal number of checkpointing intervals = sqrt(18*2/(2*2)) = 3
- Optimal checkpointing interval = 18/3 = 6 seconds
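The worked example can be checked with a few lines of Python. This sketch assumes Formula (3) has the form x* = sqrt(T_e * E(Y) / (2C)), which is what the arithmetic in the example implies. The function names are hypothetical, and rounding x* to the nearest integer is a simplification made here, not something the slide prescribes.

```python
import math


def optimal_interval_count(t_e, c, expected_failures):
    """Real-valued x* = sqrt(T_e * E(Y) / (2 * C)), as implied by the example."""
    if t_e <= 0 or c <= 0 or expected_failures < 0:
        raise ValueError("t_e and c must be positive; expected_failures must be >= 0")
    return math.sqrt(t_e * expected_failures / (2.0 * c))


def checkpoint_plan(t_e, c, expected_failures):
    """Round x* to a usable integer (at least 1); return (intervals, interval_length)."""
    x = max(1, round(optimal_interval_count(t_e, c, expected_failures)))
    return x, t_e / x


if __name__ == "__main__":
    # Values from the slide: T_e = 18 s, C = 2 s, E(Y) = 2.
    print(checkpoint_plan(18.0, 2.0, 2.0))
```

With no expected failures the plan degenerates to a single interval, that is, no checkpoints at all.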

8 8/22 Optimization of the Number of Checkpoints: Discussion
- Formula (3) does not depend on the probability distribution of failures, unlike Young's formula.
- Young's formula (proposed in 1974): optimal checkpoint interval $T_c = \sqrt{2 C T_f}$
  - $C$: checkpointing cost
  - $T_f$: mean time between failures (MTBF)
- Conditions:
  (1) Task failure intervals follow an exponential distribution
  (2) Checkpoint cost C is far smaller than the checkpoint interval $T_c$, because the derivation relies on a Taylor series with a second-order approximation
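For comparison, Young's interval is equally short to compute. This is a small sketch with a hypothetical function name and made-up input values, meant only to show the arithmetic of T_c = sqrt(2 * C * T_f).

```python
import math


def young_interval(c, mtbf):
    """Young's first-order optimum checkpoint interval: T_c = sqrt(2 * C * T_f)."""
    if c <= 0 or mtbf <= 0:
        raise ValueError("c and mtbf must be positive")
    return math.sqrt(2.0 * c * mtbf)


if __name__ == "__main__":
    # Hypothetical values: 2 s checkpoint cost, 9 s mean time between failures.
    print(young_interval(2.0, 9.0))
```

Note that in this toy input the checkpoint cost is not far smaller than the resulting interval, so condition (2) is not really met; the numbers illustrate the calculation, not a realistic setting.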

9 9/22 Optimization of the Number of Checkpoints: Discussion
- The exponential-distribution assumption makes Young's formula unsuitable for Google task processing.
[Figure: distribution of Google task failure intervals, by priority]

10 10/22 Optimization of the Number of Checkpoints: Discussion
Corollary 1: Young's formula is a special case of Formula (3).
Two important conditions:
- Task failure intervals follow an exponential distribution
- Checkpointing cost is small
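One way to see how Young's formula can fall out as a special case is sketched below. It goes beyond what the slide states and rests on two assumptions: Formula (3) has the form implied by the slide-7 example, x* = sqrt(T_e E(Y) / (2C)), and, for exponentially distributed failure intervals with MTBF T_f and a small checkpointing cost, the expected number of failures is approximately E(Y) ≈ T_e / T_f.

```latex
\begin{align*}
x^{*} &= \sqrt{\frac{T_e \, E(Y)}{2C}}
       \approx \sqrt{\frac{T_e^{2}}{2 C T_f}}
       = \frac{T_e}{\sqrt{2 C T_f}}, \\
T_c &= \frac{T_e}{x^{*}} \approx \sqrt{2 C T_f}.
\end{align*}
```

The resulting interval T_c matches Young's expression, which is consistent with the two conditions listed in the corollary.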

11 11/22 Optimization of the Number of Checkpoints: Discussion
Our Formula (3) is easier to apply in practice than Young's formula.
- Young's formula depends on MTBF, which may not be easy to predict precisely because of:
  - Unsynchronized clocks across hosts
  - The inevitable influence of checkpointing cost
  - Significant delay in failure detection
- By contrast, MNOF is easy to record accurately.
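The practical difference between the two statistics can be illustrated with a small sketch. The log layout and function names are hypothetical, and the definitions are assumptions: MNOF is taken here as the average number of failures per task, and MTBF as the average length of the uninterrupted runs that ended in a failure.

```python
from statistics import mean


def mnof(failure_counts):
    """Mean number of failures per task: an average of per-task counters."""
    return mean(failure_counts)


def mtbf(failure_intervals):
    """Mean time between failures, from measured run lengths in seconds."""
    return mean(failure_intervals)


if __name__ == "__main__":
    # Hypothetical log: task id -> lengths of the runs that ended in a failure.
    log = {"task-a": [120.0, 300.0], "task-b": [], "task-c": [45.0]}
    counts = [len(runs) for runs in log.values()]
    intervals = [t for runs in log.values() for t in runs]
    print(mnof(counts), mtbf(intervals))
```

In this sketch MNOF needs only a per-task counter, whereas MTBF needs accurate timestamps for every run, which is where clock skew and detection delay would enter.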

12 12/22 Adaptive Optimization of Checkpoint Positions
- Problem: what if the probability distribution of failure intervals (or the failure rate) changes over time? This can happen, for example, when a task's priority changes.
- Objective: design an adaptive algorithm that dynamically adjusts to changing failure rates.
- Question: do the optimal checkpoint positions change as the remaining workload decreases over time?
- Solution: we only need to monitor MNOF, regardless of the decreasing remaining workload, because of Theorem 2.
[Figure: timeline marking the kth checkpoint, the (k+1)th checkpoint, and the current time, asking what the optimal checkpoint intervals are from then on]

13 13/22 Adaptive Optimization of Fault Tolerance (Cont'd)
Theorem 2:
[Equation: relation between the optimal number of checkpointing intervals computed at the (k+1)th checkpoint position and the optimal number of checkpointing intervals computed at the kth checkpoint position]
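The adaptive idea can be sketched as follows: at each checkpoint, re-read the failure estimate and re-plan only if it has changed. This is an illustration, not the authors' algorithm. It assumes Formula (3) in the form implied by the slide-7 example, applied to the remaining workload, and a hypothetical callback `failures_per_second(done)` that returns the current expected number of failures per second of productive work.

```python
import math


def plan_intervals(remaining, c, expected_failures):
    """Integer number of intervals for the remaining work (at least 1)."""
    return max(1, round(math.sqrt(remaining * expected_failures / (2.0 * c))))


def adaptive_checkpoint_positions(t_e, c, failures_per_second):
    """Return the productive-time positions at which checkpoints are taken.

    failures_per_second(done) is a hypothetical monitor; the plan is only
    recomputed when the value it reports changes.
    """
    positions = []
    done = 0.0
    rate = failures_per_second(done)
    intervals_left = plan_intervals(t_e, c, rate * t_e)
    step = t_e / intervals_left
    while intervals_left > 1:
        done += step
        positions.append(done)  # take a checkpoint here
        intervals_left -= 1
        new_rate = failures_per_second(done)
        if new_rate != rate:
            # Failure behaviour changed: re-plan for the remaining work only.
            rate = new_rate
            remaining = t_e - done
            intervals_left = plan_intervals(remaining, c, rate * remaining)
            step = remaining / intervals_left
    return positions


if __name__ == "__main__":
    # Hypothetical task: 18 s of work, 2 s checkpoints; the rate quadruples after 6 s.
    print(adaptive_checkpoint_positions(
        18.0, 2.0, lambda done: 2 / 18 if done < 6 else 8 / 18))
```

With a constant rate the positions never move, which fits the slide's point that monitoring the failure statistic is enough; when the rate rises, the remaining work is split into shorter intervals.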

14 14/22 Local disk vs. Shared disk checkpointing
- Characterization based on BLCR
[Figure: operation time cost of setting a checkpoint]

15 15/22 Performance Evaluation
Experimental setting
- We built a testbed based on the Google trace, in a cluster with hundreds of VM instances running across 16 nodes (16×8 cores, 16×16 GB memory, Xen 4.0, BLCR).
- We call it GloudSim (Google based cloud simulation system) [under review by HiPC'13].
- We reproduce Google task execution as closely as possible to the Google trace, e.g.:
  - Task arrivals are based on the trace or on some distribution
  - Task memory is reproduced from the Google trace
  - Task failure events are reproduced from the Google trace
  - Each job is chosen from among all sample jobs in the trace

16 16/22 Performance Evaluation (Cont'd)
Experimental results
- Job's Workload-Processing Ratio (WPR)
[Figure: checkpointing effect with precise prediction (of MNOF and MTBF)]

17 17/22 Performance Evaluation (Cont'd)
[Figure: distribution of WPR with different C/R formulas]

18 18/22 Performance Evaluation (Cont'd)
[Figure: MNOF and MTBF with respect to priority in the Google trace]
- MNOF is stable across task lengths, whereas MTBF is not (it varies from 179 to 4199 seconds).

19 19/22 Performance Evaluation (Cont'd)
[Figure: min/avg/max WPR with respect to different priorities]
- Our formula outperforms Young's formula by 3-10%.

20 20/22 Performance Evaluation (Cont'd)
[Figure: wall-clock lengths of 10,000 job executions]
- Conclusion: job wall-clock lengths are often 50-100 seconds longer under Young's formula than under ours.

21 21/22 Performance Evaluation (Cont'd)
[Figure: adaptive algorithm vs. static algorithm]

22 22/22 Conclusion and Future Work
Selected conclusions:
- Our Formula (3) outperforms Young's formula by 3-10 percent for Google task processing.
- Job wall-clock lengths are 50-100 seconds longer under Young's formula than under ours.
- The worst WPR under the dynamic algorithm stays at about 0.8, compared with 0.5 under the static algorithm.
Future work:
- Port our theorems to more cases, such as MPI over cloud platforms.

23 23/22 Thanks for your attention!! Contact me at:
