CARDIO: Cost-Aware Replication for Data-Intensive workflOws Presented by Chen He
Motivation Are large-scale clusters reliable?
– 5 worker deaths per MapReduce job on average
– At least 1 disk failure in every run of a 6-hour MapReduce job on a 4000-node cluster
Motivation How can we keep node failures from hurting performance?
– Replication
  Capacity constraint
  Replication time, etc.
– Regeneration through re-execution
  Delays program progress
  Cascaded re-execution
Motivation AVAILABILITY vs. COST (all pictures adapted from the Internet)
Outline
– Problem Exploration
– CARDIO Model
– Hadoop CARDIO System
– Evaluation
– Discussion
Problem Exploration Performance Costs
– Replication cost (R)
– Regeneration cost (G)
– Reliability cost (Z)
– Execution cost (A)
– Total cost (T)
– Disk cost (Y)
where T = A + Z and Z = R + G
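The two cost identities on this slide can be sketched directly; a minimal illustration, with function and parameter names mirroring the slide's symbols (R, G, Z, A, T) and chosen here for clarity only:

```python
# Illustrative sketch of the slide's cost relationships.
def reliability_cost(R, G):
    # Reliability cost Z is replication cost plus regeneration cost: Z = R + G
    return R + G

def total_cost(A, R, G):
    # Total cost T is execution cost plus reliability cost: T = A + Z
    return A + reliability_cost(R, G)
```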
Problem Exploration Experiment Environment
– Hadoop 0.20.2
– 25 VMs
– Workloads: Tagger -> Join -> Grep -> RecordCounter
Problem Exploration Summary: Replication Factor for MR Stages
Problem Exploration Summary: Detailed Execution Time of 3 Cases
CARDIO Model Block Failure Model
– Output of stage i
– Replication factor
– Total block number
– Single block failure probability
– Failure probability in stage i
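The formulas on this slide did not survive transcription. A plausible reconstruction, assuming stage i produces N_i blocks replicated x_i times, each replica failing independently with probability p, is:

```latex
% Hypothetical reconstruction -- the original symbols were lost in transcription.
% A block is lost only if all x_i of its replicas fail, and the stage fails
% if any of its N_i blocks is lost:
\[
  P_i \;=\; 1 - \left(1 - p^{x_i}\right)^{N_i}
\]
```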
CARDIO Model Cost Computation Model
– Total time of stage i
– Replication cost of stage i
– Expected regeneration time of stage i
– Reliability cost for all stages
– Storage constraint C over all stages
– Choose the replication factors to minimize Z
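The minimization this slide describes can be sketched end to end. The exact cost formulas were lost in transcription, so the forms below (replication cost proportional to extra copies, expected regeneration as failure probability times stage time, brute-force search over replication vectors) are assumptions for illustration, not the paper's definitions:

```python
from itertools import product

def reliability_cost(x, D, t, p=0.2, delta=0.2):
    # Z = sum over stages of replication cost + expected regeneration cost.
    # Assumed forms: delta is the per-unit replication time, and each of the
    # D_i data units of stage i acts as one block (simplification).
    Z = 0.0
    for xi, Di, ti in zip(x, D, t):
        repl = delta * Di * (xi - 1)       # time to write the extra replicas
        P_fail = 1 - (1 - p ** xi) ** Di   # stage-failure probability
        regen = P_fail * ti                # expected re-execution time
        Z += repl + regen
    return Z

def best_replication(D, t, C, x_max=3):
    # Brute-force the replication vector minimizing Z under storage limit C.
    best = None
    for x in product(range(1, x_max + 1), repeat=len(D)):
        if sum(xi * Di for xi, Di in zip(x, D)) > C:  # storage constraint
            continue
        z = reliability_cost(x, D, t)
        if best is None or z < best[1]:
            best = (x, z)
    return best
```

For example, with two stages of output sizes [2, 3], stage times [10, 20], and storage limit C = 12, the search favors replicating the first (cheap, failure-prone) stage more heavily than a uniform factor would.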
CARDIO Model Dynamic Replication
– The replication factor x may vary as the job progresses
– When the job reaches step k, the replication factor at that step is:
CARDIO Model Model for Reliability
– Objective: minimize the reliability cost Z
– Based on the block failure model
– Subject to the storage constraint C
CARDIO Model Resource Utilization Model
– Model: cost = resources utilized
– Resource types Q: CPU, network, disk, storage, etc.
– Utilization of resource q in stage i
– Normalize the usage of each resource
– Relative cost weights
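The normalization and weighting formulas on this slide were also lost. A plausible reconstruction, with u_{i,q} as the raw utilization of resource q in stage i and w_q as its relative cost weight (symbols assumed here):

```latex
% Hypothetical reconstruction of the normalized-utilization cost.
\[
  \hat{u}_{i,q} \;=\; \frac{u_{i,q}}{\max_j u_{j,q}}, \qquad
  A \;=\; \sum_i \sum_{q \in Q} w_q \, \hat{u}_{i,q}
\]
```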
CARDIO Model Resource Utilization Model
– The execution cost A
– Total cost T
– Optimization target: choose the replication factors to minimize T
CARDIO Model Optimization Problem
– Job optimality (JO)
– Stage optimality (SO)
Hadoop CARDIO System CardioSense
– Obtains job progress from the JobTracker (JT) periodically
– Is triggered by a pre-configured threshold value
– Collects resource-usage statistics for running stages
– Relies on HMon on each worker node
  HMon, based on Atop, has low overhead
Hadoop CARDIO System CardioSolve
– Receives data from CardioSense
– Solves the SO problem
– Decides the replication factors for the current and previous stages
Hadoop CARDIO System CardioAct
– Implements the commands from CardioSolve
– Uses the HDFS API setReplication(file, replicaNumber)
Hadoop CARDIO System
Evaluation Several Important Parameters
– p is the failure rate, 0.2 if not specified
– The time to replicate a data unit is 0.2 as well
– The computation resource of stage i follows a uniform distribution U(1, Cmax), with Cmax = 100 in general
– The output of stage i is drawn from a uniform distribution U(1, Dmax), where Dmax varies within [1, Cmax]
– C is the storage constraint for the whole process; its default value is
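The parameter setup above can be sketched as a small sampler; function and variable names (sample_workflow, c_max, d_max) are assumptions, since the original symbols were lost in transcription:

```python
import random

def sample_workflow(num_stages, c_max=100, d_max=100, seed=0):
    # Per-stage computation resource ~ U(1, Cmax) and output size ~ U(1, Dmax),
    # matching the distributions described on the slide.
    rng = random.Random(seed)
    comp = [rng.uniform(1, c_max) for _ in range(num_stages)]
    out = [rng.uniform(1, d_max) for _ in range(num_stages)]
    return comp, out
```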
Evaluation Effect of Dmax
Evaluation Effect of Failure rate p
Evaluation Effect of block size
Evaluation Effect of different resource constraints
– "++" means over-utilized; that type of resource is regarded as expensive
– p = 0.08, C = 204 GB, delta = 0.6
– S3 is CPU-intensive
– DSK shows a performance pattern similar to NET
– CPU 0010, NET 0011, DSKIO 0011, STG 0011
Evaluation
– S2 re-executes more frequently under failure injection because it has a large data output
– p = 0.02, 0.08, and 0.1
– 1, 3, 21
– API reason
Discussion Problems
– Typos and misleading symbols
– HDFS API setReplication()
Any other ideas?