1 Fast Failure Recovery in Distributed Graph Processing Systems
Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor

2 Graph analytics
Emergence of large graphs
–The web, social networks, spatial networks, …
Increasing demand for querying large graphs
–PageRank and reverse web-link analysis over the web graph
–Influence analysis in social networks
–Traffic analysis and route recommendation over spatial graphs

3 Distributed graph processing
MapReduce-like systems
Pregel-like systems
GraphLab-related systems
Others

4 Failures of compute nodes
Increasing graph size → more compute nodes → increase in the number of failed nodes
Failure rate
–Number of failures per unit of time
–e.g., 1/200 (hours)
Exponential failure probability
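As a rough back-of-the-envelope illustration of why failures matter at scale, the sketch below reads the slide's "1/200 (hours)" as one expected failure per 200 hours per node and assumes, as the exponential failure model on the slide suggests, independent exponentially distributed times to failure. The function and parameter names are made up for this sketch.

```python
import math

def p_any_failure(num_nodes: int, job_hours: float, rate_per_hour: float = 1 / 200) -> float:
    """Probability that at least one of num_nodes fails during a job,
    assuming independent exponential times to failure."""
    p_one_node_survives = math.exp(-rate_per_hour * job_hours)
    return 1.0 - p_one_node_survives ** num_nodes

# Even a 1-hour job on 100 nodes hits a failure with roughly 39% probability.
print(f"{p_any_failure(num_nodes=100, job_hours=1):.2%}")
```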

5 Outline
Motivation & background
Failure recovery problem
–Challenging issues
–Existing solutions
Solution
–Reassignment generation
–In-parallel recomputation
–Workload rebalance
Experimental results
Conclusions

6 Pregel-like distributed graph processing systems
Graph model
–G=(V,E)
–P: partitions
Computation model
–A set of supersteps
–Invoke a compute function for each active vertex
–Each vertex can: receive and process messages; send messages to other vertices; modify its value, its state (active/inactive), and its outgoing edges
[Figure: example graph G with vertices A-J, shown both as individual vertices and as subgraphs partitioned across compute nodes]
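To make the computation model concrete, here is a minimal single-process sketch of the vertex-centric superstep loop described on this slide. It illustrates the model only and is not the Pregel or Giraph API; run_supersteps, compute, and the max-propagation example are hypothetical names.

```python
from collections import defaultdict

def run_supersteps(graph, values, compute, max_supersteps):
    """graph: vertex -> list of out-neighbours; values: vertex -> initial value."""
    active = set(graph)                      # every vertex starts active
    inbox = defaultdict(list)                # vertex -> messages for this superstep
    for superstep in range(max_supersteps):
        outbox = defaultdict(list)
        for v in graph:
            if v in active or inbox[v]:
                # compute() may update the value, emit messages, and vote to halt
                values[v], sent, halt = compute(superstep, v, values[v],
                                                inbox[v], graph[v])
                for target, msg in sent:
                    outbox[target].append(msg)
                if halt:
                    active.discard(v)
                else:
                    active.add(v)
        inbox = outbox
        if not active and not any(outbox.values()):
            break                            # no active vertex and no pending message
    return values

# Example compute(): propagate the maximum vertex value through the graph.
def max_compute(superstep, v, value, messages, out_neighbours):
    new_value = max([value] + messages)
    changed = superstep == 0 or new_value != value
    sent = [(u, new_value) for u in out_neighbours] if changed else []
    return new_value, sent, not changed      # halt once the value stabilises

g = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(run_supersteps(g, {"A": 1, "B": 3, "C": 2}, max_compute, max_supersteps=10))
# {'A': 3, 'B': 3, 'C': 3}
```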

7 Failure recovery problem
Running example
–All vertices compute and send messages to all their neighbors in every superstep
–N1 fails while the job is executing superstep 12
–Two states: record the last superstep each vertex has completed when the failure occurs (Sf) and when the failure is recovered (Sf*)
Problem statement
–For a failure F(Nf, sf), recover vertex states from Sf to Sf*
[Figure: example graph; Sf: A-F at superstep 10, G-J at superstep 12; Sf*: A-J at superstep 12]
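As a small illustration of the two states, the running example's Sf and Sf* can be written down directly from the slide (A-F last completed superstep 10, G-J and the recovery target at superstep 12). The dictionaries and names below are just one possible encoding, not anything from the paper.

```python
# vertex -> last completed superstep
S_f      = {**{v: 10 for v in "ABCDEF"}, **{v: 12 for v in "GHIJ"}}  # at failure time
S_f_star = {v: 12 for v in "ABCDEFGHIJ"}                             # recovery target

# Vertices with lost computation that must be redone:
print(sorted(v for v in S_f if S_f[v] < S_f_star[v]))   # ['A', 'B', 'C', 'D', 'E', 'F']
```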

8 Challenging issues
Cascading failures
–New failures may occur during the recovery phase
–How to handle all the cascading failures, if any?
–Existing solution: treat each cascading failure as an individual failure and restart from the latest checkpoint
Recovery latency
–Re-execute lost computations to reach state Sf*
–Forward messages during recomputation
–Recover cascading failures
–How to perform recovery with minimized latency?

9 Existing recovery mechanisms: checkpoint-based recovery
–During normal execution: all compute nodes flush their own graph-related information to reliable storage at the beginning of every checkpointing superstep (e.g., C+1, 2C+1, …, nC+1)
–During recovery: let c+1 be the latest checkpointing superstep; use healthy nodes to replace the failed ones; all compute nodes roll back to the latest checkpoint and re-execute the lost computations since then (i.e., from superstep c+1 to sf)
+ Simple to implement; can handle cascading failures
– Replays lost computations over the whole graph; ignores partially recovered workload
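A minimal sketch of this rollback behavior, assuming checkpoints are taken every `interval` supersteps and a single failure is injected; run_with_checkpoints and step are illustrative names, not part of any real system's API.

```python
import copy

def run_with_checkpoints(values, step, total_supersteps, interval, fail_at=None):
    """step(values, s) advances every vertex by one superstep."""
    checkpoints = {0: copy.deepcopy(values)}       # superstep -> snapshot
    s = 0
    while s < total_supersteps:
        if fail_at is not None and s == fail_at:
            fail_at = None                          # recover this failure once
            latest = max(c for c in checkpoints if c <= s)
            values = copy.deepcopy(checkpoints[latest])
            s = latest                              # every node rolls back
            continue
        values = step(values, s)                    # one superstep of compute + messaging
        s += 1
        if s % interval == 0:
            checkpoints[s] = copy.deepcopy(values)  # flush to "reliable storage"
    return values

# Example: count supersteps applied per vertex; a failure at superstep 5 with
# interval 3 forces supersteps 3 and 4 to be replayed for *all* vertices.
final = run_with_checkpoints({v: 0 for v in "ABCDEFGHIJ"},
                             step=lambda vals, s: {v: n + 1 for v, n in vals.items()},
                             total_supersteps=8, interval=3, fail_at=5)
print(final)   # every vertex ends at 8, but supersteps 3-4 ran twice
```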

10 Existing recovery mechanisms: checkpoint + log
–During normal execution: besides checkpointing, every compute node logs its outgoing messages at the end of each superstep
–During recovery: use healthy nodes (replacements) to replace the failed ones
Replacements:
–redo the lost computation and forward messages among each other
–forward messages to all the nodes in superstep sf
Healthy nodes:
–hold their original partitions and redo the lost computation by forwarding locally logged messages to the failed vertices
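A minimal sketch of this log-based recovery path, assuming the checkpointed values, per-superstep message logs, and a Pregel-style compute function are available: only the failed vertices are recomputed, while messages from healthy vertices are replayed from their local logs. All names are illustrative, not the API of any real system.

```python
from collections import defaultdict

def recover_with_logs(failed_vertices, checkpoint_values, logs, compute, graph,
                      checkpoint_step, fail_step):
    """logs[s][target] = messages that healthy nodes logged for superstep s.
    Recomputes only the failed vertices, starting from the checkpoint."""
    values = {v: checkpoint_values[v] for v in failed_vertices}
    inbox = defaultdict(list)
    for s in range(checkpoint_step, fail_step + 1):
        outbox = defaultdict(list)
        for v in failed_vertices:
            # Messages from healthy nodes are replayed from their local logs;
            # messages from other failed vertices come from this recomputation.
            msgs = logs.get(s, {}).get(v, []) + inbox[v]
            values[v], sent, _halt = compute(s, v, values[v], msgs, graph[v])
            for target, m in sent:
                if target in failed_vertices or s == fail_step:
                    outbox[target].append(m)   # outside the failed set, messages are
                                               # only forwarded again in superstep s_f
        inbox = outbox
    return values
```

In the example on the next slide this corresponds to A-F recomputing supersteps 11-12 while G-J only replay their logged messages to A-F.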

11 Existing recovery mechanisms: checkpoint + log (example)
–Suppose the latest checkpoint is made at the beginning of superstep 11, and N1 (hosting A-F) fails at superstep 12
–During recovery:
superstep 11: A-F perform computation and send messages to each other; G-J send logged messages to A-F
superstep 12: A-F perform computation and send messages along their outgoing edges; G-J send logged messages to A-F
[Figure: example graph with vertices A-J]
+ Less computation and communication cost
– Overhead of local logging (negligible)
– Limited parallelism: the replacements handle all the lost computation

12 Outline
Motivation & background
Problem statement
–Challenging issues
–Existing solutions
Solution
–Reassignment generation
–In-parallel recomputation
–Workload rebalance
Experimental results
Conclusions

13 Our solution: partition-based failure recovery
–Step 1: generate a reassignment for the failed partitions
–Step 2: recompute the failed partitions
Every node is informed of the reassignment
Every node loads its newly assigned failed partitions from the latest checkpoint and redoes the lost computations
–Step 3: exchange partitions to re-balance the workload after recovery
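The sketch below illustrates only the reassignment idea in Step 1: spreading the failed node's partitions over all healthy nodes so that the recomputation in Step 2 runs in parallel. The round-robin policy is a placeholder of my own; the actual reassignment is chosen by the cost-based method described on the following slides.

```python
from collections import defaultdict

def reassign_failed_partitions(failed_partitions, healthy_nodes):
    """Spread failed partitions round-robin over the healthy nodes."""
    return {p: healthy_nodes[i % len(healthy_nodes)]
            for i, p in enumerate(sorted(failed_partitions))}

def recovery_plan(partition_to_node, failed_node):
    failed = {p for p, n in partition_to_node.items() if n == failed_node}
    healthy = sorted({n for n in partition_to_node.values() if n != failed_node})
    per_node = defaultdict(list)
    for p, n in reassign_failed_partitions(failed, healthy).items():
        per_node[n].append(p)     # each healthy node recomputes only these partitions
    return dict(per_node)

# Example: N1 held P1-P4; after it fails, N2 and N3 each recover two partitions
# in parallel, instead of a single replacement node recovering all four.
print(recovery_plan({"P1": "N1", "P2": "N1", "P3": "N1", "P4": "N1",
                     "P5": "N2", "P6": "N3"}, failed_node="N1"))
# {'N2': ['P1', 'P3'], 'N3': ['P2', 'P4']}
```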

14 Recompute failed partitions

15 Example
N1 fails in superstep 12
–Redo supersteps 11 and 12
[Figure: (1) reassignment of the failed partitions (A-F); (2) recomputation]
Less computation and communication cost

16 Handling cascading failures
N1 fails in superstep 12; N2 fails in superstep 11 during recovery
[Figure: (1) reassignment; (2) recomputation]
No need to recover A and B since they have already been recovered
The same recovery algorithm can be used to recover any failure
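One way to read the "no need to recover A and B" remark is as a filter on per-vertex progress: on any failure, cascading or not, only vertices that have not yet reached the target superstep are recovered again, using the same routine. The progress values below are illustrative, not the exact figure from the slide.

```python
def vertices_to_recover(progress, target_superstep, lost_vertices):
    """progress: vertex -> last completed superstep."""
    return {v for v in lost_vertices if progress[v] < target_superstep}

# A and B have already reached the target superstep during recovery, so only
# the remaining vertices need recomputation after the cascading failure.
progress = {"A": 12, "B": 12, "C": 10, "D": 10, "E": 10, "F": 10}
print(vertices_to_recover(progress, target_superstep=12, lost_vertices=set("ABCDEF")))
# {'C', 'D', 'E', 'F'}
```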

17 Reassignment generation
When a failure occurs, how can we compute a good reassignment for the failed partitions?
–Minimize the recovery time
Calculating the recovery time is complicated because it depends on:
–the reassignment for this failure
–cascading failures
–the reassignment for each cascading failure
We have no knowledge about future cascading failures!

18 Our insight
When a failure occurs (it may itself be a cascading failure), we prefer a reassignment that benefits the remaining recovery process, taking into account all the cascading failures that have occurred so far
We collect the state S after the failure and measure the minimum time Tlow needed to reach Sf*
–Tlow provides a lower bound on the remaining recovery time

19 Estimation of Tlow

20 Reassignment generation problem
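The formal definitions of Tlow and the reassignment generation problem are not reproduced in this transcript (slides 19-20 carried only figures and formulas), so the following is only a plausible sketch of the greedy strategy mentioned in the conclusions: assign the costliest failed partitions first, each to the currently least-loaded healthy node, which approximately minimizes the slowest node's recomputation time. The cost values and function names are assumptions, not the paper's cost model.

```python
import heapq

def greedy_reassignment(partition_cost, healthy_nodes):
    """partition_cost: failed partition -> estimated recomputation time.
    Returns (partition -> node, makespan under this simplified model)."""
    heap = [(0.0, n) for n in healthy_nodes]       # (current load, node)
    heapq.heapify(heap)
    assignment = {}
    for p, cost in sorted(partition_cost.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)           # least-loaded node so far
        assignment[p] = node
        heapq.heappush(heap, (load + cost, node))
    makespan = max(load for load, _ in heap)       # slowest node's finish time
    return assignment, makespan

assignment, t_low = greedy_reassignment(
    {"P1": 4.0, "P2": 3.0, "P3": 2.0, "P4": 2.0}, ["N2", "N3", "N4"])
print(assignment, t_low)
# {'P1': 'N2', 'P2': 'N3', 'P3': 'N4', 'P4': 'N4'} 4.0
```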

21 Outline
Motivation & background
Problem statement
–Challenging issues
–Existing solutions
Solution
–Reassignment generation
–In-parallel recomputation
–Workload rebalance
Experimental results
Conclusions

22 Experimental evaluation
Experiment settings
–In-house cluster with 72 nodes, each with one Intel X3430 2.4GHz processor, 8GB of memory, two 500GB SATA hard disks, Hadoop 0.20.203.0, and Giraph 1.0.0
Comparisons
–PBR (our proposed solution) vs. CBR (checkpoint-based recovery)
Benchmark tasks
–K-means
–Semi-clustering
–PageRank
Datasets
–Forest
–LiveJournal
–Friendster

23 PageRank results
[Figures: logging overhead; single-node failure]

24 PageRank results
[Figures: multiple-node failure; cascading failure]

25 PageRank results (communication cost)
[Figures: multiple-node failure; cascading failure]

26 Conclusions
Developed a novel partition-based recovery method to parallelize the failure recovery workload in distributed graph processing
Addressed challenges in failure recovery
–Handling cascading failures
–Reducing recovery latency
Formulated the reassignment generation problem and proposed a greedy strategy

27 Thank You! Q & A

