Phoenix: A Substrate for Resilient Distributed Graph Analytics

Phoenix: A Substrate for Resilient Distributed Graph Analytics. Roshan Dathathri, Gurbinder Gill, Loc Hoang, Keshav Pingali

Phoenix: a substrate to recover from fail-stop faults in distributed graph applications. Tolerates an arbitrary number of failed machines, including cascading failures. Classifies graph algorithms and uses a class-specific recovery protocol.

Phoenix: a substrate to recover from fail-stop faults in distributed graph applications. Tolerates an arbitrary number of failed machines, including cascading failures. Classifies graph algorithms and uses a class-specific recovery protocol. No overhead in the absence of faults, unlike checkpointing. 24x faster than GraphX. Evaluated on 128 hosts using graphs of up to ~1TB. Outperforms checkpointing when up to 16 hosts fail. (Speaker note: mention that GraphX is a fault-tolerant distributed graph processing system.)

State of a graph [Figure: an example graph on nodes A-H; the state of the graph is the label stored on each node, shown as ∞ for every node in the initial state.]

Distributed execution model [Figure: the graph is partitioned across hosts h1 and h2 by CuSP [IPDPS’19]; each host runs compute rounds (state transitions) on its partition using Galois [SoSP’13] and communicates updated labels for shared nodes using Gluon [PLDI’18]. Speaker note: the example uses a bfs operator.]
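As a rough illustration of this bulk-synchronous compute/communicate model, here is a minimal single-process Python sketch; it is not the D-Galois/Gluon API, and the example graph, the two partitions, and the bfs source are assumptions made for illustration.

```python
# A minimal single-process sketch of the compute/communicate model on this slide.
# Two "hosts" each own a partition; boundary nodes have proxy copies on both.
# Each round: run the bfs operator locally, then synchronize proxies with a
# min-reduction (the role Gluon plays in the real system). Illustrative only.
INF = float("inf")

# Undirected example graph on nodes A..H (edges are assumptions for illustration).
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"),
         ("D", "F"), ("E", "G"), ("F", "H"), ("G", "H")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

partitions = [{"A", "B", "C", "D", "E"}, {"D", "E", "F", "G", "H"}]
state = [{n: INF for n in part} for part in partitions]  # per-host bfs labels
state[0]["A"] = 0  # bfs source

def bfs_round(local, part):
    """One local compute step: relax every edge with both endpoints on this host."""
    changed = False
    for u in part:
        for v in adj[u]:
            if v in local and local[u] + 1 < local[v]:
                local[v] = local[u] + 1
                changed = True
    return changed

def synchronize(states):
    """Communicate step: reduce all proxy copies of a node to their minimum."""
    changed = False
    for n in adj:
        copies = [s for s in states if n in s]
        best = min(s[n] for s in copies)
        for s in copies:
            if s[n] != best:
                s[n] = best
                changed = True
    return changed

progress = True
while progress:                       # bulk-synchronous rounds
    progress = False
    for h in range(len(partitions)):  # "compute" on every host
        progress |= bfs_round(state[h], partitions[h])
    progress |= synchronize(state)    # "communicate"

print({n: min(s[n] for s in state if n in s) for n in sorted(adj)})
```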

How to recover from crashes or fail-stop faults? [Figure: a fault is detected during synchronization and one host loses its portion of the graph state. Phoenix preserves the state held by the surviving hosts and re-initializes the lost state, after which execution continues.]

States during algorithm execution and recovery [Figure: nested sets of states: globally consistent states ⊆ valid states ⊆ all states. Execution proceeds from the initial state to the final state; after a fault, Checkpoint-Restart rolls back to a globally consistent state (the last checkpoint), whereas Phoenix recovers to a valid state.]

Classification of graph algorithms [Figure: the four classes (self-stabilizing, locally-correcting, globally-correcting, and globally-consistent algorithms) placed on the state diagram (globally consistent states ⊆ valid states ⊆ all states) according to the set of states from which each class can recover. Speaker note: bfs is an example of a locally-correcting algorithm.]
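One way to read the classification, as a sketch in the slide's state-set terms rather than the paper's exact formal definitions:

```latex
% Sketch of the idea behind the classification, in the slide's state-set terms.
% Notation is illustrative; see the paper for the exact formal definitions.
\[
  \mathcal{C} \;\subseteq\; \mathcal{V} \;\subseteq\; \mathcal{A}
\]
% \mathcal{A}: all states, \mathcal{V}: valid states, \mathcal{C}: globally consistent states.
% An algorithm's class is determined by the weakest kind of state from which it
% still converges to the correct output:
%   self-stabilizing     : any state in \mathcal{A}
%   locally-correcting   : a state in \mathcal{V} reached by re-initializing lost nodes
%   globally-correcting  : a state in \mathcal{V} reached by a global re-computation step
%   globally-consistent  : only a state in \mathcal{C} (hence restart from a checkpoint)
```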

Classes: examples and recovery
Self-stabilizing algorithms: collaborative filtering, belief propagation, pull-style pagerank, pull-style graph coloring. Recovery: reinitialize lost nodes.
Locally-correcting algorithms: breadth-first search, connected components, data-driven pagerank, topology-driven k-core. Recovery: reinitialize lost nodes.
Globally-consistent algorithms: betweenness centrality. Recovery: restart from last checkpoint.
Globally-correcting algorithms: residual-based pagerank, data-driven k-core, latent Dirichlet allocation. Recovery: ?

Problem: find the k-core of an undirected graph. k-core: the maximal subgraph in which every node has degree at least k. [Figure: an example graph on nodes A-H and its 3-core, the subgraph on nodes E, F, G, H.] (Speaker note: k-core is used in graph coloring; give that as intuition.)

k-core algorithm (globally-correcting): if a node is alive (1) and its degree is less than k, mark it dead (0) and decrement each neighbor's degree. [Figure: the example graph and a trace of the algorithm execution; the state of each node is an alive flag plus its current degree, updated as nodes are marked dead.]
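A minimal sketch of this rule in plain Python, not the D-Galois implementation; the example graph and the value k = 3 are assumptions chosen so that the 3-core is the E, F, G, H subgraph from the earlier slide.

```python
# Worklist-style sketch of the rule on this slide: while some alive node has
# degree < k, mark it dead (0) and decrement each alive neighbor's degree.
# Illustrative only; the real system runs this as a distributed operator.
from collections import deque

K = 3  # assumed k for the example

# Undirected example graph on nodes A..H, chosen so that E, F, G, H form a 3-core.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "H")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

alive = {n: True for n in adj}           # alive flag (1 = alive, 0 = dead)
degree = {n: len(adj[n]) for n in adj}   # current degree of each node

worklist = deque(n for n in adj if degree[n] < K)
while worklist:
    u = worklist.popleft()
    if not alive[u]:
        continue                         # already marked dead
    alive[u] = False                     # mark dead (0)
    for v in adj[u]:
        if alive[v]:
            degree[v] -= 1               # decrement neighbor's degree
            if degree[v] < K:
                worklist.append(v)

print("3-core:", sorted(n for n in adj if alive[n]))   # expected: E, F, G, H
```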

Phoenix recovery for the k-core algorithm. Valid state: the degree of every node equals the number of its alive (1) neighbors, and any node is allowed to be alive (1). [Figure: after a fault, Phoenix re-initializes the lost nodes as alive and recomputes degrees so that the state is valid again; the algorithm then continues from this valid state.]
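A sketch of the matching Phoenix-style recovery under the same assumptions; the graph and the set of lost nodes are invented for illustration.

```python
# Sketch of the globally-correcting recovery for k-core described on this slide:
# after a fault, re-initialize every lost node as alive (1) and recompute degrees
# so that each node's degree equals its number of alive (1) neighbors, restoring
# a valid state from which the normal k-core loop can simply continue.
# The example graph and the set of "lost" nodes are assumptions for illustration.

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "H")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Suppose the fault hits mid-execution (k = 3): A, B, C are already marked dead,
# and the state of the nodes owned by the crashed host (say E and G) is lost.
lost = {"E", "G"}
alive = {"A": False, "B": False, "C": False, "D": True, "F": True, "H": True}
degree = {"A": 0, "B": 1, "C": 1, "D": 1, "F": 3, "H": 3}  # state on healthy hosts

def phoenix_recover_kcore(alive, degree, lost):
    # Re-initialization: any lost node may simply be marked alive (1).
    for n in lost:
        alive[n] = True
    # Re-computation: restore the valid-state invariant, i.e. every node's
    # degree equals the number of its alive (1) neighbors.
    for n in adj:
        degree[n] = sum(1 for v in adj[n] if alive[v])

phoenix_recover_kcore(alive, degree, lost)
print(alive)   # every node has an alive flag again
print(degree)  # degrees match the alive flags: a valid state; resume the algorithm
```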

Phoenix substrate for recovery. Phoenix is invoked when a fail-stop fault is detected. Arguments to Phoenix depend on the algorithm class: a re-initialization function, and a re-computation function (for globally-correcting algorithms). Phoenix recovery: re-initialize lost nodes and synchronize proxies; then re-compute and synchronize proxies (optional; not needed for locally-correcting algorithms).
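A hypothetical skeleton of this recovery flow; the names phoenix_recover, reinit_fn, recompute_fn, and sync_proxies are illustrative and are not the actual Phoenix or D-Galois API.

```python
# Hypothetical skeleton of the recovery flow on this slide. The names
# phoenix_recover, reinit_fn, recompute_fn and sync_proxies are illustrative
# and are NOT the actual Phoenix / D-Galois API.

def sync_proxies(local_state):
    """Placeholder for Gluon-style synchronization of proxy copies across hosts."""
    pass

def phoenix_recover(lost_nodes, local_state, reinit_fn, recompute_fn=None):
    """Invoked when a fail-stop fault is detected.

    reinit_fn    -- re-initializes the state of one lost node (all classes).
    recompute_fn -- global correction pass over the local state; supplied only
                    for globally-correcting algorithms, omitted for
                    locally-correcting algorithms.
    """
    # Step 1: re-initialize lost nodes and synchronize proxies.
    for n in lost_nodes:
        local_state[n] = reinit_fn(n)
    sync_proxies(local_state)

    # Step 2 (optional): re-compute to restore a valid state, then synchronize again.
    if recompute_fn is not None:
        recompute_fn(local_state)
        sync_proxies(local_state)
```

For the k-core example above, reinit_fn would mark a lost node alive and recompute_fn would recompute each node's degree as the number of its alive neighbors.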

Experimental setup
Systems: D-Galois; Phoenix in D-Galois; Checkpoint-Restart (CR) in D-Galois; GraphX [GRADES’13].
Benchmarks: connected components (cc), k-core (kcore), pagerank (pr), single-source shortest path (sssp).
Inputs:       twitter   rmat28   kron30   clueweb   wdc12
  |V|         51M       268M     1073M    978M      3,563M
  |E|         2B        4B       11B      42B       129B
  |E|/|V|     38        16       -        44        36
  Size (CSR)  16GB      35GB     136GB    325GB     986GB
Clusters:          Stampede                Wrangler
  No. of hosts     128                     32
  Machine          Intel Xeon Phi KNL      Intel Xeon Haswell
  Each host        272 threads of KNL      48 threads of Haswell
  Memory           96GB DDR3               128GB DDR4
(Speaker note: give the algorithm class for each benchmark and why that algorithm was chosen.)

Wrangler: fault-free total time on 32 hosts [Plot: speedup over GraphX on a log scale; geometric mean speedup: 24x.]

Stampede: fault-free execution time on 128 hosts [Plot: execution time in seconds. D-Galois and Phoenix are identical. Geometric mean overheads: CR-50: 31%, CR-500: 8%.]

Stampede: execution time when faults occur on 128 hosts (pr on wdc12) [Plots: speedup of Phoenix over CR-50 and speedup of Phoenix over CR-500.] (Speaker note: Phoenix can be used together with checkpointing.)

Stampede: execution time overhead when faults occur
Recovery time of Phoenix is negligible. Compared to fault-free execution of Phoenix, when faults occur on 128 hosts:
  System    Number of crashed machines    Average execution time overhead
  Phoenix   4                             14%
  Phoenix   16                            21%
  Phoenix   64                            44%
  CR-50     any                           49%
  CR-500    any                           59%

Fail-stop fault-tolerant distributed graph systems [Table: comparison of GraphX [GRADES’13], Imitator [DSN’14], Zorro [SoCC’15], CoRAL [ASPLOS’17], and Phoenix on six criteria: handles globally-correcting algorithms? handles globally-consistent algorithms? no fault-free execution overhead? tolerates any number of failed machines? guarantees precise results? no programmer input?]

Future work: extend Phoenix to handle data corruption errors or Byzantine faults; use compilers to generate Phoenix recovery functions automatically; explore Phoenix-style recovery for other application domains.

Conclusion. Phoenix: a substrate to recover from fail-stop faults in distributed graph applications. Recovery protocols are based on a classification of graph algorithms. Implemented in D-Galois, the state-of-the-art distributed graph system. Evaluated on 128 hosts using graphs of up to ~1TB. No overhead in the absence of faults, unlike checkpointing. Outperforms checkpointing when up to 16 hosts crash.

Programmer effort for Phoenix. Globally-correcting kcore and pr: 1 day of programming; 150 lines of code added (to ~300 lines of code). Locally-correcting cc and sssp: negligible programming effort; 30 lines of code added.

Phoenix substrate for recovery: globally-correcting

Stampede: execution time when faults occur on 128 hosts (cc on wdc12) [Plots: speedup of Phoenix over CR-50 and speedup of Phoenix over CR-500.]

Stampede: execution time when faults occur on 128 hosts (kcore on wdc12) [Plots: speedup of Phoenix over CR-50 and speedup of Phoenix over CR-500.]

Stampede: execution time when faults occur on 128 hosts (sssp on wdc12) [Plots: speedup of Phoenix over CR-50 and speedup of Phoenix over CR-500.]