Is Your Graph Algorithm Eligible for Nondeterministic Execution? Zhiyuan Shao, Lin Hou, Yan Ai, Yu Zhang and Hai Jin Services Computing Technology and System Lab Cluster and Grid Computing Lab Huazhong University of Science and Technology ICPP’15
Outline Motivation System model Algorithm Convergence Evaluation Conclusion
Motivation: The “big data” era. Loosely coupled data: key-value pairs (Hadoop, Spark, many others). Tightly coupled data: graph data (Pregel, GraphLab, GraphChi, X-Stream, many others). Graph computing: execution model (synchronous model (BSP) or asynchronous model); execution manner (deterministic executions or nondeterministic executions).
Motivation (Cont’d): Deterministic execution: widely and extensively studied (architecture, OS, scheduling); examples: set/chromatic scheduler (GraphLab), DIG (Galois), external deterministic (GraphChi). Pros: a deterministic execution path (always) leads to deterministic results. Cons: high overhead introduced to order the tasks (consider a billion-node graph!). Nondeterministic execution: poorly studied. Pros: high parallelism, high performance! Cons: needs to prevent (at least) data-races; undocumented.
Motivation (Cont’d): Example of two execution manners (figure taken from the GraphLab paper). Problem: high overhead for defining the execution sequence! Question: what if all these tasks are executed nondeterministically? A1: Obviously, the ordering overhead is avoided and parallelism improves! A2: Data-races on edges! But what if we eliminate the data-races?
Motivation (Cont’d): Objective of this research: study the nondeterministic execution of graph algorithms. Wait... why study that? Graph algorithms are special cases of parallel computing: iterative computing; associative law: a+(b+c) = (a+b)+c; idempotent law: f(f(x)) = f(x). Potential for higher performance! Questions: Will an algorithm converge under nondeterministic execution? Will the executions lead to deterministic results (i.e., be externally deterministic)?
System model: Shared-memory computer: number of processors >= 1; graphs loaded in memory; COTS components, nothing special for HW and OS. Synchronous implementation of the asynchronous model: computing is organized into multiple iterations; barriers are enforced between two consecutive iterations; updates are applied “immediately”; examples: GraphChi, GRACE. Vertex-centric computing: “Think Like A Vertex”; data dependences happen on edges.
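The difference between a pure synchronous (BSP) sweep and the barrier-per-iteration model with immediately applied updates can be sketched as follows. This is a minimal Python illustration, not GraphChi or GRACE code: the function names, the toy path graph, and the min-label update rule are all assumptions chosen for brevity.

```python
def sweep_bsp(label, neighbors):
    # BSP: every vertex reads only values from the previous iteration.
    old = list(label)
    return [min([old[v]] + [old[n] for n in neighbors[v]])
            for v in range(len(old))]

def sweep_immediate(label, neighbors):
    # Asynchronous model: an update is visible to later vertices
    # within the same iteration; the barrier is the end of the sweep.
    label = list(label)
    for v in range(len(label)):
        label[v] = min([label[v]] + [label[n] for n in neighbors[v]])
    return label

def iterations_until_fixed(sweep, label, neighbors):
    # Count the sweeps that changed something before a fixed point is reached.
    count = 0
    while True:
        new = sweep(label, neighbors)
        if new == label:
            return count, label
        label = new
        count += 1

# Hypothetical path graph 0 - 1 - 2 - 3, each vertex starts with its own id.
path = [[1], [0, 2], [1, 3], [2]]
start = [0, 1, 2, 3]
print(iterations_until_fixed(sweep_bsp, start, path))        # (3, [0, 0, 0, 0])
print(iterations_until_fixed(sweep_immediate, start, path))  # (1, [0, 0, 0, 0])
```

Both variants reach the same fixed point here; the immediate variant simply needs fewer sweeps on this input because new values propagate within an iteration.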
System model (Cont’d): Race-free execution: Method 1: architecture support; Method 2: compiler support; Method 3: explicit lock/unlock. These convert data-races to “conflicts”. Scheduling: general methods (example: static, dynamic or other methods in OpenMP). Assumption on scheduling: [figure: vertices u and v with vertex data Du and Dv, connected by edges with edge data De; add_schedule(u)]
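Method 3 (explicit lock/unlock) can be sketched roughly as below. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the `Edge` class and the `update_u` / `update_v` functions are hypothetical names, and Python threads stand in for the processors of the shared-memory machine.

```python
import threading

class Edge:
    """Edge data De guarded by its own lock (hypothetical helper class)."""
    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()

def update_u(edge, new_value):
    # Vertex u writes the shared edge value while holding the edge lock.
    with edge.lock:
        edge.value = new_value

def update_v(edge, seen):
    # Vertex v reads the shared edge value while holding the same lock.
    with edge.lock:
        seen.append(edge.value)

edge = Edge(value=10)
seen = []
threads = [
    threading.Thread(target=update_u, args=(edge, 20)),
    threading.Thread(target=update_v, args=(edge, seen)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# v observed either the old value (write-after-read) or the new value
# (read-after-write); which one is decided by the scheduler.
print(seen[0] in (10, 20), edge.value)
```

The lock removes the data-race on the edge, but the order of the two accesses is still unspecified, which is exactly the kind of “conflict” the slides go on to classify.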
Algorithm Convergence: Methodology: classify the “conflicts” on edges. Read-write conflicts: Case 1, read-after-write: reads the new value, converges. Case 2, write-after-read: reads the old value, converges? Write-write conflicts: Case 1, (correct) write after (wrong) write: correct edge values, converges. Case 2, (wrong) write after (correct) write: corrupted edge values, converges?
Algorithm Convergence (Cont’d): Read-write conflict [figures: vertices u and v with vertex data Du and Dv and edge data De]. Case 1, read-after-write: converges. Case 2, write-after-read: the old value is read; the next iteration converges.
Algorithm Convergence (Cont’d): Sufficient condition 1 for convergence: a chain-to-converge exists. Deduction 1: If algorithm A on graph G converges under synchronous-model execution, A will converge under nondeterministic execution. Deduction 2: If algorithm A on graph G converges under a deterministic scheduler of the asynchronous model, A will converge under nondeterministic execution. Example algorithms that converge: PageRank and many other fixed-point iterative algorithms.
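As an informal illustration of the PageRank example, the sketch below applies in-place PageRank updates in a randomly shuffled vertex order and compares two runs. Everything in it is an assumption made for illustration: the function name, the toy graph, the damping factor 0.85, the unnormalized update rule `(1 - d) + d * sum(...)`, and the use of a sequential random order to imitate nondeterministic scheduling (no real concurrency is involved).

```python
import random

def pagerank_random_order(out_links, seed, damping=0.85, eps=1e-10,
                          max_sweeps=1000):
    n = len(out_links)
    rng = random.Random(seed)
    # Build the reverse adjacency lists once.
    in_links = [[] for _ in range(n)]
    for u, targets in enumerate(out_links):
        for v in targets:
            in_links[v].append(u)
    rank = [1.0] * n
    for _ in range(max_sweeps):
        order = list(range(n))
        rng.shuffle(order)  # a different vertex order in every iteration
        largest_change = 0.0
        for v in order:
            incoming = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            new_rank = (1.0 - damping) + damping * incoming
            largest_change = max(largest_change, abs(new_rank - rank[v]))
            rank[v] = new_rank  # applied immediately
        if largest_change < eps:
            break
    return rank

# Hypothetical 4-vertex graph; every vertex has at least one out-link.
toy_graph = [[1, 2], [2], [0], [2]]
ranks_a = pagerank_random_order(toy_graph, seed=1)
ranks_b = pagerank_random_order(toy_graph, seed=2)
gap = max(abs(a - b) for a, b in zip(ranks_a, ranks_b))
print(gap < 1e-6)
```

Both runs settle on the same fixed point up to the tolerance `eps`, although the values need not be bit-for-bit identical.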
Algorithm Convergence (Cont’d): Write-write conflicts [figures: vertices u and v with vertex data Du and Dv and edge data De]. Case 1, (correct) write after (wrong) write: converges. Case 2, (wrong) write after (correct) write: corrupted edge value; next iteration: falsely converges; correcting the edge value; next iteration: converges.
Algorithm Convergence (Cont’d): Sufficient condition 2 for convergence: in order to correct the corrupted edge value, (1) algorithm A on graph G converges under deterministic asynchronous-model execution, and (2) algorithm A satisfies the monotonicity property (false convergence). Algorithms that converge: WCC (Weakly Connected Components) by MLP (Minimal Label Propagation), BFS (Breadth First Search), and many other graph traversal algorithms. Algorithms that do not converge: BP (Belief Propagation).
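The monotonicity of minimal label propagation can be illustrated with a small sketch. It is a simplified model under stated assumptions: labels are kept on vertices rather than on edges, the function name and toy graph are hypothetical, and a shuffled sequential order stands in for nondeterministic scheduling, so real write-write conflicts are not reproduced.

```python
import random

def wcc_min_label(num_vertices, edges, seed):
    rng = random.Random(seed)
    # Weak connectivity ignores edge direction, so store both directions.
    neighbors = [[] for _ in range(num_vertices)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    label = list(range(num_vertices))  # each vertex starts with its own id
    changed = True
    while changed:
        changed = False
        order = list(range(num_vertices))
        rng.shuffle(order)  # arbitrary processing order per iteration
        for v in order:
            best = min([label[v]] + [label[n] for n in neighbors[v]])
            if best < label[v]:  # monotone: labels only ever decrease
                label[v] = best
                changed = True
    return label

# Hypothetical graph with components {0, 1, 2}, {3, 4} and {5}.
edges = [(0, 1), (1, 2), (3, 4)]
results = {tuple(wcc_min_label(6, edges, seed)) for seed in range(20)}
print(results)  # {(0, 0, 0, 3, 3, 5)}
```

Because a label can only move toward the component minimum, every processing order tried here ends in the same labeling.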
Evaluation Experiment setup 2*2.6-GHz Intel Xeon E processors (8 cores) 64GB of RAM GCC version: Real-world graph data-sets Web-BerkStan, web-Google, soc-LiveJournal1, cage15 Platform GraphChi (C++ version 0.2) Algorithms PageRank, SSSP, WCC, BFS Avail at:
Evaluation (Cont’d): Using architecture support achieves the best performance (execution time reduction of up to 70%). Explicit locking/unlocking does not achieve the best performance, but still scales well and sometimes even outperforms deterministic executions.
Evaluation (Cont’d): How about the produced results of PageRank? Measure the difference: Result1: {1, 2, 3, 5, 7}; Result2: {1, 2, 3, 7, 5}; suffix: 0, 1, 2, 3, 4; the difference degree is 3. Results are not deterministic (not externally deterministic). With increased precision (smaller ε), variations in results move to less important pages.
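Going by the example on this slide, the difference degree appears to be the first suffix (position) at which two ranked result lists disagree. The sketch below encodes that reading; the function name and the handling of identical lists are assumptions, since the slide gives only the single example.

```python
def difference_degree(result_a, result_b):
    # Return the first position at which the two rankings differ.
    for position, (a, b) in enumerate(zip(result_a, result_b)):
        if a != b:
            return position
    # Assumption: if no position differs, report the compared length.
    return min(len(result_a), len(result_b))

result1 = [1, 2, 3, 5, 7]
result2 = [1, 2, 3, 7, 5]
print(difference_degree(result1, result2))  # 3
```

Under this reading, a larger difference degree means the two runs agree on more of the top-ranked pages.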
Conclusion: Graph algorithms are special cases of parallel computing: they do not necessarily need high-overhead deterministic executions! Most of the algorithms can be executed nondeterministically; examples include PageRank, WCC, BFS and many others. Not all nondeterministic executions produce deterministic results! Open problems: further discussion of sufficient conditions for algorithm convergence under nondeterministic execution; further discussion of the variations (nondeterminacy) in results produced by nondeterministic executions (e.g., PageRank); theoretical analysis of the speed of convergence; extending the system model to purely asynchronous computing.
Thank you! Q&A