Atomic-Delayed Execution: A Concurrent Programming Model for Incomplete Graph-based Computations
Pedro C. Diniz
Information Sciences Institute, Viterbi School of Engineering
Motivation
- Big-Data and Graph Analytics
  - Cyber-Security
  - Large Network Systems
  - Social Networks
  - Combination of the above
- Challenges
  - Tons of bytes (not tons of flops)
  - Massive Concurrency but Little Data Locality
  - Low Computation-to-Communication Ratio
  - Frequent Synchronization
  - Work tends to be Dynamic and Imbalanced
  - Data may even become unavailable
- Programming for this Application Domain is Non-Trivial
Example: Minimum Distance to Root Node
- Simple Pointer-based Acyclic Graph Computation
- Compute for each Node the Minimal Distance to a “root” Node
- Store Value of Distance in Node
- Save Selected Nodes in Set
Example: Minimum Distance to Root Node
- Because the Graph is Potentially Very Big
  - Cannot Do It Sequentially
  - Limited in Time
  - Need to Tolerate “incorrect” Answers
- Exploit Concurrency
  - Atomic Updates to Distance in Node
  - Skip if Stored Distance is Already Less than or Equal to Argument
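The atomic update with the skip rule can be approximated in standard C++ with a compare-and-swap loop. This is only a minimal sketch, not the model's `atomic { ... }` construct: the `Node` type, the `update_min` name and the use of `INT_MAX` for "not reached yet" are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical node type: `depth` holds the best distance to the root seen so far.
struct Node {
    std::atomic<int> depth{INT_MAX};  // assumption: INT_MAX means "not reached yet"
};

// Lowers n.depth to val if val is smaller. Returns true when this call
// installed the new value, false when the stored value was already <= val.
bool update_min(Node& n, int val) {
    assert(val >= 0);  // distances are non-negative in this sketch
    int cur = n.depth.load(std::memory_order_relaxed);
    while (val < cur) {
        // On failure, compare_exchange_weak reloads `cur` with the current value,
        // so the loop re-checks the skip condition against fresh data.
        if (n.depth.compare_exchange_weak(cur, val, std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;  // skip: an equal or shorter distance is already stored
}
```

A caller would continue into the children only when `update_min` returns true, which mirrors the skip rule on the slide.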
Example: Concurrent Traversal
- Create a Thread at Each Invocation
- Visit Nodes and Check Distance against Argument
- Update Distance Atomically and Proceed
Example: Concurrent Traversal
- Yes, we may do more work than sequential: a node first reached along a longer path is updated again when a shorter path arrives
Example: Code

  class node {
    int depth;
    node *left, *right;
    void traversal(int val);
  };

  void node::traversal(int val) {
    time(T) {
      atomic {
        if (depth > val) { depth = val; }
        else { return; }              // skip: an equal or shorter distance is already stored
      }
      par {
        if (left != NULL)  left->traversal(val+1);
        if (right != NULL) right->traversal(val+1);
      }
    } exception {
      error.memory  : { continue; }   // tolerate the memory error and keep going
      timer.expired : { return; }     // time quantum T is over
    }
  }
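The `time`, `atomic`, `par` and `exception` constructs above are not standard C++. The sketch below approximates their effect with `std::atomic`, `std::async` and a wall-clock deadline, under several assumptions: the deadline is checked only on entry to each invocation, `par` runs the left child on another thread and the right child on the calling thread, and a hypothetical `faulty` flag stands in for an uncorrected memory error. It is meant for small graphs only, since it may start one thread per left child.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <climits>
#include <future>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the slide's `node` class.
struct Node {
    std::atomic<int> depth{INT_MAX};
    Node* left = nullptr;
    Node* right = nullptr;
    bool faulty = false;  // assumption: models a node that cannot be read (error.memory)
};

// Approximates `atomic { if (depth > val) depth = val; else return; }`.
static bool lower_depth(Node& n, int val) {
    int cur = n.depth.load();
    while (val < cur) {
        if (n.depth.compare_exchange_weak(cur, val)) return true;
    }
    return false;
}

// Approximates the body of `time(T) { ... }`. The deadline is checked on entry
// only, so an invocation that has already started is not interrupted.
void traversal(Node* n, int val, Clock::time_point deadline) {
    assert(n != nullptr);
    if (Clock::now() >= deadline) return;  // timer.expired : { return; }
    if (n->faulty) return;                 // error.memory : give up on this node only
    if (!lower_depth(*n, val)) return;     // an equal or shorter path got here first

    // `par { ... }`: left child on another thread, right child on this one.
    std::future<void> left_done;
    if (n->left != nullptr) {
        left_done = std::async(std::launch::async, traversal,
                               n->left, val + 1, deadline);
    }
    if (n->right != nullptr) traversal(n->right, val + 1, deadline);
    if (left_done.valid()) left_done.get();  // join before returning
}
```

A call such as `traversal(&root, 0, Clock::now() + std::chrono::milliseconds(50))` would label whatever part of the graph it reaches within the quantum and leave the remaining nodes at `INT_MAX`.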
Example: Delayed Execution
- When Time Expires:
  - Return Control
  - Continue for another Time Quantum
- Separate Thread Updates Objects Atomically

  exception {
    timer.expired : {
      time (T) {
        par {
          if (left != NULL)  left->traversal(val+1);
          if (right != NULL) right->traversal(val+1);
        }
      }
    }
  }
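One way to picture "return control, then continue for another time quantum" is a traversal that keeps its unfinished invocations in an explicit queue, so that the caller can resume it later. The sketch below is single-threaded for simplicity and is not the mechanism used in the talk; `Pending`, `run_quantum` and the plain `int depth` field are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <climits>
#include <deque>
#include <utility>

using Clock = std::chrono::steady_clock;

struct Node {
    int depth = INT_MAX;  // assumption: INT_MAX means "not reached yet"
    Node* left = nullptr;
    Node* right = nullptr;
};

// Invocations that have not run yet: (node, candidate distance).
using Pending = std::deque<std::pair<Node*, int>>;

// Runs pending invocations for one time quantum. Returns true when the
// traversal finished, false when the quantum expired with work left over.
// Leftover work stays in `pending`, so a later call resumes where this one stopped.
bool run_quantum(Pending& pending, Clock::duration quantum) {
    const Clock::time_point deadline = Clock::now() + quantum;
    while (!pending.empty()) {
        if (Clock::now() >= deadline) return false;  // timer.expired: return control
        Node* n = pending.front().first;
        const int val = pending.front().second;
        pending.pop_front();
        assert(n != nullptr);
        if (n->depth <= val) continue;  // skip: already at least as close
        n->depth = val;
        if (n->left != nullptr) pending.emplace_back(n->left, val + 1);
        if (n->right != nullptr) pending.emplace_back(n->right, val + 1);
    }
    return true;
}
```

The caller decides after each quantum whether the partial result is good enough or whether to call `run_quantum` again with the same queue.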
Concepts: Objects, Concurrency and Atomicity
- Objects and Methods
  - Data Encapsulation
- Separability (key)
  - Decouple Updates to Object from Concurrent Invocations
  - Uses only symbolically constant object data and arguments
- Atomicity
  - Avoids Races but not Indeterminism
  - Facilitates Reasoning
  - In Principle could have Many Atomic Sections
- Concurrency
Experiments: Concurrency Environment
- Using pthreads
  - Master threads and N Workers
  - Work stealing at a work-pool
  - Exception flag is checked when attempting to steal work
- Objects in C share a Pool of Mutex Locks
  - Some possible false contention
- Timed and Delayed Execution
  - Sharing two global Timers (for simplicity)
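A shared pool of mutex locks is commonly built by hashing each object's address to a slot in a fixed array of locks. The sketch below shows that idea with `std::mutex` in place of pthreads mutexes. The pool size, the hash and the names `lock_for` and `update_min` are assumptions, since the slides do not give these details.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical pool size; the slides do not state the actual value.
constexpr std::size_t kPoolSize = 64;
static std::mutex lock_pool[kPoolSize];

// Maps an object's address to one slot of the pool.
std::size_t lock_index(const void* obj) {
    return std::hash<const void*>{}(obj) % kPoolSize;
}

// Two distinct objects may map to the same slot. They then serialize on one
// mutex although they share no data: the "false contention" noted on the slide.
std::mutex& lock_for(const void* obj) {
    assert(obj != nullptr);
    return lock_pool[lock_index(obj)];
}

struct Node {
    int depth = 1 << 30;  // assumption: a large value means "not reached yet"
};

// Lock-based counterpart of the atomic minimum update.
bool update_min(Node& n, int val) {
    std::lock_guard<std::mutex> guard(lock_for(&n));
    if (n.depth <= val) return false;  // skip: already at least as close
    n.depth = val;
    return true;
}
```

The trade-off is memory against contention: a small pool keeps per-object overhead at zero, while a larger pool makes it less likely that unrelated objects share a lock.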
Experiments: Graph Computation
- Search Image Feature in Graph
  - Nodes represent people and have 1 image
  - Edges represent associations
- Collect from a given “root” node
  - Nodes at distance greater than 2
  - Share the same features (computationally intensive)
- Graphs Synthetically-Generated with RMAT algorithm
- Experiments
  - Timed Executions
  - Faults in Node Edges
Results: Completeness and “Correctness”
Tolerance to “Errors”
Summary
- Object-based programming model with timed and delayed executions
- Geared towards computations on very large data sets where the data cannot be traversed in useful time or is simply unavailable due to uncorrected memory errors
- Presented experimental results for a concurrent incomplete graph-based computation that delivers feasible results within strict time bounds and in the presence of memory errors
- Foresee the need to allow programmers to specify time limits for the computation so that systems can make progress with limited, and incomplete, data
Acknowledgements
- Partial support for this work was provided by the US Army Research Office (Award W911NF )
- Partial support for this work was provided by the US Department of Energy (DoE) Office of Science, Advanced Scientific Computing Research through the SciDAC-3 SUPER Research Institute (Contract Number DE-SC )