Java-Based Parallel Computing on the Internet: Javelin 2.0 & Beyond
Michael Neary & Peter Cappello
Computer Science, UCSB
Introduction: Goals
– Service parallel applications that are:
  – Large: too big for a cluster
  – Coarse-grain: to hide communication latency
– Simplicity of use
  – Design focus: decomposition [composition] of computation
– Scalable high performance
  – despite large communication latency
– Fault tolerance
  – 1000s of hosts, each of which dynamically [dis]associates
Introduction: Some Related Work
Introduction: Some Applications
– Search for extra-terrestrial life
– Computer-generated animation
– Computer modeling of drugs for:
  – Influenza
  – Cancer
  – Reducing chemotherapy’s side-effects
– Financial modeling
– Storing nuclear waste
Outline
– Architecture
– Model of Computation
– API
– Scalable Computation
– Experimental Results
– Conclusions & Future Work
Architecture: Basic Components
– Brokers
– Clients
– Hosts
Architecture: Broker Discovery
[Animation: a new host H contacts the Broker Naming System, is referred to candidate brokers B, and PINGs them (BID?) before attaching to one.]
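The discovery animation suggests roughly the following host-side logic. This is a minimal sketch with hypothetical interfaces (BrokerNamingSystem, BrokerRef); the real Javelin classes and the exact PING/BID exchange may differ.

    import java.util.List;

    // Hypothetical interfaces standing in for Javelin's broker-discovery machinery.
    interface BrokerNamingSystem {
        List<BrokerRef> candidateBrokers();   // brokers the naming service currently knows about
    }

    interface BrokerRef {
        long ping();                          // round-trip time in milliseconds; throws if unreachable
        boolean acceptHost(String hostId);    // ask the broker to adopt this host
    }

    final class BrokerDiscovery {
        /** A new host picks the reachable candidate broker with the lowest ping. */
        static BrokerRef discover(BrokerNamingSystem bns, String hostId) {
            BrokerRef best = null;
            long bestRtt = Long.MAX_VALUE;
            for (BrokerRef b : bns.candidateBrokers()) {
                try {
                    long rtt = b.ping();
                    if (rtt < bestRtt) { best = b; bestRtt = rtt; }
                } catch (RuntimeException unreachable) {
                    // skip brokers that do not answer the PING
                }
            }
            return (best != null && best.acceptHost(hostId)) ? best : null;
        }
    }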
Architecture: Network of Broker-Managed Host Trees
– Each broker manages a tree of hosts
– Brokers form a network
– A client contacts a broker
– The client gets host trees
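For illustration, a minimal sketch of a broker maintaining its host tree and handing it to a client. The breadth-first attachment policy and bounded fan-out are assumptions; the real Javelin broker's tree-maintenance policy may differ.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical stand-ins for the broker's host-tree bookkeeping.
    final class HostNode {
        final String address;
        final List<HostNode> children = new ArrayList<>();
        HostNode(String address) { this.address = address; }
    }

    final class Broker {
        private final HostNode root = new HostNode("broker");  // the broker itself is the tree root
        private final int maxChildren;                          // assumed fan-out bound per node
        Broker(int maxChildren) { this.maxChildren = maxChildren; }

        /** Attach a newly arrived host at the shallowest node with spare capacity (breadth-first). */
        synchronized void attach(String hostAddress) {
            ArrayDeque<HostNode> queue = new ArrayDeque<>();
            queue.add(root);
            while (!queue.isEmpty()) {
                HostNode n = queue.poll();
                if (n.children.size() < maxChildren) {
                    n.children.add(new HostNode(hostAddress));
                    return;
                }
                queue.addAll(n.children);
            }
        }

        /** A client that contacts this broker receives the tree of hosts it manages. */
        synchronized HostNode hostTree() { return root; }
    }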
Scalable Computation: Deterministic Work-Stealing Scheduler
[Diagram: each HOST holds a task container supporting addTask( task ), getTask( ), and stealTask( ).]
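A minimal sketch of such a task container, using the three operation names from the slide. The deque discipline (the owner takes from one end, thieves from the other) is an assumption borrowed from classic work stealing rather than a detail stated on the slide.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Host-local task container; T stands in for Javelin's task type.
    final class TaskContainer<T> {
        private final Deque<T> deque = new ArrayDeque<>();

        /** The owning host adds a newly created task. */
        synchronized void addTask(T task) { deque.addLast(task); }

        /** The owning host takes its next task (most recently added). */
        synchronized T getTask() { return deque.pollLast(); }

        /** A neighbor in the host tree steals the oldest task. */
        synchronized T stealTask() { return deque.pollFirst(); }
    }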
Scalable Computation: Deterministic Work-Stealing Scheduler

    Task getWork( ) {
        if ( my deque has a task )
            return task;
        else if ( any child has a task )
            return child’s task;
        else
            return parent.getWork( );
    }

[Diagram: CLIENT at the root of a tree of HOSTS.]
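Fleshing out the pseudocode: a node drains its own deque, then steals from its children, and only then asks its parent. A minimal sketch with hypothetical names (Task, WorkStealingNode); it is not Javelin's actual scheduler code.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Task stands in for Javelin's task type; parent/children mirror the client/host tree.
    interface Task { }

    final class WorkStealingNode {
        private final Deque<Task> deque = new ArrayDeque<>();   // this node's own tasks
        private final List<WorkStealingNode> children;
        private final WorkStealingNode parent;                  // null at the client (root)

        WorkStealingNode(List<WorkStealingNode> children, WorkStealingNode parent) {
            this.children = children;
            this.parent = parent;
        }

        synchronized void addTask(Task t)      { deque.addLast(t); }
        private synchronized Task takeOwn()    { return deque.pollLast(); }   // owner's end
        private synchronized Task giveStolen() { return deque.pollFirst(); }  // thief's end

        /** Mirrors the slide: my own deque first, then any child, then ask my parent. */
        Task getWork() {
            Task t = takeOwn();
            if (t != null) return t;
            for (WorkStealingNode child : children) {
                t = child.giveStolen();              // steal the oldest task a child holds
                if (t != null) return t;
            }
            return (parent != null) ? parent.getWork() : null;  // null: the whole subtree is idle
        }
    }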
Models of Computation
– Master-slave
  – As far as we know, all proposed commercial applications use it
– Branch-&-bound optimization
  – A generalization of master-slave
Models of Computation: Branch & Bound
[Animated example: the search tree is expanded node by node; LOWER bounds are computed for subproblems, the shared UPPER bound tightens as cheaper complete solutions are found, and subtrees whose LOWER bound meets or exceeds UPPER are pruned.]
Models of Computation: Branch & Bound
– Tasks are created dynamically
– The upper bound is shared
– To detect termination, the scheduler detects tasks that have been:
  – Completed
  – Killed (“bounded”)
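One way to realize this termination test is for the scheduler to track every issued task until it is reported either completed or killed. A minimal sketch with hypothetical names; the real Javelin bookkeeping may differ.

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical scheduler-side bookkeeping for branch-&-bound termination.
    final class TerminationDetector {
        private final Set<String> outstanding = new HashSet<>();  // task ids not yet accounted for

        synchronized void taskCreated(String taskId)   { outstanding.add(taskId); }
        synchronized void taskCompleted(String taskId) { outstanding.remove(taskId); }
        synchronized void taskKilled(String taskId)    { outstanding.remove(taskId); } // pruned by the bound

        /** The computation is done when every created task was either completed or killed. */
        synchronized boolean isDone() { return outstanding.isEmpty(); }
    }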
API

    public class Host implements Runnable {
        ...
        public void run() {
            while ( (node = jDM.getWork()) != null ) {
                if ( isAtomic() )
                    compute();                 // search space; return result
                else {
                    child = node.branch();     // put children in child array
                    for (int i = 0; i < node.numChildren; i++)
                        if ( child[i].setLowerBound() < UpperBound )
                            jDM.addWork( child[i] );
                        // else child is killed implicitly
                }
            }
        }
        private void compute() {
            ...
            boolean newBest = false;
            while ( (node = stack.pop()) != null ) {
                if ( node.isComplete() ) {
                    if ( node.getCost() < UpperBound ) {
                        newBest = true;
                        UpperBound = node.getCost();
                        jDM.propagateValue( UpperBound );
                        best = node;           // remember the best complete solution so far
                    }
                }
                else {
                    child = node.branch();
                    for (int i = 0; i < node.numChildren; i++)
                        if ( child[i].setLowerBound() < UpperBound )
                            stack.push( child[i] );
                        // else child is killed implicitly
                }
            }
            if ( newBest )
                jDM.returnResult( best );
        }
    }
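The slide code assumes a problem-specific node type with roughly the following operations. This is a hypothetical reconstruction of that interface from the calls above; Javelin 2's actual API may declare them differently (e.g., isAtomic() may be a Host method, and numChildren is read as a field in the slide code).

    // Hypothetical interface for the search-tree nodes used in run() and compute() above.
    public interface BranchAndBoundNode {
        /** True if this node is small enough to be searched locally by compute(). */
        boolean isAtomic();

        /** True if this node represents a complete solution. */
        boolean isComplete();

        /** Cost of a complete solution; compared against the shared upper bound. */
        int getCost();

        /** Compute (and return) a lower bound on any solution in this subtree. */
        int setLowerBound();

        /** Expand this node into its children (subproblems). */
        BranchAndBoundNode[] branch();
    }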
Scalable Computation: Weak Shared Memory Model
– Slow propagation of the bound affects performance, not correctness.
[Animation: a newly found upper bound propagates through the host tree.]
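A minimal sketch of how such weak sharing can be realized: each host keeps a local copy of the bound, takes the minimum of what it hears, and forwards improvements to its tree neighbors. The names are hypothetical and Javelin's propagateValue may work differently; the point illustrated is that a stale or delayed bound is still a valid bound.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical weak-shared-memory cell for the branch-&-bound upper bound.
    final class SharedUpperBound {
        private volatile int localBound = Integer.MAX_VALUE;
        private final List<SharedUpperBound> neighbors = new ArrayList<>();  // parent + children

        void addNeighbor(SharedUpperBound n) { neighbors.add(n); }

        int get() { return localBound; }

        /** Called when this host finds a better solution, or hears one from a neighbor. */
        void propagateValue(int candidate) {
            if (candidate >= localBound) return;   // no improvement here: the wave stops
            localBound = candidate;                // a race may briefly keep a looser (but still valid) bound
            for (SharedUpperBound n : neighbors)
                n.propagateValue(candidate);       // in Javelin this would be a remote call to a neighbor host
        }
    }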
Scalable Computation: Fault Tolerance via Eager Scheduling
– When:
  – all tasks have been assigned,
  – some results have not been reported, and
  – a host wants a new task,
  re-assign a task!
– Eager scheduling tolerates faults & balances the load.
  – The computation completes if at least 1 host communicates with the client.
Scalable Computation: Fault Tolerance via Eager Scheduling
– The scheduler must know which:
  – tasks have completed
  – nodes have been killed
– Performance balance
  – centralized schedule info
  – decentralized computation
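Putting the two slides together, one plausible form of the eager re-assignment rule is sketched below. The names and the rotation policy are assumptions; the real Javelin 2 scheduler also tracks killed branch-&-bound nodes, not just completed tasks.

    import java.util.ArrayDeque;
    import java.util.LinkedHashSet;

    // Hypothetical client-side bookkeeping for eager scheduling.
    final class EagerScheduler {
        private final ArrayDeque<String> unassigned = new ArrayDeque<>();           // never handed out yet
        private final LinkedHashSet<String> awaitingResult = new LinkedHashSet<>(); // handed out, no result yet

        synchronized void addTask(String taskId) { unassigned.add(taskId); }

        /** Called whenever a host asks for work. */
        synchronized String nextTask() {
            if (!unassigned.isEmpty()) {
                String id = unassigned.poll();
                awaitingResult.add(id);
                return id;
            }
            if (awaitingResult.isEmpty()) return null;   // all results reported: computation is done
            // All tasks assigned, some results missing: re-assign the oldest pending task.
            // If its original host is slow, crashed, or left, another host will finish it.
            String id = awaitingResult.iterator().next();
            awaitingResult.remove(id);
            awaitingResult.add(id);                      // rotate to the back so re-assignments spread out
            return id;
        }

        /** Called when a result (or a "killed" notification) arrives; duplicates are ignored. */
        synchronized void reportResult(String taskId) { awaitingResult.remove(taskId); }

        synchronized boolean isDone() { return unassigned.isEmpty() && awaitingResult.isEmpty(); }
    }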
Experimental Results
Example of a “bad” graph
Conclusions
– Javelin 2 relieves the designer/programmer of managing a set of [Inter-]networked processors that is:
  – dynamic
  – faulty
– A wide set of applications is covered by:
  – the master-slave model
  – the branch & bound model
– Weak shared memory performs well.
– Use multicast (?) for:
  – code distribution
  – propagating values
Future Work
– Improve support for long-lived computation:
  – Do not require that the client run continuously.
– A DAG model of computation
  – with limited weak shared memory.
Future Work: Jini/JavaSpaces Technology
[Diagram: a TaskManager (aka Broker) with attached hosts H.]
– “Continuously” disperse tasks among brokers via a physics model
Future Work: Jini/JavaSpaces Technology
– TaskManager uses a persistent JavaSpace
  – Host management: trivial
  – Eager scheduling: simple
– No single point of failure
  – Fat tree topology
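For illustration, a minimal sketch of a TaskManager publishing tasks into a JavaSpace and a host taking them. The write/take calls are the standard Jini JavaSpaces API; the TaskEntry fields and how the space reference is obtained (normally via Jini lookup) are assumptions, not part of the proposed design.

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    // A task published into the space. JavaSpaces entries need public object fields
    // and a public no-argument constructor; the fields here are illustrative.
    public class TaskEntry implements Entry {
        public String taskId;
        public byte[] serializedTask;   // e.g., a serialized branch-&-bound node
        public TaskEntry() {}
        public TaskEntry(String taskId, byte[] serializedTask) {
            this.taskId = taskId;
            this.serializedTask = serializedTask;
        }
    }

    class TaskManager {
        private final JavaSpace space;   // assumed to be obtained via Jini lookup elsewhere
        TaskManager(JavaSpace space) { this.space = space; }

        /** Publish a task; a persistent space keeps it even if this TaskManager crashes. */
        void publish(TaskEntry task) throws Exception {
            space.write(task, null, Lease.FOREVER);
        }

        /** A host removes (takes) any matching task; null template fields act as wildcards. */
        TaskEntry acquire() throws Exception {
            TaskEntry template = new TaskEntry();
            return (TaskEntry) space.take(template, null, Long.MAX_VALUE);
        }
    }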
Future Work: Advanced Issues
– Privacy of data & algorithm
– Algorithms
  – New computation-communication complexity model
  – N-body problem, …
– Accounting: associate specific work with a specific host
  – Correctness
  – Compensation (how to quantify?)
– Create an open source organization
  – System infrastructure
  – Application codes