Published by Duane Fitzgerald. Modified over 9 years ago.
1
Java-Based Parallel Computing on the Internet: Javelin 2.0 & Beyond
Michael Neary & Peter Cappello
Computer Science, UCSB
2
Introduction: Goals
Service parallel applications that are:
–Large: too big for a cluster
–Coarse-grain: to hide communication latency
Simplicity of use
–Design focus: decomposition [composition] of computation
Scalable high performance
–Despite large communication latency
Fault tolerance
–1000s of hosts, each of which dynamically [dis]associates
3
Introduction Some Related Work
4
Introduction: Some Applications
Search for extra-terrestrial life
Computer-generated animation
Computer modeling of drugs for:
–Influenza
–Cancer
–Reducing chemotherapy’s side-effects
Financial modeling
Storing nuclear waste
5
Outline
Architecture
Model of Computation
API
Scalable Computation
Experimental Results
Conclusions & Future Work
6
Architecture: Basic Components
Brokers
Clients
Hosts
7
Architecture: Broker Discovery
[Diagram sequence, slides 7 to 11: brokers (B), a host (H), and the Broker Naming System; one frame is labeled PING (BID?)]
12
Architecture: Network of Broker-Managed Host Trees
Each broker manages a tree of hosts
Brokers form a network
Client contacts broker
Client gets host trees
16
Scalable Computation: Deterministic Work-Stealing Scheduler
[Diagram: a HOST with a task container offering addTask( task ), getTask( ), stealTask( )]
17
Scalable Computation: Deterministic Work-Stealing Scheduler
Task getWork( ) {
    if ( my deque has a task )
        return task;
    else if ( any child has a task )
        return child’s task;
    else
        return parent.getWork( );
}
[Diagram labels: CLIENT, HOSTS]
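The getWork( ) pseudocode can be made concrete with a small single-threaded sketch. Everything here is an illustrative assumption rather than the Javelin API: the class name HostNode, the use of String task ids, and the choice that a thief takes from the opposite end of the deque from the owner. A real host would also need synchronization and network calls, which are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical host in a host tree; a sketch, not the Javelin implementation. */
class HostNode {
    private final Deque<String> tasks = new ArrayDeque<>();
    private final List<HostNode> children = new ArrayList<>();
    private final HostNode parent;

    HostNode(HostNode parent) {
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    /** Owner adds new work at the front of its own deque. */
    void addTask(String task) { tasks.addFirst(task); }

    /** Owner takes work from the front; returns null if the deque is empty. */
    String getTask() { return tasks.pollFirst(); }

    /** A thief takes from the back (assumed policy); returns null if empty. */
    String stealTask() { return tasks.pollLast(); }

    /** Own deque first, then the children in a fixed order, then the parent. */
    String getWork() {
        String task = getTask();
        if (task != null) {
            return task;
        }
        for (HostNode child : children) {
            task = child.stealTask();
            if (task != null) {
                return task;
            }
        }
        return parent == null ? null : parent.getWork();
    }
}
```

Because the children are always scanned in the same order, the victim is chosen deterministically rather than at random, which is one plausible reading of "deterministic" in the slide title.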
18
Models of Computation
Master-slave
–As far as I know, all proposed commercial applications
Branch-&-bound optimization
–A generalization of master-slave
19
Models of Computation: Branch & Bound
[Diagram sequence, slides 19 to 25: a graph labeled 34 8 7 12 10 9 3 6 8 2 7 0, searched step by step. Bounds shown in each frame:]
Slide   UPPER        LOWER
19      (not shown)  0
20      (not shown)  2
21      (not shown)  3
22      4            4
23      3            3
24      3            6
25      3            7
26
Models of Computation: Branch & Bound
Tasks are created dynamically
The upper bound is shared
To detect termination, the scheduler detects tasks that have been:
–Completed
–Killed (“bounded”)
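The pruning rule behind these bullets can be illustrated with a sequential sketch on a toy problem. The problem (a small assignment problem), the class name AssignmentBnB, and the lower bound (the cost accumulated so far, which is valid only for non-negative costs) are all assumptions chosen for illustration; they are not taken from the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy branch and bound: assign each row a distinct column at minimum total cost. */
class AssignmentBnB {
    /** Partial solution: rows 0..row-1 are assigned; usedCols is a bitmask. */
    private static final class Node {
        final int row;
        final int usedCols;
        final int cost;

        Node(int row, int usedCols, int cost) {
            this.row = row;
            this.usedCols = usedCols;
            this.cost = cost;
        }
    }

    /** Returns the minimum total cost; assumes a square matrix of non-negative costs. */
    static int solve(int[][] c) {
        int n = c.length;
        int upperBound = Integer.MAX_VALUE;   // cost of the best complete solution so far
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(new Node(0, 0, 0));
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.cost >= upperBound) {
                continue;                     // bound improved after this node was pushed
            }
            if (node.row == n) {
                upperBound = node.cost;       // complete solution that beats the bound
                continue;
            }
            for (int col = 0; col < n; col++) {
                if ((node.usedCols & (1 << col)) != 0) {
                    continue;                 // column already taken
                }
                int lowerBound = node.cost + c[node.row][col];
                if (lowerBound < upperBound) {
                    stack.push(new Node(node.row + 1, node.usedCols | (1 << col), lowerBound));
                }
                // otherwise the child is killed ("bounded") and never created
            }
        }
        return upperBound;
    }
}
```

Children are created only when their lower bound beats the current upper bound, so the set of tasks is not known in advance; this is why termination has to account for killed nodes as well as completed ones.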
27
API
public class Host implements Runnable {
    ...
    public void run() {
        while ( (node = jDM.getWork()) != null ) {
            if ( isAtomic() )
                compute();  // search space; return result
            else {
                child = node.branch();  // put children in child array
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        jDM.addWork( child[i] );
                    // else child is killed implicitly
            }
        }
    }
}
28
API
private void compute() {
    ...
    boolean newBest = false;
    while ( (node = stack.pop()) != null ) {
        if ( node.isComplete() ) {
            if ( node.getCost() < UpperBound ) {
                newBest = true;
                UpperBound = node.getCost();
                jDM.propagateValue( UpperBound );
                best = node;  // remember the best complete node so far
            }
        } else {
            child = node.branch();
            for (int i = 0; i < node.numChildren; i++)
                if ( child[i].setLowerBound() < UpperBound )
                    stack.push( child[i] );
                // else child is killed implicitly
        }
    }
    if ( newBest )
        jDM.returnResult( best );
}
29
Scalable Computation: Weak Shared Memory Model
Slow propagation of the bound affects performance, not correctness.
[Diagram: propagate bound]
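Why a stale bound is harmless can be seen in a minimal sketch of a host's local copy of the bound. The class LocalBound and its method names are hypothetical. The only library call relied on is AtomicInteger.accumulateAndGet, used here to keep the minimum of all values received so far.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-host copy of the shared upper bound (illustrative only). */
class LocalBound {
    private final AtomicInteger upperBound = new AtomicInteger(Integer.MAX_VALUE);

    /** Apply a propagated bound; late or out-of-order messages can only be ignored. */
    int receive(int propagated) {
        return upperBound.accumulateAndGet(propagated, Math::min);
    }

    /** A node is explored only if its lower bound beats the locally known bound. */
    boolean shouldExplore(int lowerBound) {
        return lowerBound < upperBound.get();
    }

    int get() {
        return upperBound.get();
    }
}
```

A host that has not yet heard of a better bound holds a value that is too large, so it explores nodes it could have skipped. It never prunes a node that a host with the up-to-date bound would have kept, which is the sense in which delay costs time but not correctness.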
34
Scalable Computation: Fault Tolerance via Eager Scheduling
When:
–All tasks have been assigned
–Some results have not been reported
–A host wants a new task
Re-assign a task!
Eager scheduling tolerates faults & balances the load.
–Computation completes if at least 1 host communicates with the client.
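The re-assignment rule above can be sketched as a small bookkeeping class. The name EagerScheduler, the integer task ids, and the policy of re-issuing the longest-outstanding task first are assumptions made for illustration; the slides do not specify them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical eager scheduler over task ids 0..n-1 (a sketch, not Javelin's code). */
class EagerScheduler {
    private final Deque<Integer> unassigned = new ArrayDeque<>();
    private final Deque<Integer> pending = new ArrayDeque<>();  // assigned, no result yet

    EagerScheduler(int taskCount) {
        for (int id = 0; id < taskCount; id++) {
            unassigned.add(id);
        }
    }

    /** Hands out a fresh task if one exists, else re-assigns an unreported one. */
    Integer nextTask() {
        Integer id = unassigned.poll();
        if (id == null) {
            id = pending.poll();      // longest-outstanding unreported task
        }
        if (id != null) {
            pending.add(id);          // (re)queue it until a result arrives
        }
        return id;                    // null once every result is in
    }

    /** Records a result; returns false for duplicates from re-assigned copies. */
    boolean reportResult(int id) {
        return pending.remove(Integer.valueOf(id));
    }

    boolean isDone() {
        return unassigned.isEmpty() && pending.isEmpty();
    }
}
```

A host that crashes simply never reports, so its task stays pending and is handed to the next host that asks. A slow host is treated the same way, which is how one mechanism covers both fault tolerance and load balancing.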
35
Scalable Computation: Fault Tolerance via Eager Scheduling
Scheduler must know which:
–Tasks have completed
–Nodes have been killed
Performance balance
–Centralized schedule info
–Decentralized computation
36
Experimental Results
37
Example of a “bad” graph
[Diagram: graph labeled 34 8 7 12 10 9 3 6 8 2 7 0]
38
Conclusions
Javelin 2 relieves the designer/programmer of managing a set of [Inter-]networked processors that is:
–Dynamic
–Faulty
A wide set of applications is covered by:
–Master-slave model
–Branch & bound model
Weak shared memory performs well.
Use multicast (?) for:
–Code distribution
–Propagating values
39
Future Work
Improve support for long-lived computation:
–Do not require that the client run continuously.
A dag model of computation
–With limited weak shared memory.
40
Future Work: Jini/JavaSpaces Technology
TaskManager aka Broker
[Diagram: hosts (H)]
“Continuously” disperse Tasks among brokers via a physics model
41
Future Work: Jini/JavaSpaces Technology
TaskManager uses persistent JavaSpace
–Host management: trivial
–Eager scheduling: simple
No single point of failure
–Fat tree topology
42
Future Work: Advanced Issues
Privacy of data & algorithm
Algorithms
–New computation-communication complexity model
–N-body problem, …
Accounting: associate specific work with specific host
–Correctness
–Compensation (how to quantify?)
Create open source organization
–System infrastructure
–Application codes