
1 CX: A Scalable, Robust Network for Parallel Computing
Peter Cappello & Dimitrios Mourloukos Computer Science UCSB

2 Outline Introduction Related work API Architecture
Experimental results Current & future work

3 Introduction “Listen to the technology!” Carver Mead

4 Introduction “Listen to the technology!” Carver Mead
What is the technology telling us?

5 Introduction “Listen to the technology!” Carver Mead
What is the technology telling us? Internet’s idle cycles/sec growing rapidly

6 Introduction “Listen to the technology!” Carver Mead
What is the technology telling us? Internet's idle cycles/sec growing rapidly. Bandwidth increasing & getting cheaper.

7 Introduction “Listen to the technology!” Carver Mead
What is the technology telling us? Internet's idle cycles/sec growing rapidly. Bandwidth is increasing & getting cheaper. Communication latency is not decreasing.

8 Introduction “Listen to the technology!” Carver Mead
What is the technology telling us? Internet's idle cycles/sec growing rapidly. Bandwidth increasing & getting cheaper. Communication latency is not decreasing. Human technology is getting neither cheaper nor faster.

9 Introduction Project Goals
Minimize job completion time despite large communication latency

10 Introduction Project Goals
Minimize job completion time despite large communication latency. Jobs complete with high probability despite faulty components.

11 Introduction Project Goals
Minimize job completion time despite large communication latency. Jobs complete with high probability despite faulty components. The application program is oblivious to: number of processors, inter-process communication, hardware faults.

12 Introduction Fundamental Issue: Heterogeneity
[Diagram: machines M1–M5, each running a different operating system OS1–OS5] Heterogeneous machine/OS

13 Introduction Fundamental Issue: Heterogeneity
[Diagram: machines M1–M5 with different operating systems, each running a JVM] Heterogeneous machine/OS; functionally homogeneous JVM

14 Outline Introduction Related work API Architecture
Experimental results Current & future work

15 Related work Cilk → Cilk-NOW → Atlas
DAG computational model; work-stealing

16 Related work Linda → Piranha → JavaSpaces
Space-based coordination; decoupled communication

17 Related work Charlotte (Milan project / Calypso prototype)
High performance ⇒ no distributed transactions. Fault tolerance via eager scheduling.

18 Related work SuperWeb → Javelin → Javelin++
Architecture: client, broker, host

19 Outline Introduction Related work API Architecture
Experimental results Current & future work

20 API DAG Computational model
int f( int n ) { if ( n < 2 ) return n; else return f( n-1 ) + f( n-2 ); }

21 DAG Computational Model
int f( int n ) { if ( n < 2 ) return n; else return f( n-1 ) + f( n-2 ); }
[Method invocation tree: f(4)]

22 DAG Computational Model
(code as on slide 21) [Method invocation tree: f(4) calls f(3) and f(2)]

23 DAG Computational Model
(code as on slide 21) [Method invocation tree: f(4) → f(3), f(2); f(3) → f(2), f(1); f(2) → f(1), f(0)]

24 DAG Computational Model
(code as on slide 21) [Method invocation tree: the fully expanded tree for f(4), down to the f(1) and f(0) leaves]

25 DAG Computational Model / API
Decompose task f(n): execute( ) { if ( n < 2 ) setArg( ArgAddr, n ); else { spawn( + ); spawn( f(n-1) ); spawn( f(n-2) ); } }
Compose task +: execute( ) { setArg( ArgAddr, in[0] + in[1] ); }
[DAG: f(4) feeding a + node]

26 DAG Computational Model / API
(code as on slide 25) [DAG: f(4) decomposed into f(3) and f(2), each feeding a + node]

27 DAG Computational Model / API
(code as on slide 25) [DAG: f(3) and f(2) decomposed further into f(2), f(1), f(1), f(0), with + nodes]

28 DAG Computational Model / API
(code as on slide 25) [DAG: the fully expanded DAG for f(4); base cases f(1) and f(0) feed + nodes up to the root]
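
A minimal Java sketch of how such a task API might look, using invented names (Task, Environment, spawn, setArg) patterned on the pseudocode above; it illustrates the DAG model, not the actual CX API:

    // Illustrative only: a DAG task API in the spirit of slides 25-28.
    interface Environment {
        void spawn(Task t);          // add a child task to the DAG
        void setArg(int value);      // deliver this task's result to its waiting compose task
    }

    abstract class Task {
        protected int[] in = new int[2];            // inputs filled in by child tasks
        abstract void execute(Environment env);
    }

    class Fib extends Task {                        // decompose task f(n)
        private final int n;
        Fib(int n) { this.n = n; }
        void execute(Environment env) {
            if (n < 2) {
                env.setArg(n);                      // base case
            } else {
                env.spawn(new Sum());               // compose task that will add the two results
                env.spawn(new Fib(n - 1));
                env.spawn(new Fib(n - 2));
            }
        }
    }

    class Sum extends Task {                        // compose task "+"
        void execute(Environment env) {
            env.setArg(in[0] + in[1]);              // forward the sum toward the root
        }
    }

The producer runtime (not shown) would wire each spawned child's setArg output into its parent compose task's in[] slots, much as the DAG figures suggest.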

29 Outline Introduction Related work API Architecture
Experimental results Current & future work

30 Architecture: Basic Entities
The Consumer's session with the Production Network: register ( spawn | getResult )* unregister. [Diagram: Consumer, Production Network, Cluster Network]

31 Architecture: Cluster
[Diagram: a cluster is one Task Server with several attached Producers]

32–35 A Cluster at Work
[Animation: the Task Server holds READY and WAITING task lists; the task f(4) enters the READY list and is handed to an idle Producer, while other Producers wait for work]

36 Decompose
execute( ) { if ( n < 2 ) setArg( ArgAddr, n ); else { spawn( + ); spawn( f(n-1) ); spawn( f(n-2) ); } }

37–44 A Cluster at Work
[Animation: executing f(4) replaces it on the Task Server with f(3), f(2), and a WAITING + task; Producers pick up f(3) and f(2) and decompose them in turn, adding f(2), f(1), f(0) tasks and further WAITING + tasks]

45 Compute Base Case
execute( ) { if ( n < 2 ) setArg( ArgAddr, n ); else { spawn( + ); spawn( f(n-1) ); spawn( f(n-2) ); } }

46–54 A Cluster at Work
[Animation: Producers compute the base cases f(1) and f(0); each result is returned to the Task Server as an argument of its WAITING + task]

55 Compose execute( ) { setArg( ArgAddr, in[0] + in[1] ); }

56–75 A Cluster at Work
[Animation: once both arguments of a WAITING + task have arrived it becomes READY, is computed by a Producer, and its result feeds the parent + task; this repeats up the DAG until a single result R remains at the Task Server]

76 A Cluster at Work
The Result object is sent to the Production Network, which returns it to the Consumer. [Diagram: result R leaving the Task Server]

77 Task Server Proxy: Overlap Communication with Computation
[Diagram: each Producer runs a Task Server Proxy with a priority queue, an INBOX, and an OUTBOX; separate COMP (computation) and COMM (communication) threads overlap communication with computation]
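
A rough Java sketch of the proxy idea, with invented names: a communication thread keeps a local task queue full and drains results while a computation thread executes tasks, so network latency is hidden behind useful work. This is an assumption-laden illustration, not the CX proxy itself:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Illustrative producer-side proxy: COMP and COMM threads share an inbox and an outbox.
    class TaskServerProxy {
        private final BlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();   // tasks fetched ahead of need
        private final BlockingQueue<Object> outbox = new LinkedBlockingQueue<>();    // results awaiting return

        void start() {
            Thread comp = new Thread(() -> {            // computation thread
                try {
                    while (true) {
                        Runnable task = inbox.take();   // always has work if COMM keeps ahead
                        task.run();
                        outbox.put(new Object());       // hand the result to the COMM thread
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread comm = new Thread(() -> {            // communication thread
                try {
                    while (true) {
                        Object result = outbox.take();
                        // ... send the result to the task server and fetch the next task(s) here ...
                        inbox.put(() -> { /* run the newly fetched task */ });
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            comp.start();
            comm.start();
        }
    }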

78 Architecture Work stealing & eager scheduling
A task is removed from the server only after a complete signal is received. A task may be assigned to multiple producers. Task load is balanced among producers of varying processor speeds. Tasks on failed or retreating producers are re-assigned.
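
A hedged sketch of that bookkeeping, with invented class and method names: the server keeps every assigned task until its completion signal arrives, eagerly re-assigns unfinished tasks to idle producers, and ignores duplicate results:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Illustrative eager scheduler: a task leaves the server only on a complete signal.
    class EagerScheduler {
        private final Deque<String> ready = new ArrayDeque<>();          // unassigned task ids
        private final Map<String, Integer> assigned = new HashMap<>();   // id -> copies handed out
        private final Set<String> completed = new HashSet<>();

        synchronized void submit(String taskId) { ready.addLast(taskId); }

        synchronized String assign() {                    // a producer asks for work
            String id = ready.pollFirst();
            if (id == null && !assigned.isEmpty()) {
                id = assigned.keySet().iterator().next(); // eager: re-assign an unfinished task
            }
            if (id != null) assigned.merge(id, 1, Integer::sum);
            return id;                                    // null means no work at all
        }

        synchronized boolean complete(String taskId) {
            if (!completed.add(taskId)) return false;     // duplicate result from a re-assigned copy
            assigned.remove(taskId);                      // only now is the task removed from the server
            return true;
        }
    }

Producers that fail or retreat simply stop sending complete signals, so their tasks remain assigned and are handed out again.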

79 Architecture: Scalability
A cluster tolerates producer retreat and failure. A single task server, however, is a bottleneck and a single point of failure. Solution: use a network of task servers/clusters.

80 Scalability: Class loading
1. The CX class loader loads classes (from the Consumer JAR) into each server's class cache. 2. A Producer loads classes from its server.
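
A minimal sketch, assuming a hypothetical ClassCacheClient stub for the server's class cache, of how step 2 could be realized with a standard Java ClassLoader subclass; the real CX loader is not described in the slides:

    // Illustrative: a producer-side loader that resolves consumer classes via its task server.
    class ServerClassLoader extends ClassLoader {
        private final ClassCacheClient server;       // hypothetical stub to the server's class cache

        ServerClassLoader(ClassCacheClient server, ClassLoader parent) {
            super(parent);
            this.server = server;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] bytes = server.fetchClassBytes(name);   // bytes originate from the Consumer's JAR
            if (bytes == null) throw new ClassNotFoundException(name);
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    interface ClassCacheClient {                     // stand-in for the server's cache interface
        byte[] fetchClassBytes(String className);    // returns null if the class is unknown
    }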

81–82 Scalability: Fault-tolerance
Replicate a server's tasks on its sibling.

83 Scalability: Fault-tolerance
Replicate a server's tasks on its sibling. When a server fails, its sibling restores the state to a replacement server.

84 Architecture Production network of clusters
The network tolerates a single server failure, restores its ability to tolerate a single server failure, and thus tolerates a sequence of single server failures.

85 Outline Introduction Related work API Architecture
Experimental results Current & future work

86 Preliminary experiments
Experiments run on a Linux cluster: 100-port Lucent P550 Cajun Gigabit Switch; each machine has 2 Intel EtherExpress Pro 100 Mb/s Ethernet cards; Red Hat Linux 6.0; JDK 1.2.2_RC3; heterogeneous processor speeds and processors/machine.

87 Fibonacci Tasks with Synthetic Load
Decompose task f(n): execute( ) { if ( n < 2 ) { synthetic workload(); setArg( ArgAddr, n ); } else { spawn( + ); spawn( f(n-1) ); spawn( f(n-2) ); } }
Compose task +: execute( ) { synthetic workload(); setArg( ArgAddr, in[0] + in[1] ); }

88 TSEQ vs. T1 (seconds), computing F(8)
Workload   TSEQ     T1       Efficiency
4.522      -        -        0.96
3.740      -        -        0.95
2.504      -        -        0.94
1.576      -        -        0.90
0.914      -        -        0.88
0.468      56.160   65.767   0.85
0.198      24.750   29.553   0.84
0.058      8.120    11.386   0.71
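
The Efficiency column is consistent with the ratio TSEQ / T1, i.e. how much the CX machinery slows down a single producer relative to plain sequential execution; for the last row, for example:

\[ \frac{T_{SEQ}}{T_1} = \frac{8.120}{11.386} \approx 0.71 \]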

89 Average task time: Workload 1 = 1.8 sec, Workload 2 = 3.7 sec. Parallel efficiency for F(13) = 0.77; parallel efficiency for F(18) = 0.99.

90 Outline Introduction Related work API Architecture
Experimental results Current & future work

91 Current work Implement CX market maker (broker)
Solves the discovery problem between Consumers & Production Networks. Enhance Producer with Lea's Fork/Join Framework (see gee.cs.oswego.edu). [Diagram: Consumer, Market Maker, Production Network; Jini service]
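
For comparison, here is the Fibonacci DAG expressed with the java.util.concurrent descendant of Lea's framework (RecursiveTask and ForkJoinPool) rather than the original FJTask API the slide points to; it shows the same decompose/compose structure inside one JVM:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Fibonacci on the fork/join framework that grew out of Lea's FJTask library.
    class FibTask extends RecursiveTask<Integer> {
        private final int n;
        FibTask(int n) { this.n = n; }

        @Override
        protected Integer compute() {
            if (n < 2) return n;                        // base case, as in the CX decompose task
            FibTask left = new FibTask(n - 1);
            left.fork();                                // spawn one subtask asynchronously
            int right = new FibTask(n - 2).compute();   // compute the other in this worker
            return right + left.join();                 // compose, like CX's "+" task
        }

        public static void main(String[] args) {
            System.out.println(new ForkJoinPool().invoke(new FibTask(10)));   // prints 55
        }
    }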

92 Current work Enhance computational model: branch & bound.
Propagate new bounds through the production network: 3 steps. [Diagram: search tree and production network; BRANCH, TERMINATE!]

93 Current work Enhance computational model: branch & bound.
Propagate new bounds through the production network: 3 steps. [Diagram: search tree and production network; TERMINATE!]

94 Current work Investigate computations that appear ill-suited to adaptive parallelism: SOR, N-body.

95 Thanks!

96 End of CX Presentation www.cs.ucsb.edu/research/cx
Next release: End of June, includes source.

97 Introduction Fundamental Issues
Communication latency: long latency ⇒ overlap computation with communication. Robustness: massive parallelism ⇒ faults. Scalability: massive parallelism ⇒ login privileges cannot be required. Ease of use: Jini ⇒ easy upgrade of system components.

98 Related work Market mechanisms
Huberman, Waldspurger, Malone, Miller & Drexler, Newhouse & Darlington

99 Related work CX integrates: the DAG computational model,
a work-stealing scheduler, space-based decoupled communication, fault-tolerance via eager scheduling, and market mechanisms (an incentive to participate).

100 Architecture Task identifier
The DAG has a spawn tree; TaskID = path id; Root.TaskID = 0. TaskIDs are used to detect duplicate tasks and results. [Diagram: spawn tree of F(4) with child edges labeled 1 and 2]
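
One way to realize "TaskID = path id" is to have each spawn append its child index (the 1s and 2s in the spawn-tree figure) to the parent's id; equal ids then reveal duplicate tasks or duplicate results. The encoding below is hypothetical:

    // Illustrative path-based task identifier for duplicate detection.
    final class TaskId {
        private final String path;                   // root is "0"; children are "0.1", "0.2", "0.1.1", ...

        private TaskId(String path) { this.path = path; }

        static TaskId root() { return new TaskId("0"); }            // Root.TaskID = 0
        TaskId child(int index) { return new TaskId(path + "." + index); }

        @Override public boolean equals(Object o) {
            return o instanceof TaskId && ((TaskId) o).path.equals(path);
        }
        @Override public int hashCode() { return path.hashCode(); }
        @Override public String toString() { return path; }
    }

A server can then keep a Set<TaskId> of tasks and results it has already seen and silently drop any repeats caused by eager re-assignment.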

101 Architecture: Basic Entities
Consumer: seeks computing resources. Producer: offers computing resources. Task Server: coordinates task distribution among its producers. Production Network: a network of task servers & their associated producers.

102 Defining Parallel Efficiency
Scalar (homogeneous set of P machines): parallel efficiency = ( T1 / P ) / TP. Vector (heterogeneous set of P machines): P = [ P1, P2, …, Pd ], where there are P1 machines of type 1, P2 machines of type 2, …, Pd machines of type d; parallel efficiency = ( P1/T1 + P2/T2 + … + Pd/Td )^(-1) / TP.
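
A worked example with hypothetical numbers (not taken from the experiments): two type-1 machines that each need T1 = 60 s sequentially and three type-2 machines with T2 = 90 s give an ideal parallel time and efficiency of

\[ T_{ideal} = \left( \frac{2}{60} + \frac{3}{90} \right)^{-1} = 15 \text{ s}, \qquad
   \text{parallel efficiency} = \frac{T_{ideal}}{T_P} = \frac{15}{20} = 0.75 \quad \text{if } T_P = 20 \text{ s}. \]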

103 Future work Support special hardware / data: inter-server task movement. Diffusion model: tasks are homogeneous gas atoms diffusing through the network. N-body model: each kind of atom (task) has its own mass (resistance to movement: code size, input size, …) and its own attraction/repulsion to different servers, or to other "massive" entities such as special processors or a large database.

104 Future Work CX preprocessor to simplify API.

