CSS434 MPI1 CSS434 Group Communication and MPI Textbook Ch4.4-5 and 15.4 Professor: Munehiro Fukuda.

CSS434 MPI1 CSS434 Group Communication and MPI Textbook Ch4.4-5 and 15.4 Professor: Munehiro Fukuda

CSS434 MPI2 Outline Reliability of group communication IP multicast Atomic multicast Ordering of group communication Absolute ordering Total ordering Causal ordering FIFO ordering MPI Java Programming, compilation and invocation Major group communication functions: Bcast( ), Reduce( ), Allreduce( )

CSS434 MPI3 Group Communication Communication types One-to-many: broadcast Many-to-one: synchronization, collective communication Many-to-many: gather and scatter Applications Fault tolerance based on replicated services (type: one-to-many) Finding the discovery servers in spontaneous networking (type: a one-to-many request, a many-to-one response) Better performance through replicated data (type: one-to-many, many-to-one) Propagation of event notification (type: one-to-many)

CSS434 MPI4 IP (Unreliable) Multicast import java.net.*; import java.io.*; public class MulticastPeer{ public static void main(String args[]){ // args give message contents & destination multicast group (e.g. "228.5.6.7") MulticastSocket s =null; try { InetAddress group = InetAddress.getByName(args[1]); s = new MulticastSocket(6789); s.joinGroup(group); byte [] m = args[0].getBytes(); DatagramPacket messageOut = new DatagramPacket(m, m.length, group, 6789); s.send(messageOut); // get messages from others in group byte[] buffer = new byte[1000]; for(int i=0; i< 3; i++) { DatagramPacket messageIn = new DatagramPacket(buffer, buffer.length); s.receive(messageIn); System.out.println("Received:" + new String(messageIn.getData())); } s.leaveGroup(group); }catch (SocketException e){System.out.println("Socket: " + e.getMessage()); }catch (IOException e){System.out.println("IO: " + e.getMessage());} }finally {if(s != null) s.close();} } Using a special network address: IP Class D and UDP

CSS434 MPI5 Reliability and Ordering Fault tolerance based on replicated services Send-to-all and all-reliable semantics Finding the discovery servers in spontaneous networking 1-reliable semantics Better performance through replicated data Semantics depend on applications. (send-to-all and all- or m-out-of-n reliable semantics) Propagation of event notifications In general, send-to-all and all-reliable semantics

CSS434 MPI6 Atomic Multicast Send-to-all semantics and all-reliable Simple emulation: A repetition of one-to-one communication with acknowledgment What if a receiver fails Time-out retransmission What if a sender fails before all receivers receive a message All receivers forward the message to the same group and thereafter deliver it to themselves. A receiver discard the 2 nd or the following messages.

CSS434 MPI7 Message Ordering R1 and R2 receive m1 and m2 in a different order! Some message ordering required Absolute ordering Consistent/total ordering Causal ordering FIFO ordering S1R1R2 S2 m1 m2

CSS434 MPI8 Absolute Ordering Rule: Mi must be delivered before mj if Ti < Tj Implementation: A clock synchronized among machines A sliding time window used to commit message delivery whose timestamp is in this window. Example: Distributed simulation Drawback Too strict constraint No absolute synchronized clock No guarantee to catch all tardy messages mi mj Tj Ti Ti < Tj

CSS434 MPI9 Consistent/Total Ordering Rule: Messages received in the same order (regardless of their timestamp). Implementation: A message sent to a sequencer, assigned a sequence number, and finally multicast to receivers A message retrieved in incremental order at a receiver Example: Replicated database updates Drawback: A centralized algorithm mi mj Tj Ti Ti < Tj

CSS434 MPI10 Total Ordering Using A Sequencer Message sent to all group members and a sequencer Receive a message, associate it with a sequence number. multicast it to all group members, and increments the sequence number Receive an incoming message in a temporary queue Receive a sequence number message from the sequencer, Reorder the incoming message with this sequence #, and Deliver it if my local counter reaches this number.

CSS434 MPI11 The ISIS System for Total Ordering 2 1 1 2 2 1 Message 2 Proposed Seq P 2 P 3 P 1 P 4 3 Agreed Seq 3 3 Sender Receiver A p4 P p4 A p4 P p4 A p4 P p4 Proposed Seq P p4 = max(A p4, P p4 ) + 1 a = max(P p1, P p2, P p3, P p4 ) A p4 = max(A p4, a)

CSS434 MPI12 Causal Ordering Rule: Happened-before relation If e k i, e l i ∈ h and k < l, then e k i → e l i, If e i = send(m) and e j = receive(m), then e i → e j, If e → e’ and e’ → e”, then e → e” Implementation: Use of a vector message Example: Distributed file system Drawback: Vector as an overhead Broadcast assumed S1 R1 R2 R3 S2 m1 m2 m3 m4 From R2 ’ s view point m1 → m2

CSS434 MPI13 Causal Ordering Using A Vector Stamps Increment my vector element, and Send a messge with a vector Increment the sender ’ s vector element Each process maintains a vector. Make sure J ’ s element of J ’ s vector reaches J ’ s element of my vector. (All J ’ s previous message has been delivered.) Make sure all elements of J ’ s vector <= all elements of my vector. (I ’ ve delivered all messages that J has delivered.)

CSS434 MPI14 Vector Stamps S[i] = R[i] + 1 where i is the source id S[j] ≤ R[j] where i≠j Site A Site B Site CSite D 2, 1, 1, 0 1, 1, 1, 0 2, 1, 0, 0 delayed delivered 3,1,1,0

CSS434 MPI15 FIFO Ordering Rule: Messages received in the same order as they were sent. Implementation: Messages assigned a sequence number Example: TCP This is the weakest ordering. Router 1 Router 2 m1 m2 m3 m4 m1 m2 m3 m4 S R

CSS434 MPI16 Why High-Level Message Passing Tools? Data formatting Data formatted into appropriate types at user level Non-blocking communication Polling and interrupt handled at system call level Process addressing Inflexible hardwired addressing with machine id + local id Group communication Group server implemented at user level Broadcasting simulated by a repetition of one-to-one communication

CSS434 MPI17 PVM and MPI PVM: Parallel Virtual Machine Developed in 80 ’ s The pioneer library to provide high-level message passing functions The PVM daemon process taking care of message transfer for user processes in background MPI: Message Passing Interface Defined in 90 ’ s The specification of high-level message passing functions Several implementations available: mpich, mpi-lam Library functions directly linked to user programs (no background daemons) The detailed difference is shown by: PVMvsMPI.pdf

CSS434 MPI18 Getting Started with MPI Java Website: http://www.hpjava.org/courses/arl/lectures/mpi.ppt http://www.hpjava.org/reports/mpiJava-spec/mpiJava-spec.pdf Creating a machines file: [mfukuda@UW1-320-00 mfukuda]$ vi machines uw1-320-00 uw1-320-01 uw1-320-02 uw1-320-03 Compile a source program: [mfukuda@UW1-320-00 mfukuda]$ javac MyProg.java Run the executable file: [mfukuda@UW1-320-00 mfukuda]$ prunjava 4 MyProg args

CSS434 MPI19 Program Using MPI import mpi.*; class MyProg { public static void main( String[] args ) { MPI.Init( args ); // Start MPI computation int rank = MPI.COMM_WORLD.Rank( ); // Process ID (from 0 to #processes – 1) int size = MPI.COMM_WORLD.Size( ); // # participating processes System.out.println( "Hello World! I am " + rank + " of " + size ); MPI.Finalize(); // Finish MPI computation }

CSS434 MPI20 MPI_Send and MPI_Recv void MPI.COMM_WORLD.Send( Object[]message/* in */, intoffset /* in */, intcount/* in */, MPI.Datatypedatatype/* in */, intdest/* in */, inttag/* in */) Status MPI.COMM_WORLD.Recv( Object[]message/* in */, intoffset/* in */, intcount/* in */, MPI.Datatypedatatype/* in */, intsource/* in */, inttag/* in */) int Status.Get_count( MPI.Datatype, datatype ) /* #objects received */ MPI.Datatype =BYTE, CHAR, SHORT, INT, LONG, FLOAT, DOUBLE, OBJECT

CSS434 MPI21 MPI.Send and MPI.Recv import mpi.*; class myProg { public static void main( String[] args ) { int tag0 = 0; MPI.Init( args ); // Start MPI computation if ( MPI.COMM_WORLD.rank() == 0 ) { // rank 0…sender int loop[1]; loop[0] = 3; MPI.COMM_WORLD.Send( "Hello World!", 12, MPI.CHAR, 1, tag0 ); MPI.COMM_WORLD.Send( loop, 1, MPI.INT, 1, tag0 ); } else { // rank 1…receiver int loop[1]; char msg[12]; MPI::COMM_WORLD.Recv( msg, 12, MPI.CHAR, 0, tag0 ); MPI::COMM_WORLD.Recv( loop, 1, MPI.INT, 0, tag0 ); for ( int i = 0; i < loop[0]; i++ ) System.out.println( msg ); } MPI.Finalize( ); // Finish MPI computation }

CSS434 MPI22 Message Ordering in MPI FIFO Ordering in each data type Messages reordered with a tag in each data type SourceDestination SourceDestination tag = 1 tag = 2 tag = 3

CSS434 MPI23 MPI.Bcast void MPI.COMM_WORLD.Bcast( Object[]message/* in */, intoffset/* in */, intcount/* in */, MPI.Datatypedatatype/* in */, introot/* in */) Rank 0 Rank 1 Rank 2 Rank 3 Rank 4 MPI::COMM_WORLD.Bcast( msg, 0, 1, MPI.INT, 2); msg[0]

CSS434 MPI24 MPI_Reduce void MPI.COMM_WORLD.Reduce( Object[]sendbuf/* in */, intsendoffset/* in */, Object[]recvbuf/* out */, intrecvoffset/* in */, intcount/* in */, MPI.Datatypedatatype/* in */, MPI.Opoperator/* in */, introot/* in */ ) MPI.Op = MPI.MAX (Maximum),MPI.MIN (Minimum),MPI.SUM (Sum), MPI.PROD (Product),MPI.LAND (Logical and),MPI.BAND (Bitwise and), MPI.LOR (Logical or),MPI.BOR (Bitwise or),MPI.LXOR (logical xor), MPI.BXOR(Bitwise xor),MPI.MAXLOC (MAX location)MPI.MINLOC (MIN loc.) MPI::COMM_WORLD.Reduce( msg, 0, result, 0, 1, MPI.INT, MPI.SUM, 2); Rank0 15 Rank1 10 Rank2 12 Rank3 8 Rank4 4 49

CSS434 MPI25 MPI_Allreduce void MPI.COMM_WORLD.Allreduce( Object[]sendbuf/* in */, intsendoffset/* in */, Object[]recvbuf/* out */, intrecvoffset/* in */, intcount/* in */, MPI.Datatypedatatype/* in */, MPI.Opoperator/* in */) 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7

CSS434 MPI26 Exercises (No turn-in) 1. Explain, in atomic multicast on page 6, why reversing the order of operations “ all receivers forward the message to the same group and thereafter deliver it to themselves ” makes the multicast no longer atomic. 2. Assume that four processes communicate with one another in causal ordering. Their current vectors are show below. If Process A sends a message, which processes can receive it immediately? 3. Show that, if the basic multicast that we use in the algorithm of P9 is also FIFO-ordered, then the resultant totally-ordered multicast is also causally ordered. 4. Consider pros and cons of PVM ’ s daemon-based and MPI ’ s library linking-based message passing. 5. Why can MPI maintain FIFO ordering? Process AProcess BProcess CProcess D 3, 5, 2, 12, 5, 2, 13, 5, 2, 13, 4, 2, 1

CSS434 MPI1 CSS434 Group Communication and MPI Textbook Ch4.4-5 and 15.4 Professor: Munehiro Fukuda.

Similar presentations

Presentation on theme: "CSS434 MPI1 CSS434 Group Communication and MPI Textbook Ch4.4-5 and 15.4 Professor: Munehiro Fukuda."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CSS434 MPI1 CSS434 Group Communication and MPI Textbook Ch4.4-5 and 15.4 Professor: Munehiro Fukuda.

Similar presentations

Presentation on theme: "CSS434 MPI1 CSS434 Group Communication and MPI Textbook Ch4.4-5 and 15.4 Professor: Munehiro Fukuda."— Presentation transcript:

Similar presentations

About project

Feedback