Presentation is loading. Please wait.

Presentation is loading. Please wait.

Porting Charm++ to a New System Writing a Machine Layer Sayantan Chakravorty 5/01/20081Parallel Programming Laboratory.

Similar presentations


Presentation on theme: "Porting Charm++ to a New System Writing a Machine Layer Sayantan Chakravorty 5/01/20081Parallel Programming Laboratory."— Presentation transcript:

1 Porting Charm++ to a New System Writing a Machine Layer Sayantan Chakravorty 5/01/20081Parallel Programming Laboratory

2 Why have a Machine Layer ? User Code.ci.C.h User Code.ci.C.h 5/01/20082Parallel Programming Laboratory Charm++ Load balancing Virtualization Scheduler Memory management Message delivery Timers Converse Machine Layer

3 Where is the Machine Layer ? Code exists in charm/src/arch/ Files needed for a machine layer – machine.c : Contains C code – conv-mach.sh : Defines environment variables – conv-mach.h : Defines macros to choose version of machine.c – Can produce many variants based on the same machine.c by varying conv-mach-.* 132 versions based on only 18 machine.c files 5/01/20083Parallel Programming Laboratory

4 What all does a Machine Layer do? 5/01/2008Parallel Programming Laboratory4 ConverseInit FrontEnd ConverseInit CmiSyncSendFn CmiSyncBroadcastFn ConverseExit CmiAbort ConverseExit CmiAbort ConverseExit CmiAbort ConverseExit CmiAbort ConverseExit CmiAbort ConverseExit CmiAbort

5 Different kinds of Machine Layers Differentiate by Startup method – Uses lower level library/ run time MPI: mpirun is the frontend – cray, sol, bluegenep VMI: vmirun is the frontend – amd64, ia64 ELAN: prun is the frontend – axp, ia64 – Charm run time does startup Network based (net) : charmrun is the frontend – amd64, ia64,ppc – Infiniband, Ethernet, Myrinet 5/01/20085Parallel Programming Laboratory

6 Net Layer: Why ? Why do we need a startup in Charm RTS ? – Using a low level interconnect API, no startup provided Why use low level API ? – Faster » Why faster Lower overheads We can design for a message driven system – More flexible » Why more flexible ? Can implement functionality with exact semantics needed 5/01/20086Parallel Programming Laboratory

7 Net Layer: What ? Code base for implementing a machine layer on low level interconnect API 5/01/2008Parallel Programming Laboratory7 ConverseInit charmrun CmiSyncSendFn CmiSyncBroadcastFn ConverseExit CmiAbort ConverseExit CmiAbort req_client_connect DeliverViaNetwork CmiMachineInit node_addresses_obtain CmiMachineInit node_addresses_obtain CmiMachineExit CommunicationServer

8 Net Layer: Startup 5/01/2008Parallel Programming Laboratory8 charmrun.c main(){ // read node file nodetab_init(); //fire off compute node processes start_nodes_rsh(); //Wait for all nodes to reply //Send nodes their node table req_client_connect(); //Poll for requests while (1) req_poll(); } charmrun.c main(){ // read node file nodetab_init(); //fire off compute node processes start_nodes_rsh(); //Wait for all nodes to reply //Send nodes their node table req_client_connect(); //Poll for requests while (1) req_poll(); } machine.c ConverseInit(){ //Open socket with charmrun skt_connect(..); //Initialize the interconnect CmiMachineInit(); //Send my node data //Get the node table node_addresses_obtain(..); //Start the Charm++ user code ConverseRunPE(); } machine.c ConverseInit(){ //Open socket with charmrun skt_connect(..); //Initialize the interconnect CmiMachineInit(); //Send my node data //Get the node table node_addresses_obtain(..); //Start the Charm++ user code ConverseRunPE(); } ssh and start process Node data Node Table

9 Net Layer: Sending messages 5/01/2008Parallel Programming Laboratory9 CmiSyncSendFn(int proc,int size,char *msg){ //common function for send CmiGeneralSend(proc,size,`S’,msg); } CmiGeneralSend(int proc,int size,int freemode, char *data){ OutgoingMsg ogm = PrepareOutgoing(cs,pe, size,freemode,data); DeliverOutgoingMessage(ogm); //Check for incoming messages and completed //sends CommunicationServer(); } DeliverOutgoingMessage(OutgoingMsg ogm){ //Send the message on the interconnect DeliverViaNetwork(ogm,..); } CmiSyncSendFn(int proc,int size,char *msg){ //common function for send CmiGeneralSend(proc,size,`S’,msg); } CmiGeneralSend(int proc,int size,int freemode, char *data){ OutgoingMsg ogm = PrepareOutgoing(cs,pe, size,freemode,data); DeliverOutgoingMessage(ogm); //Check for incoming messages and completed //sends CommunicationServer(); } DeliverOutgoingMessage(OutgoingMsg ogm){ //Send the message on the interconnect DeliverViaNetwork(ogm,..); }

10 Net Layer: Exit 5/01/2008Parallel Programming Laboratory10 ConverseExit(){ //Shutdown the interconnect cleanly CmiMachineExit(); //Shutdown Converse ConverseCommonExit(); //Inform charmrun this process is done ctrl_sendone_locking("ending",NULL,0, NULL,0); } ConverseExit(){ //Shutdown the interconnect cleanly CmiMachineExit(); //Shutdown Converse ConverseCommonExit(); //Inform charmrun this process is done ctrl_sendone_locking("ending",NULL,0, NULL,0); }

11 Net Layer: Receiving Messages No mention of receiving messages Result of message driven paradigm – No explicit Receive calls Receive starts in CommunicationServer – Interconnect specific code collects received message – Calls CmiPushPE to handover message 5/01/2008Parallel Programming Laboratory11

12 Let’s write a Net based Machine Layer 5/01/2008Parallel Programming Laboratory12

13 A Simple Interconnect Let’s make up an interconnect – Simple Each node has a port Other Nodes send it messages on that port A node reads its port for incoming messages Messages are received atomically – Reliable – Does Flow control itself 5/01/200813Parallel Programming Laboratory

14 The Simple Interconnect AMPI Initialization – void si_init() – int si_open() – NodeID si_getid() Send a message – int si_write(NodeID node, int port, int size, char *msg) Receive a message – int si_read(int port, int size, char *buf) Exit – int si_close(int port) – void si_done() 5/01/2008Parallel Programming Laboratory14

15 Let’s start Net layer based implementation for SI 5/01/200815Parallel Programming Laboratory conv-mach-si.h #undef CMK_USE_SI #define CMK_USE_SI 1 //Polling based net layer #undef CMK_NETPOLL #define CMK_NETPOLL 1 conv-mach-si.h #undef CMK_USE_SI #define CMK_USE_SI 1 //Polling based net layer #undef CMK_NETPOLL #define CMK_NETPOLL 1 conv-mach-si.sh CMK_INCDIR=“-I/opt/si/include” CMK_LIBDIR=“-I/opt/si/lib” CMK_LIB=“$CMK_LIBS –lsi” conv-mach-si.sh CMK_INCDIR=“-I/opt/si/include” CMK_LIBDIR=“-I/opt/si/lib” CMK_LIB=“$CMK_LIBS –lsi”

16 Net based SI Layer 5/01/2008Parallel Programming Laboratory16 machine-si.c #include “si.h” CmiMachineInit DeliverViaNetwork CommunicationServer CmiMachineExit machine-si.c #include “si.h” CmiMachineInit DeliverViaNetwork CommunicationServer CmiMachineExit machine.c //Message delivery #include “machine-dgram.c” machine.c //Message delivery #include “machine-dgram.c” machine-dgram.c #if CMK_USE_GM #include "machine-gm.c“ #elif CMK_USE_SI #include “machine-si.c” #elif … machine-dgram.c #if CMK_USE_GM #include "machine-gm.c“ #elif CMK_USE_SI #include “machine-si.c” #elif …

17 Initialization 5/01/2008Parallel Programming Laboratory17 charmrun.c void req_client_connect(){ //collect all node data for(i=0;i<nClients;i++){ ChMessage_recv(req_clients[i],&msg); ChSingleNodeInfo *m=msg->data; #ifdef CMK_USE_SI nodetab[m.PE].nodeID = m.info.nodeID nodetab[m.PE].port = m.info.port #endif } //send node data to all for(i=0;i<nClients;i++){ //send nodetab on req_clients[i] } charmrun.c void req_client_connect(){ //collect all node data for(i=0;i<nClients;i++){ ChMessage_recv(req_clients[i],&msg); ChSingleNodeInfo *m=msg->data; #ifdef CMK_USE_SI nodetab[m.PE].nodeID = m.info.nodeID nodetab[m.PE].port = m.info.port #endif } //send node data to all for(i=0;i<nClients;i++){ //send nodetab on req_clients[i] } machine.c static OtherNode nodes; void node_adress_obtain(){ ChSingleNodeinfo me; #ifdef CMK_USE_SI me.info.nodeID = si_nodeID; me.info.port = si_port; #endif //send node data to chamrun ctrl_sendone_nolock("initnode",&me, sizeof(me),NULL,0); //receive and store node table ChMessage_recv(charmrun_fd, &tab); for(i=0;i<Cmi_num_nodes;i++){ nodes[i].nodeID = tab->data[i].nodeID; nodes[i].port = tab->data[i].port; } machine.c static OtherNode nodes; void node_adress_obtain(){ ChSingleNodeinfo me; #ifdef CMK_USE_SI me.info.nodeID = si_nodeID; me.info.port = si_port; #endif //send node data to chamrun ctrl_sendone_nolock("initnode",&me, sizeof(me),NULL,0); //receive and store node table ChMessage_recv(charmrun_fd, &tab); for(i=0;i<Cmi_num_nodes;i++){ nodes[i].nodeID = tab->data[i].nodeID; nodes[i].port = tab->data[i].port; } machine-si.c NodeID si_nodeID; int si_port; CmiMachineInit(){ si_init(); si_port = si_open(); si_nodeID = si_getid(); } machine-si.c NodeID si_nodeID; int si_port; CmiMachineInit(){ si_init(); si_port = si_open(); si_nodeID = si_getid(); }

18 Messaging: Design Small header with every message – contains the size of the message – Source NodeID (not strictly necessary) Read the header – Allocate a buffer for incoming message – Read message into buffer – Send it up to Converse 5/01/2008Parallel Programming Laboratory18

19 Messaging: Code 5/01/2008Parallel Programming Laboratory19 machine-si.c typedef struct{ unsigned int size; NodeID nodeID; } si_header; void DeliverViaNetwork(OutgoingMsg ogm, int dest,…) { DgramHeaderMake(ogm->data,…); si_header hdr; hdr.nodeID = si_nodeID; hdr.size = ogm->size; OtherNode n = nodes[dest]; if(!si_write(n.nodeID, n.port,sizeof(hdr), &hdr) ){} if(!si_write(n.nodeID, n.port, hdr.size, ogm->data) ){} } machine-si.c typedef struct{ unsigned int size; NodeID nodeID; } si_header; void DeliverViaNetwork(OutgoingMsg ogm, int dest,…) { DgramHeaderMake(ogm->data,…); si_header hdr; hdr.nodeID = si_nodeID; hdr.size = ogm->size; OtherNode n = nodes[dest]; if(!si_write(n.nodeID, n.port,sizeof(hdr), &hdr) ){} if(!si_write(n.nodeID, n.port, hdr.size, ogm->data) ){} } machine-si.c void CommunicationServer(){ si_header hdr; while(si_read(si_port,sizeof(hdr),&hdr)!= 0) { void *buf = CmiAlloc(hdr.size); int readSize,readTotal=0; while(readTotal < hdr.siez){ if((readSize= si_read(si_port,hdr.size,buf) ) <0){} readTotal += readSize; } //handover to Converse } machine-si.c void CommunicationServer(){ si_header hdr; while(si_read(si_port,sizeof(hdr),&hdr)!= 0) { void *buf = CmiAlloc(hdr.size); int readSize,readTotal=0; while(readTotal < hdr.siez){ if((readSize= si_read(si_port,hdr.size,buf) ) <0){} readTotal += readSize; } //handover to Converse }

20 Exit 5/01/2008Parallel Programming Laboratory20 machine-si.c NodeID si_nodeID; int si_port; CmiMachineExit (){ si_close(si_port); si_done(); } machine-si.c NodeID si_nodeID; int si_port; CmiMachineExit (){ si_close(si_port); si_done(); }

21 More complex Layers Receive buffers need to be posted – Packetization Unreliable interconnect – Error and Drop detection – Packetization – Retransmission Interconnect requires memory to be registered – CmiAlloc implementation 5/01/2008Parallel Programming Laboratory21

22 Thank You 5/01/2008Parallel Programming Laboratory22


Download ppt "Porting Charm++ to a New System Writing a Machine Layer Sayantan Chakravorty 5/01/20081Parallel Programming Laboratory."

Similar presentations


Ads by Google