Porting Charm++ to a New System: Writing a Machine Layer
Sayantan Chakravorty, Parallel Programming Laboratory, 5/01/2008

Why have a Machine Layer? The slide shows the software stack: User Code (.ci, .C, .h files) sits on top of Charm++, which provides load balancing and virtualization. Charm++ runs on Converse, which provides the scheduler, memory management, message delivery, and timers. Converse in turn sits on the Machine Layer, the only piece that talks directly to the system.

Where is the Machine Layer? The code lives in charm/src/arch/. Files needed for a machine layer:
– machine.c: contains the C code
– conv-mach.sh: defines environment variables
– conv-mach.h: defines macros that choose the version of machine.c
Many variants can be produced from the same machine.c by varying the conv-mach-*.* files: 132 versions are based on only 18 machine.c files.

What all does a Machine Layer do? The slide's diagram shows the entry points each node must implement: startup (ConverseInit, launched by a front end), message delivery (CmiSyncSendFn, CmiSyncBroadcastFn), and shutdown (ConverseExit, CmiAbort).
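As a rough guide, the Converse entry points a machine layer implements look like the following sketch; the prototypes are written from memory of the Converse headers of that era, so treat the exact signatures as an assumption to be checked against charm/src/conv-core/converse.h.

/* Core entry points a machine layer provides (sketch) */
void ConverseInit(int argc, char **argv, CmiStartFn fn,
                  int usched, int initret);       /* bring up this process */
void CmiSyncSendFn(int destPE, int size, char *msg);  /* point-to-point send */
void CmiSyncBroadcastFn(int size, char *msg);         /* send to all other PEs */
void ConverseExit(void);                              /* clean shutdown */
void CmiAbort(const char *message);                   /* emergency shutdown */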

Different kinds of Machine Layers, differentiated by startup method:
– A lower-level library or run time does startup:
  MPI: mpirun is the frontend (cray, sol, bluegenep)
  VMI: vmirun is the frontend (amd64, ia64)
  ELAN: prun is the frontend (axp, ia64)
– The Charm run time does startup:
  Network based (net): charmrun is the frontend (amd64, ia64, ppc; Infiniband, Ethernet, Myrinet)

Net Layer: Why? Why do we need startup in the Charm RTS? Because when using a low-level interconnect API, no startup is provided. Why use a low-level API at all?
– Faster: lower overheads, and we can design for a message-driven system
– More flexible: we can implement functionality with exactly the semantics needed

Net Layer: What? A code base for implementing a machine layer on a low-level interconnect API. The diagram maps the generic entry points to net-layer hooks: ConverseInit talks to charmrun (req_client_connect) and calls CmiMachineInit and node_addresses_obtain; CmiSyncSendFn and CmiSyncBroadcastFn go through DeliverViaNetwork and CommunicationServer; ConverseExit and CmiAbort call CmiMachineExit.

Net Layer: Startup

charmrun.c:
main(){
  // read node file
  nodetab_init();
  // fire off compute node processes
  start_nodes_rsh();
  // wait for all nodes to reply,
  // then send nodes their node table
  req_client_connect();
  // poll for requests
  while (1) req_poll();
}

machine.c:
ConverseInit(){
  // open socket with charmrun
  skt_connect(..);
  // initialize the interconnect
  CmiMachineInit();
  // send my node data, get the node table
  node_addresses_obtain(..);
  // start the Charm++ user code
  ConverseRunPE();
}

The arrows in the slide: charmrun starts each process over ssh; each node sends its node data up; charmrun sends the node table back.

Net Layer: Sending messages

CmiSyncSendFn(int proc, int size, char *msg){
  // common function for send
  CmiGeneralSend(proc, size, 'S', msg);
}

CmiGeneralSend(int proc, int size, int freemode, char *data){
  OutgoingMsg ogm = PrepareOutgoing(cs, pe, size, freemode, data);
  DeliverOutgoingMessage(ogm);
  // check for incoming messages and completed sends
  CommunicationServer();
}

DeliverOutgoingMessage(OutgoingMsg ogm){
  // send the message on the interconnect
  DeliverViaNetwork(ogm, ..);
}

Net Layer: Exit

ConverseExit(){
  // shut down the interconnect cleanly
  CmiMachineExit();
  // shut down Converse
  ConverseCommonExit();
  // inform charmrun this process is done
  ctrl_sendone_locking("ending", NULL, 0, NULL, 0);
}

Net Layer: Receiving Messages. There has been no mention of receiving messages; this is a result of the message-driven paradigm, which has no explicit receive calls. Receiving starts in CommunicationServer:
– Interconnect-specific code collects the received message
– It calls CmiPushPE to hand the message over (see the sketch below)
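A minimal sketch of that handover, assuming the net layer's CmiPushPE(rank, msg) helper, which enqueues a completed message for the scheduler of the destination rank; the wrapper function and its arguments are illustrative:

/* Sketch: called from CommunicationServer once interconnect-specific
   code has assembled a complete message. */
void handover_to_converse(char *msg, int rank) {
    /* msg: complete message in a CmiAlloc'd buffer;
       rank: destination rank within this node, parsed from the
       dgram header written by DgramHeaderMake on the sender */
    CmiPushPE(rank, msg);   /* the scheduler on that rank picks it up */
}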

Let's write a Net based Machine Layer

A Simple Interconnect. Let's make up an interconnect that is:
– Simple: each node has a port; other nodes send it messages on that port; a node reads its port for incoming messages; messages are received atomically
– Reliable
– Does flow control itself

The Simple Interconnect API
Initialization:
– void si_init()
– int si_open()
– NodeID si_getid()
Send a message:
– int si_write(NodeID node, int port, int size, char *msg)
Receive a message:
– int si_read(int port, int size, char *buf)
Exit:
– int si_close(int port)
– void si_done()
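Collected as a header, the made-up API might look like the sketch below; si.h and everything in it is our invention for this tutorial, and the return-value conventions are inferred from how the later slides check the calls:

/* si.h -- hypothetical API of the made-up Simple Interconnect */
typedef int NodeID;               /* assumed representation */

void   si_init(void);             /* initialize the interconnect library */
int    si_open(void);             /* open this node's port, return port number */
NodeID si_getid(void);            /* this node's interconnect identity */

/* send size bytes to (node, port); returns nonzero on success */
int si_write(NodeID node, int port, int size, char *msg);

/* read up to size bytes from our port; returns bytes read,
   0 if nothing is pending, negative on error */
int si_read(int port, int size, char *buf);

int  si_close(int port);          /* close our port */
void si_done(void);               /* shut the library down */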

Let's start the Net layer based implementation for SI

conv-mach-si.h:
#undef CMK_USE_SI
#define CMK_USE_SI 1
//Polling based net layer
#undef CMK_NETPOLL
#define CMK_NETPOLL 1

conv-mach-si.sh:
CMK_INCDIR="-I/opt/si/include"
CMK_LIBDIR="-L/opt/si/lib"
CMK_LIBS="$CMK_LIBS -lsi"
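With those two files dropped into charm/src/arch/, the variant is selected by appending its name to the build line; assuming a net-linux-amd64 target (the target name here is illustrative), that would look something like:

./build charm++ net-linux-amd64 si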

Net based SI Layer

machine-si.c:
#include "si.h"
CmiMachineInit
DeliverViaNetwork
CommunicationServer
CmiMachineExit

machine.c:
//Message delivery
#include "machine-dgram.c"

machine-dgram.c:
#if CMK_USE_GM
#include "machine-gm.c"
#elif CMK_USE_SI
#include "machine-si.c"
#elif …

Initialization

charmrun.c:
void req_client_connect(){
  // collect all node data
  for(i=0;i<nClients;i++){
    ChMessage_recv(req_clients[i],&msg);
    ChSingleNodeInfo *m=msg->data;
#ifdef CMK_USE_SI
    nodetab[m.PE].nodeID = m.info.nodeID;
    nodetab[m.PE].port = m.info.port;
#endif
  }
  // send node data to all
  for(i=0;i<nClients;i++){
    // send nodetab on req_clients[i]
  }
}

machine.c:
static OtherNode nodes;
void node_addresses_obtain(){
  ChSingleNodeinfo me;
#ifdef CMK_USE_SI
  me.info.nodeID = si_nodeID;
  me.info.port = si_port;
#endif
  // send node data to charmrun
  ctrl_sendone_nolock("initnode",&me, sizeof(me),NULL,0);
  // receive and store node table
  ChMessage_recv(charmrun_fd, &tab);
  for(i=0;i<Cmi_num_nodes;i++){
    nodes[i].nodeID = tab->data[i].nodeID;
    nodes[i].port = tab->data[i].port;
  }
}

machine-si.c:
NodeID si_nodeID;
int si_port;
CmiMachineInit(){
  si_init();
  si_port = si_open();
  si_nodeID = si_getid();
}

Messaging: Design. A small header goes with every message:
– it contains the size of the message
– and the source NodeID (not strictly necessary)
On receive, read the header, then:
– allocate a buffer for the incoming message
– read the message into the buffer
– send it up to Converse

Messaging: Code

machine-si.c:
typedef struct{
  unsigned int size;
  NodeID nodeID;
} si_header;

void DeliverViaNetwork(OutgoingMsg ogm, int dest,…){
  DgramHeaderMake(ogm->data,…);
  si_header hdr;
  hdr.nodeID = si_nodeID;
  hdr.size = ogm->size;
  OtherNode n = nodes[dest];
  if(!si_write(n.nodeID, n.port, sizeof(hdr), (char *)&hdr)){ /* error handling elided */ }
  if(!si_write(n.nodeID, n.port, hdr.size, ogm->data)){ /* error handling elided */ }
}

void CommunicationServer(){
  si_header hdr;
  while(si_read(si_port, sizeof(hdr), (char *)&hdr) != 0){
    void *buf = CmiAlloc(hdr.size);
    int readSize, readTotal=0;
    while(readTotal < hdr.size){
      if((readSize = si_read(si_port, hdr.size - readTotal,
                             (char *)buf + readTotal)) < 0){ /* error handling elided */ }
      readTotal += readSize;
    }
    // hand over to Converse
  }
}

Exit

machine-si.c:
NodeID si_nodeID;
int si_port;
CmiMachineExit(){
  si_close(si_port);
  si_done();
}

More complex Layers
Receive buffers need to be posted:
– packetization
Unreliable interconnect:
– error and drop detection
– packetization
– retransmission (a sketch of a packet header supporting this follows)
Interconnect requires memory to be registered:
– CmiAlloc implementation
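For the unreliable case, the per-packet header typically grows sequencing and reassembly fields. A minimal sketch under our own assumptions (every field name here is invented for illustration, not taken from any real Charm++ layer):

/* Hypothetical packet header for an unreliable interconnect. */
typedef struct {
    unsigned int   msgSize;   /* total size of the whole message */
    unsigned int   seqno;     /* per-destination sequence number */
    unsigned int   offset;    /* where this packet's payload lands in the message */
    unsigned short payload;   /* bytes of payload in this packet */
    NodeID         srcNode;   /* sender, so acks can be routed back */
} si_pkt_header;

/* The sender keeps each packet buffered until an ack for its seqno arrives,
   retransmitting on timeout; the receiver detects drops as gaps in seqno and
   reassembles packets into a CmiAlloc'd buffer using offset. */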

Thank You