Jon P. Maloy TIPC: Communication for Linux Clusters.

Presentation transcript:

Jon P. Maloy TIPC: Communication for Linux Clusters

NOKIA RESEARCH CENTER / BOSTON
TIPC Motivation
- ForCES
  - Efficient communication CE-FE, TIPC used as TML
  - IETF drafts: draft-maloy-tipc-01.txt, draft-maloy-tipc-tml-00.txt
- State Synchronization across nodes
  - E.g. Connection Tracking migration
  - Reliable Multicast support
  - Tight link supervision
- Efficient Clustering of Network Devices
  - Has been used in Ericsson products for 8 years
  - Proven in the field

TIPC: Transparent Inter Process Communication
- A transport protocol specialized for single node and cluster environments
- “Cluster global Unix sockets” with structured addressing scheme
- Supports both connection oriented and connectionless communication
- Reliable and non-reliable multicast
- A framework for detecting, supervising and maintaining cluster topology
- Source code available from SourceForge under dual BSD/GPL licence
- Not intrusive; small; no kernel changes required
- Code re-work ongoing to streamline for Linux
- Adopted by several OSes in the telecom industry already
- More to come

Why Another Protocol?
- TCP/SCTP
  - Too generic for efficient local communication, only connection oriented
- UDP
  - Unreliable, no congestion control
- Unix Sockets
  - Only single node, only connection oriented
- What We Wanted
  - One communication service with the speed of UDP/UNIX sockets, the reliability of TCP, and the versatility of them all combined
  - Functional addressing
  - Extend address location transparency beyond the local node
  - Have failure detection times at millisecond level, at least
  - A way to know when addresses become available/unavailable

What We Got
- Addressing Location Transparency
  - Powerful functional addressing scheme
  - The cluster can be seen as one single node
  - In all three communication modes
  - Selective transparency
- Lightweight, Reactive Connections
  - Immediate connection abortion at node/process failure or overload
- Performance
  - Directly on media (Ethernet, RapidIO...) when possible, otherwise on IP
  - 24 byte header for most messages
  - Numbers (slightly dated): 80 % faster than loopback TCP, 35 % faster than inter-node TCP for short messages

And More…
- Congestion control at three levels
  - Connection level, signalling link level and media level
  - Based on 4 importance priorities
- Simple to configure
  - No configuration needed at all in single node mode
  - Must set each node’s identity for cluster mode operation, that is all
  - Automatic neighbour detection using multicast/broadcast
- Topology Subscription Service
  - Functional and physical topology

And More…
- Network Redundancy
  - Can set each interface (“network plane”) as active or standby
  - Can have up to 3 standby networks for one active
  - Networks need not be same type
- Network Load Sharing
  - Can set two interfaces active and two standby
- Neighbour Supervision
  - “Lean” heartbeat scheme between nodes
  - Node failure detected within 500 ms, carrier failure detected immediately
- Scalability
  - Can handle clusters up to hundreds of nodes

Functional View
[Diagram: functional blocks of TIPC, node internal]
- Bearers, attached through the Bearer Adapter API: TCP, Shared Memory, Ethernet, SCTP, DCCP
- Sequence/Retransmission Control, Packet Bundling, Congestion Control
- Fragmentation/De-fragmentation, Reliable Multicast
- Neighbour Detection, Link Establish/Supervision/Failover
- Address Table Distribution, Connection Supervision, Route/Link Selection
- Address Subscription, Address Resolution
- User adapters, attached through the User Adapter API: Socket API Adapter, Port API Adapter, Custom API Adapters

Network Topology*
[Diagram: zones, clusters, nodes and slave nodes, connected over Internet/intranet]
* Only single cluster communication supported in current implementation

Functional Addressing: Unicast
- Function Address
  - Persistent, reusable 64 bit port identifier assigned by user
  - Consists of 32 bit function type number and 32 bit instance number
- Function Address Sequence (“Partition”)
  - Range of function addresses of same function type
  - Consists of function type, lower bound, upper bound
[Diagram: the client process calls sendto(type = foo, instance = 33); the message (foo,33) reaches the server process for partition A, bound with bind(type = foo, lower=0, upper=99). The server process for partition B is bound with bind(type = foo, lower=100, upper=199).]
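The addressing scheme on this slide can be sketched in a few lines of C. This is only an illustration under stated assumptions, not TIPC's name table: the names `func_addr`, `partition`, `partition_contains` and `lookup_partition` are hypothetical, and the bounds are assumed to be inclusive, as the 0..99 and 100..199 partitions suggest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the TIPC kernel structures. */
struct func_addr {      /* 64 bit function address        */
    uint32_t type;      /* 32 bit function type number    */
    uint32_t instance;  /* 32 bit instance number         */
};

struct partition {      /* function address sequence      */
    uint32_t type;      /* function type                  */
    uint32_t lower;     /* lower bound (assumed inclusive) */
    uint32_t upper;     /* upper bound (assumed inclusive) */
};

/* Returns 1 if the address falls inside the partition, otherwise 0. */
int partition_contains(const struct partition *p, struct func_addr a)
{
    return p->type == a.type &&
           p->lower <= a.instance && a.instance <= p->upper;
}

/* Returns the index of the first partition containing the address,
 * or -1 if no bound partition matches. */
int lookup_partition(const struct partition *tbl, size_t n, struct func_addr a)
{
    for (size_t i = 0; i < n; i++)
        if (partition_contains(&tbl[i], a))
            return (int)i;
    return -1;
}
```

With the two bindings from the slide, the address (foo, 33) resolves to partition A, and an instance outside both ranges resolves to nothing.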

Unicast Code Example

// client.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#define FOO 4711
#define INSTANCE 33

/* Helper not shown on the slides; prototype inferred from the call below. */
void wait_for_server(struct tipc_name *name, int timeout);

int main(int argc, char *argv[])
{
    struct sockaddr_tipc srv_addr;
    int sd = socket(AF_TIPC, SOCK_RDM, 0);

    /* Address the server by function address (type, instance). */
    memset(&srv_addr, 0, sizeof(srv_addr));
    srv_addr.family = AF_TIPC;
    srv_addr.addrtype = TIPC_ADDR_NAME;
    srv_addr.addr.name.name.type = FOO;
    srv_addr.addr.name.name.instance = INSTANCE;
    srv_addr.addr.name.domain = 0;

    printf("** TIPC client program started **\n\n");
    wait_for_server(&srv_addr.addr.name.name, 10000);

    /* Send connectionless "hello" message: */
    char buf[40] = {"Hello World"};
    if (0 > sendto(sd, buf, strlen(buf) + 1, 0,
                   (struct sockaddr *)&srv_addr, sizeof(srv_addr))) {
        perror("Client: Failed to send");
        exit(1);
    }

    /* Receive the acknowledge */
    if (0 >= recv(sd, buf, sizeof(buf), 0)) {
        perror("Unexpected response");
        exit(1);
    }
    printf("Client: Received response: %s \n", buf);
    printf("\n*** TIPC client program finished ***\n");
    return 0;
}

// server.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#define FOO 4711
#define LOWER_BOUND 0
#define UPPER_BOUND 99

int main(int argc, char *argv[])
{
    int sd = socket(AF_TIPC, SOCK_RDM, 0);
    struct sockaddr_tipc partition_addr, client_addr;
    socklen_t alen = sizeof(client_addr);
    char inbuf[40], outbuf[40] = "Uh ?";

    /* Bind to the partition (function address sequence) foo, 0..99. */
    memset(&partition_addr, 0, sizeof(partition_addr));
    partition_addr.family = AF_TIPC;
    partition_addr.addrtype = TIPC_ADDR_NAMESEQ;
    partition_addr.addr.nameseq.type = FOO;
    partition_addr.addr.nameseq.lower = LOWER_BOUND;
    partition_addr.addr.nameseq.upper = UPPER_BOUND;
    partition_addr.scope = TIPC_CLUSTER_SCOPE;

    printf("** TIPC server program started **\n");

    /* Make server available: */
    if (0 != bind(sd, (struct sockaddr *)&partition_addr,
                  sizeof(partition_addr))) {
        printf("Server: Failed to bind\n");
        exit(1);
    }

    /* Wait for one message; recvfrom() fills in the client's address. */
    if (0 >= recvfrom(sd, inbuf, sizeof(inbuf), 0,
                      (struct sockaddr *)&client_addr, &alen)) {
        perror("Unexpected recv");
        exit(1);
    }
    printf("Server: Message received: %s !\n", inbuf);

    /* Reply directly to the sender. */
    if (0 > sendto(sd, outbuf, strlen(outbuf) + 1, 0,
                   (struct sockaddr *)&client_addr, sizeof(client_addr))) {
        perror("Server: Failed to send");
    }
    printf("\n** TIPC server program finished **\n");
    return 0;
}

Functional Addressing: Multicast
- Based on Function Address Sequences
- Any partition overlapping with the range used in the destination address will receive a copy of the message
- Client defines “multicast group” per call
[Diagram: the client process calls sendto(type = foo, lower = 33, upper = 133); the message (foo,33,133) reaches both the server process for partition A, bound with bind(type = foo, lower=0, upper=99), and the server process for partition B, bound with bind(type = foo, lower=100, upper=199).]
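The overlap rule for multicast delivery can be sketched as a plain range-intersection test. The type `addr_range` and the functions `ranges_overlap` and `multicast_targets` are hypothetical names for this illustration, and inclusive bounds are assumed; the actual TIPC lookup is not shown on the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a function address sequence with inclusive bounds. */
struct addr_range {
    uint32_t type;
    uint32_t lower;
    uint32_t upper;
};

/* Two ranges overlap if they have the same type and share an instance. */
int ranges_overlap(const struct addr_range *a, const struct addr_range *b)
{
    return a->type == b->type &&
           a->lower <= b->upper && b->lower <= a->upper;
}

/* Sets hit[i] to 1 for every bound partition that would receive a copy of
 * a message sent to dest, 0 otherwise. Returns the number of receivers. */
size_t multicast_targets(const struct addr_range *bound, size_t n,
                         const struct addr_range *dest, int *hit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        hit[i] = ranges_overlap(&bound[i], dest);
        count += (size_t)hit[i];
    }
    return count;
}
```

For the destination (foo, 33, 133), both partitions on the slide overlap the range and each receives a copy, while a partition bound to 200..299 would not.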

Multicast Code Example

// client.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#define FOO 4711
#define LOWER_BOUND 33
#define UPPER_BOUND 133

/* Helper not shown on the slides; prototype inferred from the call below. */
void wait_for_server(struct tipc_name *name, int timeout);

int main(int argc, char *argv[])
{
    struct sockaddr_tipc mcast_group;
    int sd = socket(AF_TIPC, SOCK_RDM, 0);

    /* The destination range defines the multicast group for this call. */
    memset(&mcast_group, 0, sizeof(mcast_group));
    mcast_group.family = AF_TIPC;
    mcast_group.addrtype = TIPC_ADDR_NAMESEQ;
    mcast_group.addr.nameseq.type = FOO;
    mcast_group.addr.nameseq.lower = LOWER_BOUND;
    mcast_group.addr.nameseq.upper = UPPER_BOUND;

    printf("** TIPC client program started **\n\n");
    wait_for_server(&mcast_group.addr.name.name, 10000);

    /* Send connectionless "hello" message: */
    char buf[40] = {"Hello World"};
    if (0 > sendto(sd, buf, strlen(buf) + 1, 0,
                   (struct sockaddr *)&mcast_group, sizeof(mcast_group))) {
        perror("Client: Failed to send");
        exit(1);
    }

    /* Receive one acknowledge (the first to arrive) */
    if (0 >= recv(sd, buf, sizeof(buf), 0)) {
        perror("Unexpected response");
        exit(1);
    }
    printf("Client: Received response: %s \n", buf);
    printf("\n****** TIPC client program finished ******\n");
    return 0;
}

// server.c
/* Same as server.c in the Unicast Code Example: binds to partition foo, 0..99,
   receives one message and replies to the sender. */

Address Location Transparency
- Location of server not known by client
- Lookup of physical destination performed on-the-fly
- Efficient, no secondary messaging involved
[Diagram, three steps: the client process calls sendto(type = foo, lower = 33, upper = 133) and the message (foo,33,133) reaches the server processes for partition A, bound with bind(type = foo, lower=0, upper=99), and partition B, bound with bind(type = foo, lower=100, upper=199). The steps show the processes first on one node, then spread over two and three nodes.]

Address Binding
- Many sockets may bind to same partition
  - Closest-First or Round-Robin algorithm chosen by client
- Same socket may bind to many partitions
- Same socket may bind to different functions
[Diagram, three steps, each with the client process calling sendto(type = foo, lower = 33, upper = 133): (1) server processes for partitions A and A’ both call bind(type = foo, lower=0, upper=99); (2) the server process for partition B calls bind(type = foo, lower=100, upper=199), and the server process for partitions A+B’ calls bind(type = foo, lower=0, upper=99) and bind(type=foo, lower=100, upper=199); (3) the server process for partition B calls bind(type = foo, lower=100, upper=199), and the server process for partition A calls bind(type = foo, lower=0, upper=99) and bind(type=bar, lower=0, upper=999).]
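When several sockets are bound to the same partition, the sender has to pick one of them. The sketch below shows one plausible reading of the two selection policies named on the slide. It is an assumption-laden illustration: the `binding` type and both function names are hypothetical, and "closest first" is interpreted here simply as "prefer a socket on the sender's own node", which the slides do not spell out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one socket bound to a partition on some node. */
struct binding {
    uint32_t type;
    uint32_t lower;   /* inclusive */
    uint32_t upper;   /* inclusive */
    uint32_t node;    /* node the bound socket lives on */
};

static int matches(const struct binding *b, uint32_t type, uint32_t inst)
{
    return b->type == type && b->lower <= inst && inst <= b->upper;
}

/* Round-robin: returns the index of the first matching binding at or after
 * *cursor (wrapping around) and advances the cursor past it, or -1. */
int select_round_robin(const struct binding *tbl, size_t n,
                       uint32_t type, uint32_t inst, size_t *cursor)
{
    for (size_t k = 0; k < n; k++) {
        size_t i = (*cursor + k) % n;
        if (matches(&tbl[i], type, inst)) {
            *cursor = (i + 1) % n;
            return (int)i;
        }
    }
    return -1;
}

/* Closest-first (assumed meaning): prefer a matching binding on own_node,
 * otherwise fall back to the first matching binding anywhere, or -1. */
int select_closest_first(const struct binding *tbl, size_t n,
                         uint32_t type, uint32_t inst, uint32_t own_node)
{
    int fallback = -1;
    for (size_t i = 0; i < n; i++) {
        if (!matches(&tbl[i], type, inst))
            continue;
        if (tbl[i].node == own_node)
            return (int)i;
        if (fallback < 0)
            fallback = (int)i;
    }
    return fallback;
}
```

Successive round-robin calls for the same address alternate between the sockets bound to partitions A and A’, while closest-first keeps returning the local one as long as it exists.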

Functional Topology Subscription
- Function Address/Address Partition bind/unbind events
[Diagram: the client process calls subscribe(type = foo, lower = 0, upper = 500). The server process for partition A calls bind(type = foo, lower=0, upper=99) and the server process for partition B calls bind(type = foo, lower=100, upper=199); the client receives the events (foo,0,99) and (foo,100,199).]

Network Topology Subscription
- Node/Cluster/Zone availability events
- Same mechanism as for functional events
[Diagram: TIPC on each node calls bind(type = node, lower=0x , upper=0x ); the client process calls subscribe(type = node, lower = 0x , upper = 0x ) and receives one event (node,0x ) per node.]

Connections
- Establishment based on functional addressing
  - Selectable lookup algorithm, partitioning, redundancy etc
- Lightweight
- End-to-end flow control
- SOCK_STREAM/SOCK_SEQPACKET in connection oriented mode
  - Mutually compatible

Connection Setup
- No protocol messages exchanged during setup/shutdown
- Only payload carrying messages
[Diagram, three steps: (1) the client process calls sendto(type = foo, instance = 117) and the message (foo,117) reaches the server process for partition B; (2) the server calls lconnect(client) and send(); (3) the client calls lconnect(server).]

Connection Shutdown
- No protocol messages exchanged during setup/shutdown
- Only payload carrying messages
[Diagram, two steps: client process and server process (partition B), each step showing a disconnect() call.]

Connection Setup/Shutdown
- Well-known TCP-style connect/shutdown with exchange of SYN and FIN messages available as alternative
[Diagram: the server process for partition B calls bind(), listen() and accept(); the client process calls connect(type=foo, instance=117), which sends SYN (foo,117).]

Connection Abortion
- Immediate “abortion” event in case of peer process crash
- Immediate “abortion” event in case of peer node crash
- Immediate “abortion” event in case of communication failure
- Immediate abortion in case of node overload
[Diagram, one step per case: client process and server process (partition B), with an “abort” event on the connection.]

Connection Flow Control
- End-to-end send window of N messages slows sender process in case of receiver process overload
- Acknowledge sent from receiver every N/2 messages
- Sender socket keeps only a counter, not a retransmission buffer
[Diagram: an “Acknowledge” message between the client process and the server process (partition B) on different nodes.]
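The counter-based window described on this slide can be modelled with two small structs. This is a sketch under assumptions, not the TIPC implementation: the names `fc_sender`, `fc_receiver`, `fc_try_send`, `fc_on_receive` and `fc_on_ack` are hypothetical, and the acknowledge is assumed to carry the number of messages consumed since the previous one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sender state: a counter only, no retransmission buffer. */
struct fc_sender {
    uint32_t window;     /* N: maximum unacknowledged messages */
    uint32_t unacked;    /* messages sent but not yet acknowledged */
};

/* Hypothetical receiver state. */
struct fc_receiver {
    uint32_t window;     /* N, same value as on the sender */
    uint32_t since_ack;  /* messages consumed since the last acknowledge */
};

/* Returns 1 and counts the message if the window allows sending;
 * returns 0 if the sender has to wait for an acknowledge. */
int fc_try_send(struct fc_sender *s)
{
    assert(s->window >= 2);
    if (s->unacked >= s->window)
        return 0;
    s->unacked++;
    return 1;
}

/* Called for each consumed message. Returns the number of messages to
 * acknowledge once N/2 have been consumed, otherwise 0 (no ack due). */
uint32_t fc_on_receive(struct fc_receiver *r)
{
    assert(r->window >= 2);
    if (++r->since_ack < r->window / 2)
        return 0;
    uint32_t acked = r->since_ack;
    r->since_ack = 0;
    return acked;
}

/* An acknowledge only decrements the sender's counter. */
void fc_on_ack(struct fc_sender *s, uint32_t acked)
{
    s->unacked = (acked > s->unacked) ? 0 : s->unacked - acked;
}
```

With N = 4 the sender blocks on the fifth message, and resumes as soon as the receiver has consumed two messages and acknowledged them.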

Signalling Links
- Retransmission protocol and congestion control at signalling link level
  - Transmitted packets acknowledged/released by any packet from other node
  - Packet losses detected and retransmission performed earlier
  - Packets from different sources are bundled in same buffer in case of congestion
  - Packet flow more traffic driven, no need for timers per socket or message
[Diagram: client processes and server processes (partition B) on two nodes.]

Network Load Sharing
- One link per node pair and interface
- Typically two links per node pair, for full load sharing and redundancy
[Diagram: client processes and server processes (partition B) on two nodes.]

Network Redundancy
- Smooth failover in case of single link failure, with no consequences for user level connections
- Each link supervised by conditional heartbeats, i.e. when no other traffic
[Diagram: client processes and server processes (partition B) on two nodes.]
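A conditional heartbeat, one that is sent only when the link has been silent, can be sketched as a small timer-driven state machine. All names here (`link_sup`, `link_on_packet`, `link_on_tick`, the `link_action` values) are hypothetical, and the tick interval and tolerated number of silent ticks are assumptions; the slides only state that node failure is detected within 500 ms.

```c
#include <stdint.h>

/* What the supervision timer asks the link to do on each tick. */
enum link_action { LINK_IDLE, LINK_SEND_PROBE, LINK_DECLARE_DOWN };

/* Hypothetical per-link supervision state. */
struct link_sup {
    uint32_t rx_since_tick;  /* packets received since the last tick   */
    uint32_t silent_ticks;   /* consecutive ticks without any traffic  */
    uint32_t max_silent;     /* silent ticks tolerated before failure  */
};

/* Any packet from the peer, payload or protocol, proves it is alive. */
void link_on_packet(struct link_sup *l)
{
    l->rx_since_tick++;
}

/* Called on every supervision timer tick. */
enum link_action link_on_tick(struct link_sup *l)
{
    if (l->rx_since_tick > 0) {
        /* Traffic seen since the last tick: no heartbeat needed. */
        l->rx_since_tick = 0;
        l->silent_ticks = 0;
        return LINK_IDLE;
    }
    if (++l->silent_ticks > l->max_silent)
        return LINK_DECLARE_DOWN;
    /* Link was silent: probe the peer. */
    return LINK_SEND_PROBE;
}
```

On a busy link the timer never triggers a probe, so supervision costs no extra packets; probes are only sent after a silent interval, and the link is declared down after more than `max_silent` consecutive silent ticks.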

Code Status
- Initial Release for Linux
  - Feedback (S. Hemminger, Jamal) was that we have to do some re-work
    - Memory handling, buffer handling, locking policy, socket interface, management protocol/interface…
  - All issues addressed, but not all checked in at SF yet
    - New, fully POSIX compliant socket interface/implementation
    - More conventional use of buffers (performance…)
  - Reliable multicast needs more testing
  - Still not fully ready for inclusion in kernel, but we are close…

Short Term Goals
- End of August: Kernel Ready
  - Reliable multicast fully tested
  - New socket implementation finished and tested
  - Netlink based management/configuration protocol finished and tested
    - Replaced all ioctls()

Long Term Goals
- Multi-cluster Functionality
  - Mostly user space
  - Automatic inter-cluster neighbour discovery and link setup
  - Fully manual inter cluster link setup
  - Guaranteeing name table consistency between clusters
  - Slave node name table reduction
- Additional Bearers
  - Dynamic registration of “bearers” from user space (e.g. TCP, DCCP)
- Distributed netlink ??


QUESTIONS ??