TIPC as TML
draft-maloy-tipc-01.txt
IETF-61, Washington DC, Nov 2004

Jon Maloy, Ericsson
Steven Blake, Modularnet
Maarten Koning, WindRiver
Jamal Hadi Salim, Znyx
Hormuzd Khosravi, Intel

NOKIA RESEARCH CENTER / BOSTON
TIPC
- A transport protocol for cluster environments
- Connectionless and connection-oriented; reliable or unreliable
- Reliable or unreliable multicast
- Usage not limited to the ForCES context
- A framework for detecting, supervising and maintaining cluster topology
- Available as a portable open source code package under a BSD licence
- […] lines of C code, 112 kbyte Linux kernel module
- Runs on 4 operating systems so far, with more to come
- Proven concept, used and deployed in several Ericsson products

ForCES Protocol Framework
[Figure: CE and FE stacks, each with PL (ForCES Protocol) over TML over Transport (IP, TCP, RapidIO, Ethernet…); ForCES Protocol Messages pass between CE and FE.]

TIPC as L2 TML
[Figure: CE and FE stacks, each with PL (ForCES Protocol) over a TIPC TML over L2 Transport (RapidIO, Ethernet…); ForCES Protocol Messages pass between CE and FE.]

Interface Adaptation
[Figure: CE and FE stacks, each with PL (ForCES Protocol) over a TIPC TML over L2 Transport (RapidIO, Ethernet…), with the Interface Adaptation marked; ForCES Protocol Messages pass between CE and FE.]

Fulfilling Requirements (1)
Reliability
- Reliable transport in all modes
- Can be made unreliable per socket/direction
Security
- Only secure within closed networks
- No explicit authentication/encryption support yet, but planned
- Not IP-based, no router will forward TIPC messages!
Congestion Control
- At three levels: connection/transport, signalling link and carrier level
- Will give feedback to the PL if a connection is broken or a message is rejected
Multicast/Broadcast
- Supported

Fulfilling Requirements (2)
Timeliness
- Immediate delivery (no Nagle algorithm)
- Inter-node delivery time in the order of 100 microseconds
HA Considerations
- L2 link failure detection and failover handled transparently for the user
- Connection abortion with error code if no redundant carrier is available
- Peer node failure detection after […] seconds
Encapsulation
- 24 byte extra header, 40 extra for connectionless
Priorities
- Supports 4 message importance priorities, determining congestion levels and abort/rejection levels
- Are 8 levels really needed?

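The Priorities bullet can be illustrated with a small model. This is only a sketch: the importance names and the queue-length limits are hypothetical values chosen for illustration and are not taken from the draft.

```python
from enum import IntEnum

class Importance(IntEnum):
    # Four levels, as stated on the slide; the names are assumptions.
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# Hypothetical per-importance queue limits: a more important message
# tolerates a longer queue before it is rejected.
REJECT_LIMIT = {
    Importance.LOW: 100,
    Importance.MEDIUM: 200,
    Importance.HIGH: 400,
    Importance.CRITICAL: 800,
}

def accept(importance, queue_length):
    """Return True if a message of this importance may still be queued."""
    return queue_length < REJECT_LIMIT[importance]

print(accept(Importance.LOW, 150))       # low importance at this load
print(accept(Importance.CRITICAL, 150))  # critical importance at the same load
```

With the same queue length, the low-importance message is turned away while the critical one is still queued, which is the behaviour the slide describes in words.
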
Connection Directly on TIPC
[Figure: FE containing LFB 1, LFB 2 and the FE Object; CE containing FB X, FB Y and the CE Object; TIPC between FE and CE.]

Connections via FE/CE Object
[Figure: FE containing LFB 1, LFB 2 and the FE Object; CE containing FB X, FB Y and the CE Object; TIPC between FE and CE.]

Connection Usage
[Figure: FE containing LFB 1, LFB 2 and the FE Object; CE containing FB X, FB Y and the CE Object; TIPC between FE and CE.]
Control Connection:
- High priority
- Reliable in both directions
Traffic Data Connection:
- Low priority
- Reliable CE->FE
- Unreliable FE->CE

Functional Addressing: Unicast
Function Address
- Persistent, reusable 64 bit port identifier assigned by user
- Consists of type number and instance number
Function Address Sequence
- Sequence of function addresses with same type
[Figure: Client Process calls sendto(type = foo, instance = 33), shown as foo,33; Server Process, Partition A and Server Process, Partition B are bound with bind(type = foo, lower=0, upper=99) and bind(type = foo, lower=100, upper=199).]

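The unicast lookup on this slide can be modelled in a few lines. The sketch below is a toy name table, not TIPC code: the `NameTable` class, the owner labels and the numeric value of `FOO` are hypothetical, and it assumes Partition A is the one bound to 0..99.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Binding:
    type: int    # function type number
    lower: int   # first instance of the partition
    upper: int   # last instance of the partition (inclusive)
    owner: str   # hypothetical label for the bound server socket

class NameTable:
    """Toy model of the table consulted when a client sends to (type, instance)."""
    def __init__(self):
        self.bindings = []

    def bind(self, type, lower, upper, owner):
        self.bindings.append(Binding(type, lower, upper, owner))

    def lookup(self, type, instance):
        """Return the owners whose partition covers (type, instance)."""
        return [b.owner for b in self.bindings
                if b.type == type and b.lower <= instance <= b.upper]

FOO = 1000  # hypothetical type number standing in for "foo"
table = NameTable()
table.bind(FOO, 0, 99, "server A")
table.bind(FOO, 100, 199, "server B")
print(table.lookup(FOO, 33))
```

The client only names the function (foo, 33); which server socket receives the message follows from the partitions that are bound at the time of sending.
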
Address Mapping - Unicast
[Figure: FE containing LFB 1 (Meter 44) and the FE Object; CE containing FB X (RSVP 77) and the CE Object; TIPC between FE and CE.]
- TML API tml_bind(RSVP,77) maps to TIPC API bind(RSVP,77,77)
- TML API tml_bind(meter,44) maps to TIPC API bind(meter,44,44)

Connection Setup
[Figure: FE 17 containing LFB 1 (Meter 44) and the FE Object; CE 8 containing FB X (RSVP 77) and the CE Object; TIPC between FE and CE.]
- TML API tml_bind(RSVP,77) maps to TIPC API bind(RSVP,77,77)
- TML API tml_connect(RSVP,77, CEID=8) maps to TIPC API connect(RSVP,77,node=8)
- If instance numbers are coordinated over the whole cluster, there is no need for LFBs to know the CEID

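The mapping from TML calls to TIPC calls can be sketched as a thin adapter. Everything here is illustrative: `RecordingTipc` is a hypothetical stand-in that only records calls, and the Python signatures of `tml_bind` and `tml_connect` are assumptions modelled on the call shapes shown on the slides.

```python
class RecordingTipc:
    """Stand-in for the TIPC API that only records the calls made."""
    def __init__(self):
        self.calls = []

    def bind(self, type, lower, upper):
        self.calls.append(("bind", type, lower, upper))

    def connect(self, type, instance, node=None):
        self.calls.append(("connect", type, instance, node))

class Tml:
    """Hypothetical TML layer translating TML calls into TIPC calls."""
    def __init__(self, tipc):
        self.tipc = tipc

    def tml_bind(self, type, instance):
        # A single instance becomes a sequence with lower == upper.
        self.tipc.bind(type, instance, instance)

    def tml_connect(self, type, instance, ceid=None):
        # The CE ID, when given, is passed through as the TIPC node.
        self.tipc.connect(type, instance, node=ceid)

RSVP = "RSVP"
ce = Tml(RecordingTipc())
ce.tml_bind(RSVP, 77)
fe = Tml(RecordingTipc())
fe.tml_connect(RSVP, 77, ceid=8)
print(ce.tipc.calls, fe.tipc.calls)
```

Leaving `ceid` unset corresponds to the last bullet: when instance numbers are unique across the cluster, the caller does not need to name the CE.
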
Functional Addressing: Multicast
- Based on Function Address Sequences
- Any partition overlapping with the range used in the destination address will receive a copy of the message
- Client defines "multicast group" per call
[Figure: Client Process calls sendto(type = foo, lower = 33, upper = 133), shown as foo,33,133; Server Process, Partition A and Server Process, Partition B are bound with bind(type = foo, lower=0, upper=99) and bind(type = foo, lower=100, upper=199).]

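The overlap rule can be written out as a short sketch. The binding list mirrors the slide, assuming Partition A is the 0..99 range; "server C" is a hypothetical third partition added only to show a range that does not overlap.

```python
# Bindings as (type, lower, upper, owner). "server C" is a hypothetical
# third partition added to show a non-overlapping case.
BINDINGS = [
    ("foo", 0, 99, "server A"),
    ("foo", 100, 199, "server B"),
    ("foo", 200, 299, "server C"),
]

def overlaps(lo1, hi1, lo2, hi2):
    """True if the inclusive ranges [lo1, hi1] and [lo2, hi2] intersect."""
    return lo1 <= hi2 and lo2 <= hi1

def multicast_destinations(type, lower, upper):
    """Owners of every partition overlapping the destination range."""
    return [owner for (t, lo, hi, owner) in BINDINGS
            if t == type and overlaps(lo, hi, lower, upper)]

print(multicast_destinations("foo", 33, 133))
```

The destination range 33..133 touches both 0..99 and 100..199, so both of those servers receive a copy, while the partition starting at 200 does not.
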
Address Mapping - Multicast
[Figure: FE containing Meter 13, Meter 44 and the FE Object; CE containing FB X (RSVP 77) and the CE Object; TIPC between FE and CE.]
- tml_mcast(meter_mc, group=X) maps to sendto(meter_mc,X,X)
- tml_join(meter_mc,X), called by each meter, maps to bind(meter_mc,X,X)

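A rough model of the join/multicast mapping follows. `FakeTipc` is a hypothetical in-memory stand-in for the TIPC API, the `owner` argument is added only so the model can tell receivers apart, and the group number chosen for X is arbitrary.

```python
class FakeTipc:
    """In-memory stand-in: bind() registers a range, sendto() delivers copies."""
    def __init__(self):
        self.bound = []   # (type, lower, upper, owner)
        self.inbox = {}   # owner -> list of received messages

    def bind(self, owner, type, lower, upper):
        self.bound.append((type, lower, upper, owner))
        self.inbox.setdefault(owner, [])

    def sendto(self, type, lower, upper, msg):
        # Every binding overlapping [lower, upper] receives a copy.
        for (t, lo, hi, owner) in self.bound:
            if t == type and lo <= upper and lower <= hi:
                self.inbox[owner].append(msg)

def tml_join(tipc, owner, type, group):
    # Joining group X is a bind to the one-element sequence (X, X).
    tipc.bind(owner, type, group, group)

def tml_mcast(tipc, type, group, msg):
    # Multicasting to group X is a sendto addressed to (X, X).
    tipc.sendto(type, group, group, msg)

METER_MC, X = "meter_mc", 5   # X = 5 is an arbitrary group number
tipc = FakeTipc()
tml_join(tipc, "Meter 13", METER_MC, X)
tml_join(tipc, "Meter 44", METER_MC, X)
tml_mcast(tipc, METER_MC, X, "reset counters")
print(tipc.inbox)
```

Both meters joined the same group, so a single `tml_mcast` from the CE side reaches each of them once.
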
Questions?

Why TIPC in ForCES?
- Congestion control at three levels
  - Connection level, signalling link level and media level
  - Based on 4 importance priorities
- Simple to configure
  - Each node needs to know its own identity, that is all
  - Automatic neighbour detection using multicast/broadcast
- Lightweight, reactive connections
  - Immediate connection abortion at node/process failure or overload
- Topology subscription service
  - Functional and physical topology

Node Internal Functional View
[Figure: layered view of a node.
User Adapter API: Socket API Adapter, Port API Adapter, Other API Adapters.
Internal functions: Address Subscription, Address Resolution, Address Table Distribution, Connection Supervision, Route/Link Selection, Neighbour Detection, Link Establish/Supervision/Failover, Reliable Multicast, Fragmentation/De-fragmentation, Packet Bundling, Congestion Control, Sequence/Retransmission Control.
Bearer Adapter API: Ethernet, SCTP, UDP, Infiniband, Mirrored Memory.]

Network Topology
[Figure: network topology showing Zone, Cluster, Node and Slave Node, with a connection to the Internet/Intranet.]

Location Transparency
- Location of server not known by client
- Lookup of physical destination performed on-the-fly
- Efficient, no secondary messaging involved
[Figure, three builds: Client Process calls sendto(type = foo, lower = 33, upper = 133), shown as foo,33,133; Server Process, Partition A and Server Process, Partition B are bound with bind(type = foo, lower=0, upper=99) and bind(type = foo, lower=100, upper=199). The builds show one, then two, then three Node boxes around the processes.]

Address Binding
- Many sockets may bind to the same partition
  - Closest-First or Round-Robin algorithm chosen by client
- Same socket may bind to many partitions
- Same socket may bind to different functions
[Figure, three builds, each with Client Process calling sendto(type = foo, lower = 33, upper = 133), shown as foo,33,133:
(1) Server Process, Partition A and Server Process, Partition A' both use bind(type = foo, lower=0, upper=99).
(2) Server Process, Partition B uses bind(type = foo, lower=100, upper=199); Server Process, Partition A+B' uses bind(type = foo, lower=0, upper=99) and bind(type=foo, lower=100, upper=199).
(3) Server Process, Partition B uses bind(type = foo, lower=100, upper=199); Server Process, Partition A uses bind(type = foo, lower=0, upper=99) and bind(type=bar, lower=0, upper=999).]

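When several sockets are bound to the same partition, the client picks the selection policy. The sketch below contrasts the two policies named on the slide; the `distance` value is a hypothetical metric (smaller means closer to the client) and is not how the draft defines closeness.

```python
class Candidates:
    """Sockets bound to the same partition, as seen by one client."""
    def __init__(self, sockets):
        # sockets: list of (name, distance); distance is a hypothetical
        # metric where a smaller value means closer to the client.
        self.sockets = list(sockets)
        self._turn = 0

    def closest_first(self):
        """Always pick the socket with the smallest distance."""
        return min(self.sockets, key=lambda s: s[1])[0]

    def round_robin(self):
        """Cycle through the sockets in binding order."""
        name = self.sockets[self._turn % len(self.sockets)][0]
        self._turn += 1
        return name

c = Candidates([("server A", 2), ("server A'", 0)])
print(c.closest_first())
print([c.round_robin() for _ in range(4)])
```

Closest-first keeps traffic on the nearest server, while round-robin spreads successive messages across all servers bound to the partition.
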
Functional Topology Subscription
- Function Address/Address Partition bind/unbind events
[Figure: Client Process calls subscribe(type = foo, lower = 0, upper = 500); Server Process, Partition A and Server Process, Partition B are bound with bind(type = foo, lower=0, upper=99) and bind(type = foo, lower=100, upper=199); the events foo,0,99 and foo,100,199 are shown.]

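The subscription mechanism can be pictured as a callback fired for every bind or unbind that overlaps the subscribed range. This is a simplified model with hypothetical names; it only reports changes made after the subscription and says nothing about how the draft handles bindings that already exist.

```python
class NameTable:
    """Toy name table that notifies subscribers of bind/unbind events."""
    def __init__(self):
        self.subscriptions = []

    def subscribe(self, type, lower, upper, callback):
        self.subscriptions.append((type, lower, upper, callback))

    def _publish(self, event, type, lower, upper):
        # Notify every subscriber whose range overlaps [lower, upper].
        for (t, lo, hi, callback) in self.subscriptions:
            if t == type and lo <= upper and lower <= hi:
                callback(event, type, lower, upper)

    def bind(self, type, lower, upper):
        self._publish("bind", type, lower, upper)

    def unbind(self, type, lower, upper):
        self._publish("unbind", type, lower, upper)

events = []
table = NameTable()
table.subscribe("foo", 0, 500, lambda *e: events.append(e))
table.bind("foo", 0, 99)
table.bind("foo", 100, 199)
table.bind("bar", 0, 999)    # different type: no event for this subscriber
table.unbind("foo", 0, 99)
print(events)
```
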
Network Topology Subscription
- Node/Cluster/Zone availability events
- Same mechanism as for function events
[Figure: Client Process calls subscribe(type = node, lower = 0x , upper = 0x ); TIPC on each Node performs bind(type = node, lower=0x , upper=0x ); the events are shown as node,0x .]

ForCES Applied on TIPC
[Figure, two builds: Network Equipment containing a Control Element (OSPF, RIP, COPS, CLI, SNMP, Other Applications) and a Forwarding Element (LFB), linked by ForCES Protocol/TIPC. The second build adds a further Control Element and the Internet.]

CONNECTIONS
- Establishment based on functional addressing
  - Selectable lookup algorithm, partitioning, redundancy etc.
- No protocol messages exchanged during setup/shutdown
  - Only payload-carrying messages
- Traditional TCP-style connection setup/shutdown as alternative
- End-to-end flow control
- SOCK_SEQPACKET
- SOCK_STREAM
- SOCK_RDM for connectionless and multicast
- SOCK_DGRAM can easily be added if needed
  - Same with "Unreliable SOCK_SEQPACKET"

CONNECTIONS
- No protocol messages exchanged during setup/shutdown
- Only payload-carrying messages
[Figure, three builds between Client Process and Server Process, Partition B: (1) sendto(type = foo, instance = 117), shown as foo,117; (2) connect(client) followed by send(); (3) connect(server).]

CONNECTIONS
Immediate "abortion" event in case of:
- Peer process crash
- Peer node crash
- Communication failure
- Node overload
[Figure, four builds: Client Process and Server Process, Partition B, with an abort event shown for each case.]

Network Redundancy
- Retransmission protocol and congestion control at signalling link level
- Normally two links per node pair, for full load sharing and redundancy
- Smooth failover in case of single link failure, with no consequences for user level connections
[Figure: Client Process, Server Process, Partition B, and Node.]

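Load sharing and failover over a link pair can be modelled as follows. This is a sketch under simplifying assumptions: the link names are hypothetical, load sharing is shown as plain alternation, and the abort is modelled with Python's built-in `ConnectionAbortedError` rather than any TIPC error code.

```python
class NodePairLinks:
    """Toy model of the links between one pair of nodes."""
    def __init__(self, names):
        self.up = {name: True for name in names}
        self._turn = 0

    def fail(self, name):
        self.up[name] = False

    def pick(self):
        """Choose the link for the next message, alternating over working links."""
        active = [name for name, ok in self.up.items() if ok]
        if not active:
            # No redundant carrier left: connections over this pair are aborted.
            raise ConnectionAbortedError("no carrier available")
        link = active[self._turn % len(active)]
        self._turn += 1
        return link

links = NodePairLinks(["link 1", "link 2"])
print(links.pick(), links.pick())   # traffic is shared over both links
links.fail("link 1")
print(links.pick(), links.pick())   # all traffic moves to the surviving link
```

As long as one link survives, `pick()` keeps returning a usable link, so the sender sees no error; only when both links are down does the model raise the abort.
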
Remaining Work
Implementation
- Reliable Multicast not fully implemented yet (expected end of Q1)
- Re-stabilization after most recent changes
- Re-implementation of multi-cluster neighbour detection and link setup
Protocol
- Fully manual inter-cluster link setup
- Guaranteeing Name Table consistency between clusters
- Slave node Name Table reduction
- ?????

QUESTIONS ??