Presentation is loading. Please wait.

Presentation is loading. Please wait.

NORDUnet Nordic infrastructure for Research & Education NSI V3 Some Burning Topics for Discussion and Features for Consideration The musings of Jerry Sobieski.

Similar presentations


Presentation on theme: "NORDUnet Nordic infrastructure for Research & Education NSI V3 Some Burning Topics for Discussion and Features for Consideration The musings of Jerry Sobieski."— Presentation transcript:

1 NORDUnet Nordic infrastructure for Research & Education NSI V3 Some Burning Topics for Discussion and Features for Consideration The musings of Jerry Sobieski Sep 29, 2013 Singapore

2 NORDUnet Nordic infrastructure for Research & Education Topics High Availability/reliability vs “Protection” Enhanced STP specification and reference Enhanced Path Constraints P2MP and MP2MP Enhanced transport independence (MTL) Performance Verification Security Other misc....

3 NORDUnet Nordic infrastructure for Research & Education “Protection” vs Reliability “Protection” is intended to increase continuity of service... This is very poorly defined. – Does “protection” provide High Reliability? How do you know? How do you measure “reliability” – How “reliable” is sufficient? Or how reliabile is “high reliability”? Why do/would we assume a protection path is the magic bullet that will increase overall end to end reliability, or do so sufficiently? How do we quantify NSI Connection “reliability” or “availability” – We need to define terms such as” Fail Conditions, MTT Failure, MTT Recovery, MTT Repair, Service available (user) Working path Protect path ready fixed ready active restored fail

4 NORDUnet Nordic infrastructure for Research & Education “Protection” Traditional “Protection” switching does not affect MTBF, it is designed to minimize MTTR. – Low MTTR requires: alternate paths beyond the failure event radius, and high speed failure notification propagation – (neither of which does NSI currently provide) – N+1 protection schemes make many assumptions on failure modes – that undermine deterministic behaviour – Multi-Layer networks with protection can create protection storms – that can exacerbate failure modes.

5 NORDUnet Nordic infrastructure for Research & Education Protection, Reliability, Redundancy, Diversity Why not allow Users (i.e. the RA) to decide – If an alternate path is necessary (provide service reliability guarantees via the SD) – and when to switch to a protected path (end-to-end performance verification)? Empowers the RA to decide if/when to use an alternate path for any reason... – Means providers have no multi-domain protection switching restoral issues... – The uRA is really the only agent with end-to-end performance visibility anyway, so it (or a delegate) can monitor fail conditions E2E. To provide user-based alternate pathing, we only need to provide the RA with an ability to specify “path diversity” – of a specified radius. – We currently provide an ERO that the path must take, why not simply provide an ERO that the path must *NOT* take??? – We need a such path semantics to meet other path requirements, e.g. “avoid US transit domains”, or “transit any STP in network X”

6 NORDUnet Nordic infrastructure for Research & Education Enhanced STP Reference Syntax The purely symbolic STP LocalID is functional – But specifying large or varied sets of STPs is syntactically impractical NSI needs a specification syntax that allows more control over STP enumeration: – In the Reserve() request, we need to be able to specify sets of STPs – not just one. – In Topology descriptions a NSI domain should be able to express concisely the availability of thousands of STPs along its edge or that are paired with other adjacent networks. – NSI domains *must* retain the ability to announce STPs without announcing the hardware to which those STPs are mapped. – NSI cannot assume connections will be Ethernet. Thus any topological or semantic STP decriptions that expect or require Ethernet violate NSI technology agnostics. – We need a generalized means of announcing very large sets of STPs

7 NORDUnet Nordic infrastructure for Research & Education Enhanced STP Reference Syntax Proposal: Introduce subscripted STPs, and reference syntax for STPs. – Subscripted: Alpha:A[0..100] – Comma separated Lists of STPs: Alpha:A1,Alpha:A2,Beta:b[0..50,100..150], EndSystemZ:eth0[24,27,30] – Symbolic substitution: Alpha:A{1,2,3} is equivalent to Alpha:A1, Alpha:A2, Alpha:A3 (not to be confused with subscripted STPs) – Sets of STPs: Alpha:* (any STP in Alpha) or Alpha:A[*] Since NSI is *service* oriented, the mapping of the terminal to specific hardware is only relevant to the systems at the terminals (the NSA(s) ) Subscripted STPs allow for concise enumeration without linking the STP directly to particular hardware – Externally, large sets of STPs can be referenced as unordered sets or lists rather than individually. – Internally, the NRM will map a subscripted reference to a particular switch/blade/port/virtual label/vlabel2/... The internal mapping table generation is an implementation issue and not of NSI concern (that said – it can be easily generated by scripts in whatever form the NRM requires.) – A NSI domain is never required to expose internal topology. If they wish, they can do so. NSI Service domains need only announce SDP adjacencies that they wish to be visible in the global topology. – In relations (e.g. SDPs) the STP references must be ordered to provide the deterministic 1-to-1 pairings.

8 NORDUnet Nordic infrastructure for Research & Education Enhanced Path Constraints Currently, a NSI Connection has a single list of loose hop transit STPs: – SrcSTP, eroSTP1, eroSTP2,...eroSTPn, destSTP – These specify transit STPs, but do not provide path constraints between these STPs, or relative to other topological features or other service instances. We need enhanced path specification semantics in the Reserve() request: – We need to specify a set of transit points that are acceptable (e.g. “route through any available SDP between net X and net Y”, or ) – We need a means of stating avoidance criteria (e.g. “do not route through nets X, Y, Z”; or “do not route within R radius of Connection Q” ; – We have no way of stating required path relations such as “Route connection X within R Radius of Connection Y” or “route Connection X such that Latency(X)=Latency(Y)+/-5.0%” EPCs allow the RA to “say what they want” rather than needing to engineer the low level link themselves though remote (and often opaque) network infrastructure topology using just GMPLS style EROs. EPCs can prevent the exhaustive search problem... – E.g. SrcSTP = ESZ:A[*]; ERO:={ STP=B:*, STP=C:* }, DestSTP=ESZ:if1[*]

9 NORDUnet Nordic infrastructure for Research & Education MultiPoint Connections Unidirectional Connections – P2MP - Single Source Multicast trees Performance can guaranteed for Uni-Directional P2MP trees – MP2P trees can be useful for listening to many senders (monitoring?) – MP2MP Disjoint set of Senders and Receivers. (MP2P with multiple Listeners) Bi-directional Connections – P2MP == MP2P == MP2MP == Broadcast domains (!) Very useful!

10 NORDUnet Nordic infrastructure for Research & Education MultiPoint Connection Service Can we simplify NSI-CS by making all “Connections” unidirectional? (yes) – We compose Bi-Directional connections by path diversity constraints (there is no *real* reason to maintain th façade that bidirectional connections are an atomic construct – thus they should be a service construct.) – Path diversity constraints can provide much more powerful semantics for the Connection Service (see path diversity constraints) P2P, P2MP, MP2P, and MP2MP can all be encoded in the ReserveRequest() primitive – Allow multiple SrcSTPs and DestSTPs. – Turn the ERO List into an ERO tree – Add/remove end points is a ReserveModify() Thus a whole new SP service is not necessary – (Unless there are truly special features of an SP...) – E.g. Perhaps the “Switching Point” is a topological element, and not a Service, if so...this is an internal NRM issue rather than a inter-domain NSI issue...

11 NORDUnet Nordic infrastructure for Research & Education Formalized MTL NSI should formally define the interface between the NSI protocol layer and the underlying Message Transport Layer – This allows the high level NSI protocol layer to bound the set of actions it requires and expects of the lower layer MTL - only these actions will then be referenced in the NSI layer specification. – The MTL Interface should be very basic/atomic, in order to allow the protocol layer to control those aspects it deems important (like retries, or timeout values) Example: – MsgId = Qsend( destNSA, *msg, *GoodSendCallBack(msgid), Timeout, *TimeoutCallBack(msgid)) This internal function call places a NSI msg on the MTL transmit queue and returns immediately. A Queue exists for each destination NSA. If the message is sent and acknowledged, then the GoodSend call back is invoked. If the message is not successfully sent, the TimeOut will expire and the TimeoutCallBack is invoked. It is up to the call backs to decide how to treat the queued message – retry, remove, etc. – MsgDequeue( msgID ) This function removes a message from the send queue and deletes any timer(s) associated with it. – Q-Process(NSIreceiverCallBack(*msg)) - This function call registers a routine for processing incoming messages. All Inbound messages are placed on an input queue. When the queue is non-empty, if a call back is registered, the call back is invoked with the message at the top of the queue.

12 NORDUnet Nordic infrastructure for Research & Education NSI session mode Given a well defined MTL interface between the NSI protocol and the MTL functions... – This would allow an NSA to rely on the MTL to deliver messages any way the MTL can accomplish this. We propose an MTL Session Mode: The MTL of an NSA can establish a long lived session to another NSA over which all NSI protocol messages can be sent. – The session can be established from either side – Messages can flow either/both ways oncethe session is established. – If the session goes down, NSI messages are queued and wait for session recovery. Messages to other NSAs continue normally. – Queued messages are subject to send timeouts (thus long session outages will be handled by upper layer protocol appropriately.) – Messages can be retried “transparently” within the MTL layer before declaring TO, or could be retried explicitly from the NSI CallBack() if a TO occurs. – A GoodSendCB() routine is invoked when a message is sent successfully Session mode can also address NAT/FW issues. – The session can be established by either NSA. Once established, messages flow freely Since Session mode and MTL are oblivious to message content, they can send messages to *any* NSI. – An MTL simply needs to know the mapping of an NSAID to at least one network address (URL) to establish a session.

13 NORDUnet Nordic infrastructure for Research & Education NSI session mode NSI Session Mode can also address NAT/FW issues. – The session can be established by either NSA. Once established, messages flow freely – Only one session is required between NSAs, and this could be tied to explicit TCP port(s). – Since the NSI messaging is not tied to a specific session, multiple session establishments is not required and messages may be sent and received asynchronously (i.e. sequential primitive processing is not implied.) Session mode and MTL are oblivious to message content, they can send messages to *any* NSA. – Session Mode could be used for indirect forwarding of NSI messages... If a msg is received that is not destined to the local NSAID, the local NSA MTL can try to forward it...addrsses the issue where two NSAs are behind FWs or NATs...an external NSA can act as forwarding agent. – Note: Forwarding is cool, but it requires forarding tables and discovery – a lot of work recreating what IP routing already does. We should utilize natural IP forwarding whenever possible – thus avoiding NATs and FWs.

14 NORDUnet Nordic infrastructure for Research & Education Security We need a formal security profile in v3 – this is critical(!) Each primitive is authorized, but currently that authorization is trivial and/or inconsistent. – And in some cases has been ignored or incorrectly implemented for expediency. What are the attributes upon which industrial strength security policies can be constructed? – Attributes of the Actor: User, Project, Role, etc – Action being requested: Primitive, read? write? – Target of the Action: ConnectionID, segments, attributes of the target (such as who’s circuit is this? Should this user be allowed to cancel that circuit?) These attributes must be mapped to all NSI and related primitives/services – NSI should not make the policy, but NSI should define the attributes and insure the policy is enforced wherever appropriate in the protocol

15 NORDUnet Nordic infrastructure for Research & Education NSI Performance Verification NSI currently has no formally defined process for Performance Verification – The NSI Architeture, by merit of Service Descriptors and segmentation decomposition, lends itself to automated verification and fault localization processes. A Performance Verification “service” (??) could be requested to verify a Connection meets the confirmed As-Built details of the Connection instance – Before placing in service – Or if suspect The PV service would – under authorization – walk the service tree to verify each Connection segment. Passive “Black Box Recording” could be attached to suspect circuits to provide forensic data for debugging...using *real* data flows Active verification can be performed upon user request. Such a NSI based PV Service will need to define the architecture for PV testing... – And specify which aspects of the PV Architecture are add-ons to the NSI architecture. NSI needs a PV Architecture and Service !

16 NORDUnet Nordic infrastructure for Research & Education User Topology Processing User based topology processing scares users –. LHCONE meeting CERN Spring 2013 – Their scheduling issues are already complex, and now we tell them they are responsible for their network layout... We (Service Providers) need to provide a service these folks can use. – We should be doing the path selection based upon user specified requirements (not expecting users to feed us EROs) – We should be providing reliability factors they can take to the bank... Not requiring that they construct their own protection... – Thus we need to quit being engineers in NSI and start being service providers. We need to gracefully make it clear that: – Understanding the network topology *IS* understanding what is possible for your data transport requirements. These are the same. – NSI allows you to ask for what you want – it does not tell you what to ask for, nor can it devine what you need. – If you want to design or optimize your application network to take advantage of all available global network resources, you need to know the what global network resources are available – this is topology processing The alternative is ask the service providers for what you *need* - not assume you (as a user) can or should tell them how to build a global connection – And from NSI, we should not assume users all want to deal with the network details we find interesting.

17 NORDUnet Nordic infrastructure for Research & Education Peer-to-Peer MTL Current web services assume a client server model rather than a peer to peer model – NSI is a peer to peer service architecture. – Existing model presumes one RA at top of tree... "Total Propagation" is the ability to send messages freely between all agents in the service tree, both down and up the service tree. – This allows messages to traverse all NSAs that are part of a tree – including up and over. – Allows all nodes in a service tree to decide which – Allows leaf nodes to propagate unsolicited events (e.g. data plane errors) up the tree towards the RA – There are conditions that should be propagated end to end to the leaf uRAs. Traversing the tree from a leaf or intermediate node up and over and down to other leafs is necessary. – There is a need for third party agents besides the uRA to discover ("be aware" of) the entire service tree. A RA issuing a command to a PA may not be the originating uRA at the "top" of the tree - and only the PA would know this. In order to send notifications, queries, or other commands to all the nodes in the tree (not just subnodes), we need to be able to send "up and over" to the other side of the tree.

18 NORDUnet Nordic infrastructure for Research & Education NSI Topology Requirements How does an NSI agent (NSA?) compile a global view? We assume an NSI entity announces or advertises some topology... – What is the process by which that topology is exchanged? – How do we determine the authenticity of the topology? (fake topology can be a security threat) – Coherency – how do we insure updates are distributed in a timely fashion? What constitutes a “topology description” – is it any set of topology information? Or should it be information pertaining to a specific network portion (a service domain, some network infrastructure, an opaque object, an area of authority?)

19 NORDUnet Nordic infrastructure for Research & Education NSI Testing --- verify that a recursive query is not exposing data improperly due to skipping authorization checks. IE all nodes in the service tree should be consulted in- order and each node should perform a AAI check. --- test vlan topology as alternative to label construction. --- request a reservation end point that has not been advertised. This insures that two tuple stp identifiers are used. define a service domain that does not use vlans at all - just ethernet ports and that has a thousand plus ports on different switches peering with a single other network. define a service domain that uses QinQ so tht there are hundreds of sdps between same two networks each with separate outter vlan. define a service that only advertises 2-tuples...is the NSA aggregator able to incorporate this transit domain into its pf? --- check that the nsi v2 spec does not *require* a triple with label to function... --- in general, try to use the detox and.or nml topology in ways that the wg dislike, yet are left to discretion of the topology provider. --- offer different topology views to different clients...or multiple topology descriptions for the same network

20 NORDUnet Nordic infrastructure for Research & Education NSI Testing --- Modify timeouts.... If a child modification does not resolve within the timeout values, does the tree unwind properly? --- can a uRA hang or crash a NSA simply by forcing a modification to fail putting the connection into an unkonw state? A modify to reduce capacity, start time, etc...?

21 NORDUnet Nordic infrastructure for Research & Education Topo requirements Geolocation attribute that describes a resource. – Spot location, – along a length of a resource, 2 or 3 dimensional resources. At least 2 tuple for topo objects. – Preferably a hierarchical naming.

22 NORDUnet Nordic infrastructure for Research & Education NSI Conformance Test Suite – We need a formalized set of tests that define correct performance of NSA. Or identify optional performance criteria. These are tests to insure the framework services are functioning according to standard. – The Conformance Suite is not software – it specifies a set of tests and their expected results

23 NORDUnet Nordic infrastructure for Research & Education Can we expand the notion of the STP-internal mapping to allow a broader set of terminal tuples? For instance, an STP could map to a flow space, or it could map to higher layer constructs such as TCP ports. Need to re-visit the notion of ethernet labels. These are technology specific and should not be part of the NSI protocol primitives. We should stick to STPs as simply tags or names for locally specific - and private - specifications for the end point.

24 NORDUnet Nordic infrastructure for Research & Education Virtualization. NSI must be able to function within a multi-layer virtualized environment. – Ex: LHCONE sets up a virtual network of high capacity links for the LHC community. This "baseline" LHCONE capacity is established via NSI services globally. CMS and ATLAS jobs would like to dynamically allocate that capacity to specific workflows. LHCONE should be able to provide NSI services to LHCONE clients usingthe baseline topology that LHCONE provisioned from other NSI services. – NSI architecture and protocols should allow this. And guidance should be given as to how NSI services can/should be engineered to accomplish this. This example can have other use cases as well - say GEANT wants to create a direct BoD peering with RNP. Assuming the infrastructure providers (say, Interent2 and/or AMPATH) are NSI compliant, such a link can be easily and quickly established. (Note - there may be policy/best- practice issues that govern the creation of circuits for such use, but this is not an NSI architecture, protocol, or provisioning issue.) The GEANT BoD service (NSI compliant) should be able to incorporate this new [virtual] circuit as part of the topology it provisions for its NSI clients. Another use case might be for GENI and or SA2 style services that wish to interoperate via well defined but highly variable topology. These projects should be able to construct their desired topology and then treat that [virtual] topology as their own allocatable resource - which could harness the power of NSI to build virtual SDN slices globally. We should define a Performance Verification architecture as part of NSI. Mostly the architecture is well suited for it, but we need a set of primitves that can request verification tests be performed: – active/passive testing, – internal (to a service domains) or external (across) a service domain(s)...


Download ppt "NORDUnet Nordic infrastructure for Research & Education NSI V3 Some Burning Topics for Discussion and Features for Consideration The musings of Jerry Sobieski."

Similar presentations


Ads by Google