CSci8211: SDN Controller Design: More on ONOS


1 CSci8211: SDN Controller Design: More on ONOS
Open Network OS by ON.LAB
- Current ONOS architecture tiers
- Subsystem components
- Consistency issues in ONOS
  - Switch mastership
  - Network graph

2 Key Performance Requirements
High throughput: ~500K-1M path setups / second; ~1-2M network state ops / second
High volume: ~10 GB of network state data
A difficult challenge!
[Figure: apps on top of the ONOS global network view / state, which must deliver high throughput, low latency, consistency, and high availability]

3 ONOS Prototype II
Prototype 2 focuses on improving performance, especially event latency.
- RAMCloud as the data store, with the Blueprints graph APIs directly on top; RAMCloud is a low-latency, distributed key-value store with microsecond reads/writes
- Optimized data model: a table for each type of network object (switches, links, flows, ...); minimize the number of references between elements, so most updates require a single read/write
- (In-memory) topology cache at each ONOS instance: eventually consistent, so updates may arrive at different times and in different orders; atomicity under schema integrity, i.e. no reads/writes by apps in the midst of an update
- Event notifications using Hazelcast, a publish-subscribe system (a minimal sketch follows)
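Below is a minimal sketch (not ONOS code) of how one instance might publish topology events to its peers through a Hazelcast topic, assuming a Hazelcast 3.x-style API; the event class, topic name, and fields are illustrative assumptions.

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.ITopic;
    import java.io.Serializable;

    public class TopologyEventBus {
        // Hypothetical event payload; must be serializable to cross instances.
        public static class TopologyEvent implements Serializable {
            public final String elementId;   // e.g. a switch or link id
            public final String type;        // e.g. "SWITCH_UP", "LINK_DOWN"
            public TopologyEvent(String elementId, String type) {
                this.elementId = elementId;
                this.type = type;
            }
        }

        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            ITopic<TopologyEvent> topic = hz.getTopic("topology-events");

            // Every instance subscribes and applies received events to its local cache.
            topic.addMessageListener(msg ->
                    System.out.println("apply to local topology cache: "
                            + msg.getMessageObject().type));

            // The instance that observed the change publishes it to all peers.
            topic.publish(new TopologyEvent("of:0000000000000001", "SWITCH_UP"));
        }
    }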

4 Current ONOS Architecture Tiers
Apps
Northbound abstraction: network graph, application intents - Application Intent Framework (policy enforcement, conflict resolution)
Distributed Core: distributed, protocol independent (scalability, availability, performance, persistence)
Southbound abstraction: generalized OpenFlow, pluggable & extensible (discover, observe, program, configure)
Protocols: OpenFlow, NetConf, ...

5 Distributed Architecture for Fault Tolerance
Apps
NB Core API
Distributed Core (state management, notifications, high-availability & scale-out)
SB Core API
Adapters
Protocols

6 Modularity Objectives
- Increase architectural coherence, testability and maintainability: establish tiers with crisply defined boundaries and responsibilities; set up the code-base to follow and enforce the tiers
- Facilitate extensibility and customization by partners and users: the unit of replacement is a module
- Avoid speciation of the ONOS code-base: APIs set up to encourage extensibility and community code contributions
- Preempt code entanglement, i.e. cyclic dependencies: reasonably small modules serve as firewalls against cycles
- Facilitate a pluggable southbound

7 Performance Objectives
- Throughput of proactive provisioning actions: path flow provisioning; global optimization or rebalancing of existing path flows
- Latency of responses to topology changes: path repair in the wake of link or device failures
- Throughput of distributing and aggregating state: batching, caching, parallelism, dependency reduction
- Controller vs. device responsibilities: defer to devices for what they do best, e.g. low-latency reactivity, backup paths

8 ONOS Services & Subsystems
- Device Subsystem - manages the inventory of infrastructure devices.
- Link Subsystem - manages the inventory of infrastructure links.
- Host Subsystem - manages the inventory of end-station hosts and their locations on the network.
- Topology Subsystem - manages time-ordered snapshots of network graph views.
- PathService - computes/finds paths between infrastructure devices or between end-station hosts using the most recent topology graph snapshot.
- FlowRule Subsystem - manages the inventory of the match/action flow rules installed on infrastructure devices and provides flow metrics.
- Packet Subsystem - allows applications to listen for data packets received from network devices and to emit data packets out onto the network via one or more network devices.
A minimal usage sketch follows this list.
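As a rough illustration of the query/listener style these subsystems expose, here is a self-contained sketch with hypothetical Device and DeviceEvent types and a stand-in DeviceService; it is not the actual ONOS API.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    public class SubsystemUsageSketch {
        record Device(String id) {}
        record DeviceEvent(String type, Device subject) {}   // e.g. "DEVICE_ADDED"

        // Stand-in for the Device Subsystem's northbound service.
        static class DeviceService {
            private final List<Device> inventory = new ArrayList<>();
            private final List<Consumer<DeviceEvent>> listeners = new ArrayList<>();

            List<Device> getDevices()                 { return List.copyOf(inventory); }
            void addListener(Consumer<DeviceEvent> l) { listeners.add(l); }

            // Called by the core when an adapter reports a new device.
            void deviceAdded(Device d) {
                inventory.add(d);
                listeners.forEach(l -> l.accept(new DeviceEvent("DEVICE_ADDED", d)));
            }
        }

        public static void main(String[] args) {
            DeviceService devices = new DeviceService();
            // An application queries the inventory and reacts to changes.
            devices.addListener(e -> System.out.println(e.type() + ": " + e.subject().id()));
            devices.deviceAdded(new Device("of:0000000000000001"));
            System.out.println("devices known: " + devices.getDevices().size());
        }
    }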

9 ONOS Services & Subsystems

10 ONOS Core Subsystem Structure
[Diagram: a Manager component exposes a Service and an AdminService to apps (query & command), notifies registered Listeners, and syncs & persists state through a Store; an Adapter component registers/unregisters with the AdapterRegistry, senses the network through Protocols, and adds & removes elements in the core via its AdapterService.]

11 ONOS Modules
Modules include onos-api, onos-core-net, onos-core-store, onos-of-api, onos-of-ctl, onos-of-adapter-*, onlab-util-misc, onlab-util-rest, onlab-util-osgi, ...
- Well-defined relationships
- Basis for customization
- Avoid cyclic dependencies

12 ONOS Southbound
- Attempt to be as generic as possible
- Enable partners/contributors to submit their own device/protocol-specific providers
- Providers should be stateless; state may be maintained for optimization but should not be relied upon

13 ONOS Southbound
- ONOS supports multiple southbound protocols, enabling a transition to true SDN.
- Adapters provide descriptions of dataplane elements to the core; the core utilizes this information.
- Adapters hide protocol complexity from ONOS.

14 Descriptions
- Serve as scratch pads to pass information to the core
- Descriptions are immutable and extremely short-lived
- A Description contains a URI for the object it is describing
- The URI also encodes the Adapter the device is linked to
A minimal sketch of such an immutable description appears below.
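This is an illustrative sketch of the idea, not the actual ONOS description classes; the fields and the "of:" URI scheme are assumptions.

    import java.net.URI;

    // An immutable "description" an adapter hands to the core. The URI scheme
    // identifies the adapter/protocol the element belongs to, e.g. "of:0000000000000001".
    public final class DeviceDescriptionSketch {
        private final URI uri;
        private final String manufacturer;
        private final String hwVersion;

        public DeviceDescriptionSketch(URI uri, String manufacturer, String hwVersion) {
            this.uri = uri;
            this.manufacturer = manufacturer;
            this.hwVersion = hwVersion;
        }

        // Only getters: the object never changes and can be discarded once the
        // core has absorbed its contents into its own stores.
        public URI uri()             { return uri; }
        public String manufacturer() { return manufacturer; }
        public String hwVersion()    { return hwVersion; }
    }

For example, an OpenFlow adapter might construct new DeviceDescriptionSketch(URI.create("of:0000000000000001"), "Acme", "1.0") and pass it up to the core.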

15 Adapter Patterns
1. The Adapter registers with the core
2. The core returns an AdapterService bound to the Adapter; the Adapter uses the AdapterService to notify the core of new events (device connected, packet-in) via Descriptions - this is where the magic happens
3. The core can use the Adapter to issue commands to elements under the Adapter's control
4. Eventually, the Adapter unregisters itself; the core will invalidate the AdapterService
[Diagram: the Adapter component registers/unregisters with the Manager's AdapterRegistry and senses the network via Protocols.]
A sketch of this registration pattern follows.
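The following is an illustrative sketch of the pattern with hypothetical interfaces (Adapter, AdapterService, AdapterRegistry); it is not the exact ONOS provider API.

    import java.net.URI;

    public class AdapterPatternSketch {

        interface AdapterService {                       // handed out by the core
            void deviceConnected(URI uri, String description);
            void deviceDisconnected(URI uri);
        }

        interface Adapter {                              // implemented by protocol plugins
            String id();
        }

        interface AdapterRegistry {                      // implemented by the core manager
            AdapterService register(Adapter adapter);    // steps 1 & 2
            void unregister(Adapter adapter);            // step 4
        }

        static class OpenFlowAdapter implements Adapter {
            private AdapterService core;                 // service bound to this adapter
            public String id() { return "of"; }

            void start(AdapterRegistry registry) {
                core = registry.register(this);          // 1: register, 2: receive bound service
            }

            void onSwitchConnected(String dpid) {        // driven by the protocol library
                // Reporting side of step 2: pass an immutable description up to the core.
                core.deviceConnected(URI.create("of:" + dpid), "OpenFlow switch " + dpid);
            }

            void stop(AdapterRegistry registry) {
                registry.unregister(this);               // 4: core invalidates the service
            }
        }
    }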

16 Distributed Core
Distributed state management framework built for high availability and scale-out
- Different types of state require different types of synchronization: fully replicated; master/slave replicated; partitioned/distributed
- Novel topology replication technique: a logical clock in each instance timestamps events observed in the underlying network; logical timestamps ensure state evolves in a consistent and ordered fashion; this allows rapid convergence without complex coordination
- Applications receive notifications about topology changes

17 Consistency Issues in ONOS
Recall definitions of consistency models:
- Strong consistency: upon an update to the network state by an instance, all subsequent reads by any instance return the last updated value. Strong consistency adds complexity and latency to distributed data management.
- Eventual consistency: a relaxed model allowing readers to be behind for a short period of time (a liveness property).
- There are other models, e.g., serial consistency and causal consistency.
Q: what consistency model or models should ONOS adopt?
Speaker notes: Consistency is not a binary value but a fraction between 0 and 1, excluding both extremes. So let's see what our consistency definition is. Strong consistency: every update is instantly available on other instances of ONOS. In reality it cannot be instant; there is always some cost/delay involved. Our eventual consistency is much closer to 1 than many other eventual-consistency definitions.

18 ONOS Distributed Core
- Responsible for all state management concerns
- Organized as a collection of "stores", e.g. topology, link resources, intents, etc.
- Properties of the state guide the ACID vs. BASE choice
Q: What are possible categories of the various states in a (logically centralized) SDN control plane, e.g., ONOS?
Speaker notes (Madan Jampani): The core subsystem's primary responsibility is to manage state, and to do so in a highly available and performant manner. To better align with the modularity objectives, the core is organized as a collection of stores, each serving the persistence needs of a specific service; the topology store and the store for intents are a couple of examples. Each store abstracts away the gory details of replication, fault tolerance, etc. There is no one-stop solution for state management, so the approach and choices made for each store are dictated by the properties of the associated state: some state mandates ACID semantics, some state is eventually consistent, and so on.

19 ONOS Global Network View
Applications observe and program the global network view.
Network state: topology (switch, port, link, ...), network events (link down, packet-in, ...), flow state (flow tables, connectivity paths, ...)
Network graph objects: Switch, Port, Link, Host, Intent, FlowPath, FlowEntry

20 ONOS State and Properties
State                                            Properties
Network topology                                 Eventually consistent, low-latency access
Flow rules, flow stats                           Eventually consistent, shardable, soft state
Switch-to-controller mapping, distributed locks  Strongly consistent, slow changing
Application intents, resource allocations        Strongly consistent, durable, immutable
Speaker notes: This slide is a partial listing of some of the state tracked inside ONOS and the associated properties. There are some similarities as well as some differences: some state is eventually consistent whereas some state needs stronger guarantees. Even for state at the same consistency level, the number of replicas could be dictated by the nature of read/write access and the desired level of durability and performance.

21 Switch Mastership: Strong Consistency
[Diagram: three ONOS instances sharing a network graph and a registry; switch A initially has Master = NONE; after election, the registry records Master = ONOS 1.]
Timeline: at first, all instances see Switch A Master = NONE; after the master is elected for switch A (the delay of locking & consensus), Instance 1, Instance 2 and Instance 3 all see Switch A Master = ONOS 1.
Speaker notes: We use transactional consistency for master election. Looking at the registry: the switch has no master when we start. At the point where the master is elected, ONOS 1 is marked as master. After the master is elected and mastership is consistently propagated to all instances, all subsequent reads for the master of switch A on any instance return ONOS 1. This guarantees control isolation.

22 ONOS Switch Mastership & Topology Cache
[Diagram: each ONOS instance masters a slice of the network ("Tell me about your slice?") and maintains a local topology cache.]

23 ONOS Switch Mastership: ONOS Instance Crash
[Diagram: an ONOS instance crashes, and mastership of its slice must be re-established ("Tell me about your slice?").]

24 ONOS Switch Mastership: ONOS Instance Crash
ONOS uses Raft to achieve consensus and strong consistency.

25 Switch Mastership Terms
- The switch mastership term is incremented every time mastership changes
- The switch mastership term is tracked in a strongly consistent store

26 Why Strong Consistency for Master Election
- Weaker consistency might mean that a master election on instance 1 is not yet visible on other instances; that can lead to multiple masters for a switch.
- Multiple masters would break the semantics of control isolation.
- Strong locking semantics are needed for master election.
Speaker notes: Locking and latches are distributed primitives which need consistency.
A minimal sketch of a term-based, atomic mastership claim follows.
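Below is a minimal sketch (not ONOS code) of why an atomic, strongly consistent primitive is needed: a compare-and-set against a single agreed record lets at most one instance win, and the term increments on every mastership change. An AtomicReference stands in for the consistent store; in ONOS this role would be played by a Raft-backed store.

    import java.util.concurrent.atomic.AtomicReference;

    public class MastershipSketch {
        record Mastership(String master, long term) {}

        // One record per switch, held in the (stand-in) strongly consistent store.
        private final AtomicReference<Mastership> switchA =
                new AtomicReference<>(new Mastership(null, 0));

        /** Try to become master of switch A; returns true only for the single winner. */
        boolean claimMastership(String instanceId) {
            Mastership current = switchA.get();
            if (current.master() != null) {
                return false;                        // someone already holds mastership
            }
            Mastership proposed = new Mastership(instanceId, current.term() + 1);
            // Atomic compare-and-set: only one concurrent claimant can succeed.
            return switchA.compareAndSet(current, proposed);
        }

        public static void main(String[] args) {
            MastershipSketch registry = new MastershipSketch();
            System.out.println("ONOS-1 wins: " + registry.claimMastership("ONOS-1")); // true
            System.out.println("ONOS-2 wins: " + registry.claimMastership("ONOS-2")); // false
        }
    }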

27 Eventual Consistency in Network Graph
[Diagram: three ONOS instances sharing a DHT-based network graph; switch A starts with STATE = INACTIVE and becomes ACTIVE when it connects.]
Timeline: at first, all instances see Switch A STATE = INACTIVE. When the switch connects to ONOS, Instance 1 immediately sees Switch A = ACTIVE, while Instance 2 and Instance 3 may still see INACTIVE during the delay of eventual consistency; after that, all instances see Switch A STATE = ACTIVE.
Speaker notes: A concrete example: before the switch is connected, it is marked INACTIVE in the network graph. As soon as the switch connects to ONOS 1, it is marked ACTIVE immediately on instance 1. For a fraction of a second, instance 2 and instance 3 might still show the switch as INACTIVE until they become eventually consistent; from that point on, the switch state is consistent on all instances. What about a link failure? Both ends may detect it!

28 Topology Events and Eventual Consistency
[Diagram: each instance keeps a local topology cache; detection of a topology event (e.g. a suspected link failure, possibly observed at both ends) triggers an update to the distributed topology store.]

29 Topology Events and Eventual Consistency
[Diagram: the link-failure event propagates to every instance's topology cache.]

30 Topology Events and Eventual Consistency
[Diagram: what if instance 1 crashed while handling the link failure? The remaining instances' caches must still converge on the correct topology.]

31 Switch Event Numbers
- Topology as a state machine; events: switch/port/link up or down
- Each event has a unique logical timestamp: [switch id, term number, event number]
- Events are timestamped on arrival and broadcast
- Partial ordering of topology events: "stale" events are dropped on receipt
- Anti-entropy: a peer-to-peer, lightweight gossip protocol that quickly bootstraps a newly joined ONOS instance
- Key challenge: the view should be consistent with the network, not (simply) with other views!
A sketch of the timestamp comparison and stale-event drop follows.
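This is an illustrative, single-threaded sketch (not ONOS code) of the logical-timestamp ordering described above: per network element, an event is applied only if its (term, event number) timestamp is newer than the last one seen; otherwise it is dropped as stale.

    import java.util.HashMap;
    import java.util.Map;

    public class TopologyEventOrdering {
        record LogicalTimestamp(long term, long eventNumber)
                implements Comparable<LogicalTimestamp> {
            public int compareTo(LogicalTimestamp other) {
                int byTerm = Long.compare(term, other.term);
                return byTerm != 0 ? byTerm : Long.compare(eventNumber, other.eventNumber);
            }
        }

        // Latest timestamp accepted for each switch/port/link id.
        private final Map<String, LogicalTimestamp> latest = new HashMap<>();

        /** Returns true if the event is fresh and should update the local cache. */
        boolean accept(String elementId, LogicalTimestamp ts) {
            LogicalTimestamp prev = latest.get(elementId);
            if (prev != null && ts.compareTo(prev) <= 0) {
                return false;                    // stale or duplicate: drop on receipt
            }
            latest.put(elementId, ts);
            return true;
        }

        public static void main(String[] args) {
            TopologyEventOrdering ordering = new TopologyEventOrdering();
            System.out.println(ordering.accept("of:01", new LogicalTimestamp(1, 5))); // true
            System.out.println(ordering.accept("of:01", new LogicalTimestamp(1, 3))); // false: stale
            System.out.println(ordering.accept("of:01", new LogicalTimestamp(2, 1))); // true: newer term
        }
    }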

32 Cost of Eventual Consistency
- The short delay means switch A's state is not yet ACTIVE on some ONOS instances in the previous example (!?)
- Applications on one instance will compute flows through switch A, while other instances will not yet use switch A for path computation (!?)
- Eventual consistency becomes more visible during control-plane network congestion
Q: What are the possible consequences of eventual consistency of the network graph? Does eventual consistency of the network graph imply eventual consistency of flow entries? Re-do path computation / flow setup?
Speaker notes: What does it mean? The switch is not instantly available for path computation on some instances, so some flows may be computed without using switch A. Similarly, when flow entries are written to the network graph they become eventually available on all instances, and each instance picks its flow entries and pushes them onto switches.

33 Why is Eventual Consistency Good enough for Network State?
- The physical network state changes asynchronously
- Strong consistency across the data and control planes is too hard
- Control apps know how to deal with eventual consistency: in a distributed control plane, each router makes its own decisions based on old information from other parts of the network, and it works fine
- But note: the current distributed control plane uses destination-based, shortest-path routing, which guarantees eventual consistency of the routing tables computed by each individual router!
- Strong consistency is more likely to lead to inaccuracy of network state, since network congestion is real
- How to reconcile and address this challenge?
Speaker notes: Asynchronously changing state is not transactional and does not need strong consistency. Control apps know that they are always working on stale state and know how to deal with it; today's routers work in the same fashion. Network state is most often used by current-generation control applications with assumed eventual consistency.

34 Consistency: Lessons
- One consistency model does not fit all
- The consequences of delays need to be well understood
- More research needs to be done on various kinds of state under different consistency models
- A network OS (logically centralized control plane) with distributed stores and consistent network state (network graph, flow entries & various other network states) also faces unique challenges!

35 ONOS Service APIs

36 Application Intent Framework
- An application specifies high-level intents, not low-level rules: focus on what should be done rather than how it should be done
- Intents are compiled into actionable objectives which are installed into the environment, e.g. a HostToHostIntent compiles into two PathIntents
- Resources required by the objectives are then monitored, e.g. a link vanishes, or capacity or a lambda becomes available
- The intent subsystem reacts by recompiling the intent and re-installing revised objectives
We will come back and discuss this later in the semester if we have time! A minimal sketch of the compilation idea follows.
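The following sketch uses hypothetical types (not the actual ONOS intent API) to illustrate how a host-to-host intent can compile into two unidirectional path intents, one per direction.

    import java.util.List;

    public class IntentSketch {
        record HostToHostIntent(String hostA, String hostB) {}
        record PathIntent(String src, String dst, List<String> pathHops) {}

        interface PathService {                       // assumed path-computation service
            List<String> path(String src, String dst);
        }

        /** "Compile" the high-level intent into installable objectives. */
        static List<PathIntent> compile(HostToHostIntent intent, PathService paths) {
            return List.of(
                    new PathIntent(intent.hostA(), intent.hostB(),
                                   paths.path(intent.hostA(), intent.hostB())),
                    new PathIntent(intent.hostB(), intent.hostA(),
                                   paths.path(intent.hostB(), intent.hostA())));
        }
        // On a relevant topology event (e.g. a link in pathHops vanishes), the intent
        // framework would recompile the intent and re-install the revised PathIntents.
    }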

37 Reflections/Lessons Learned: Things We Got Right
- Control isolation (sharding): divide the network into parts and control them exclusively; enables load balancing -> we can do more
- Distributed data store that scales with controller nodes, with HA -> though we need a low-latency distributed data store
- Dynamic controller assignment to parts of the network: dynamically assign which part of the network is controlled by which controller instance -> we can do better with more sophisticated algorithms
- Graph abstraction of network state: easy to visualize and correlate with topology; enables several standard graph algorithms
Meeting notes (11/20/13): We got a few things right. Partitioning the network into parts to be controlled exclusively helps with basic load balancing, and of course we could do better. Scalability and HA were well handled using distributed data stores. Dynamic fail-over and assignment of parts of the network to controllers helps very well with HA; we could have done better using more sophisticated algorithms. The network graph as a northbound abstraction is appealing to many, and we can do better by formalizing a graph model for ONOS.

38 What’s Available in ONOS Today?
- ONOS with all its key features: high availability, scalability*, performance*; northbound abstractions (application intents); southbound abstractions (OpenFlow adapters); modular code-base; GUI
- Open source: ONOS code-base on GitHub; documentation & infrastructure; processes to engage the community
- Use-case demonstrations: SDN-IP, Packet-Optical
- Sample applications: reactive forwarding, mobility, proxy ARP
Please do check out the current ONOS code distribution on the ONOS project site.

