2-Node Clustering Active-Standby Deployment (www.opendaylight.org)


Slide 1 (Title): 2-Node Clustering Active-Standby Deployment

Slide 2: 2-Node Deployment Topology (Active-Standby Requirements)
Requirements:
- Configuration of the Primary controller in the cluster (Must)
- The Primary controller services the Northbound IP address; the Secondary takes over the NB IP upon failover (Must)
- Configuration of whether, on failover and recovery, the configured Primary controller reasserts leadership (Must)
- Configuration of the merge strategy on failover and recovery (Want)
- The Primary controller is master of all devices and leader of all shards (Must)
- Single-node operation allowed (access to the datastore without quorum) (Want)
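
To make the requirements above concrete, here is a minimal sketch of how they might map onto a single configuration object. This is purely illustrative: the presentation defines no such class, and the type and field names (TwoNodeHaConfig, primaryMemberName, and so on) are assumptions, not ODL code.

    // Hypothetical illustration only; not an ODL class. Field names are invented
    // to mirror the requirements listed on this slide.
    public final class TwoNodeHaConfig {
        private final String primaryMemberName;             // configured Primary controller (Must)
        private final String northboundIpAlias;             // NB IP served by the Primary; Secondary takes it over on failover (Must)
        private final boolean reassertLeadershipOnRecovery;  // Primary reasserts leadership after failover & recovery (Must)
        private final String mergeStrategyName;             // pluggable merge strategy on failover & recovery (Want)
        private final boolean allowNonQuorumOperation;       // single-node datastore access without quorum (Want)

        public TwoNodeHaConfig(String primaryMemberName, String northboundIpAlias,
                               boolean reassertLeadershipOnRecovery, String mergeStrategyName,
                               boolean allowNonQuorumOperation) {
            this.primaryMemberName = primaryMemberName;
            this.northboundIpAlias = northboundIpAlias;
            this.reassertLeadershipOnRecovery = reassertLeadershipOnRecovery;
            this.mergeStrategyName = mergeStrategyName;
            this.allowNonQuorumOperation = allowNonQuorumOperation;
        }

        public boolean isReassertLeadershipOnRecovery() {
            return reassertLeadershipOnRecovery;
        }
        // Remaining getters omitted for brevity.
    }

For example, a deployment could be described as new TwoNodeHaConfig("member-1", "10.0.0.100/24", true, "primary-wins", true); the member name, address, and strategy name here are placeholders.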

Slide 3: Failure of Primary (Scenario 1: Master Stays Offline)
Failover Sequence:
1. The Secondary controller becomes master of all devices and leader of all shards.

Slide 4: Failure of Primary (Scenario 2: Primary Comes Back Online)
Recovery Sequence:
1. Controller A comes back online and its data is replaced by all of Controller B's data.
2. Depending on the Re-assert Leadership configuration:
   - (ON) Controller A becomes master of all devices and leader of all shards.
   - (OFF) Controller B stays master of all devices and maintains leadership of all shards.
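
The following is a minimal sketch of the recovery decision described above, assuming a hypothetical Controller abstraction; the interface and method names are invented for illustration and are not ODL APIs.

    // Hypothetical recovery handler; mirrors the two-step sequence on this slide.
    interface Controller {
        void replaceDataWith(Controller source);        // step 1: adopt the other node's data
        void becomeMasterAndLeaderOfEverything();       // all devices + all shards
        void retainCurrentRoles();
    }

    final class RecoveryHandler {
        private final boolean reassertLeadershipOnRecovery;

        RecoveryHandler(boolean reassertLeadershipOnRecovery) {
            this.reassertLeadershipOnRecovery = reassertLeadershipOnRecovery;
        }

        void onPrimaryRejoined(Controller primary, Controller secondary) {
            // 1. The rejoining Primary's data is replaced by the Secondary's data.
            primary.replaceDataWith(secondary);
            // 2. Leadership depends on the Re-assert Leadership setting.
            if (reassertLeadershipOnRecovery) {
                primary.becomeMasterAndLeaderOfEverything();   // (ON)
            } else {
                secondary.retainCurrentRoles();                // (OFF)
            }
        }
    }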

Slide 5: Network Partition (Scenario 1: During Network Partition)
Failover Sequence:
1. Controller A becomes master of the devices in its network segment and leader of all shards.
2. Controller B becomes master of the devices in its network segment and leader of all shards.

Slide 6: Network Partition (Scenario 2: Network Partition Recovers)
Recovery Sequence:
1. Merge data according to the pluggable merge strategy (default: the Secondary's data is replaced with the Primary's data).
2. Depending on the Re-assert Leadership configuration:
   - (ON) Controller A becomes master of all devices and leader of all shards again.
   - (OFF) Controller B becomes master of all devices and leader of all shards again.
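
Since the merge strategy is described as pluggable with a "Primary wins" default, here is a sketch of what such a plug-in point could look like. The interface and the ShardSnapshot placeholder type are assumptions for illustration, not the actual ODL API.

    // Sketch of a pluggable shard-data merge strategy; names are illustrative.
    interface ShardSnapshot { }

    interface ShardMergeStrategy {
        // Returns the shard data the cluster converges on after the partition heals.
        ShardSnapshot merge(ShardSnapshot primaryData, ShardSnapshot secondaryData);
    }

    // Default from this slide: the Secondary's data is replaced with the Primary's data.
    final class PrimaryWinsMergeStrategy implements ShardMergeStrategy {
        @Override
        public ShardSnapshot merge(ShardSnapshot primaryData, ShardSnapshot secondaryData) {
            return primaryData;
        }
    }

A different deployment could drop in another implementation (for example, a timestamp- or per-shard-based merge) without changing the recovery sequence itself.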

Slide 7: No-Op Failures (Failures That Do Not Result in Any Role Changes)
Scenarios:
1. Secondary controller failure.
2. Any single link failure.
3. The Secondary controller loses network connectivity (but device connections to the Primary are maintained).

Slide 8: Cluster Configuration Options (Global & Granular Configuration)
Global:
1. Cluster Leader (aka "Primary")
   - Allow this to be changed on a live system, e.g. for maintenance.
   - Assigned (2-node case), elected (larger-cluster case).
2. Cluster Leader Northbound IP
3. Reassert Leadership on Failover and Recovery
4. Network Partition Detection Algorithm (pluggable)
5. Global overrides of the Per Device/Group and Per Shard items (below)
Per Device / Group:
1. Master / Slave
Per Shard:
1. Shard Leader (Shard Placement Strategy, pluggable)
2. Shard Data Merge (Shard Merge Strategy, pluggable)
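
As a companion to the merge-strategy sketch earlier, here is an illustrative sketch of the other pluggable hook named under "Per Shard", the shard placement strategy. The interface is an assumption, not the actual ODL API; it simply shows how the 2-node active-standby case (Primary leads every shard) could be one strategy among several.

    // Illustrative only: a pluggable shard-leader placement hook.
    import java.util.List;

    interface ShardPlacementStrategy {
        // Decides which cluster member should lead the named shard.
        String selectLeader(String shardName, List<String> memberNames);
    }

    // 2-node active-standby placement: the configured Primary leads every shard.
    final class PrimaryLeadsAllShards implements ShardPlacementStrategy {
        private final String primaryMemberName;

        PrimaryLeadsAllShards(String primaryMemberName) {
            this.primaryMemberName = primaryMemberName;
        }

        @Override
        public String selectLeader(String shardName, List<String> memberNames) {
            return primaryMemberName;
        }
    }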

Slide 9: HA Deployment Scenarios (Simplified Global HA Settings)
Can we abstract configurations into admin-defined deployment scenarios?
- e.g. the admin configures 2-Node (Active-Standby):
  - This means the Primary controller is master of all devices and leader of all shards.
  - Conflicting configurations are overridden by the deployment scenario.
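
A rough sketch of this idea follows: a named scenario expands into the granular settings from the previous slide and overrides anything that conflicts with it. The enum, the setting keys, and the expansion logic are all assumptions made for illustration.

    // Illustrative only: an admin-defined deployment scenario that pins certain settings.
    import java.util.HashMap;
    import java.util.Map;

    enum HaDeploymentScenario {
        TWO_NODE_ACTIVE_STANDBY {
            @Override
            Map<String, String> effectiveSettings(Map<String, String> requested) {
                Map<String, String> effective = new HashMap<>(requested);
                // The scenario pins these; conflicting admin-supplied values are overridden.
                effective.put("device.master", "primary");
                effective.put("shard.leader", "primary");
                return effective;
            }
        };

        abstract Map<String, String> effectiveSettings(Map<String, String> requested);
    }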

Slide 10: Implementation Dependencies (Potential Changes to Other ODL Projects)
Clustering:
1. Refactoring of Raft Actor vs. 2-Node Raft Actor code.
2. Define Cluster Leader.
3. Define Northbound Cluster Leader IP alias.
OpenFlow Plugin:
1. OpenFlow Master/Slave roles.
2. Grouping of Master/Slave roles (aka "Regions").
System:
1. Be able to SUSPEND the Secondary controller to support Standby mode.
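
For background on the OpenFlow Master/Slave roles this slide depends on: in OpenFlow 1.3 a controller sends a role request asking each switch to treat it as MASTER, SLAVE, or EQUAL, carrying a generation ID so stale requests can be rejected. The sketch below captures only that protocol idea in miniature; it is not the OpenFlow Plugin API, and the type names are invented.

    // Minimal model of the OpenFlow 1.3 controller-role request concept.
    enum OfControllerRole { EQUAL, MASTER, SLAVE }

    final class RoleRequest {
        final OfControllerRole role;
        final long generationId;   // monotonically increasing; switches reject stale values

        RoleRequest(OfControllerRole role, long generationId) {
            this.role = role;
            this.generationId = generationId;
        }
    }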

Slide 11: Open Issues (Follow-up Design Discussion Topics)
TBD:
1. Is the Master/Slave definition too tied to OpenFlow? (Generalize?)
   - Should device ownership/mastership be implemented by the OpenFlow Plugin?
2. How do we define the Northbound Cluster Leader IP in a platform-independent way? (Linux/Mac OS X: IP alias; Windows: possible)
   - Gratuitous ARP on leader change.
3. When both controllers are active in the network partition scenario, which controller "owns" the Northbound Cluster Leader IP?
4. Define controller-wide SUSPEND behavior (how?).
5. On failure, a Primary controller should be elected (in the 2-node case the Secondary is the only candidate).
6. How do we detect management-plane failure, and do we need to? (Heartbeat timeout >> worst-case GC pause?)
