Active-Standby Deployment

2-Node Clustering: Active-Standby Deployment

2-Node Deployment Topology: Active-Standby Requirements

Requirements:
- Configuration of the Primary controller in the cluster (Must)
- Primary controller services the Northbound IP address; a Secondary takes over the NB IP upon failover (Must)
- Configuration of whether, on failover & recovery, the configured Primary controller reasserts leadership (Must)
- Configuration of the merge strategy on failover & recovery (Want)
- Primary controller is master of all devices and leader of all shards (Must)
  - Initial config (design to allow for alternatives – multi-shard / multiple device masters)
- Single-node operation allowed (access to the datastore without quorum) (Want)

Failure of Primary – Scenario 1: Master Stays Offline

Failover sequence:
- Secondary controller becomes master of all devices and leader of all shards
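
A minimal sketch of this failover step, assuming hypothetical services for device mastership, shard leadership, and the Northbound IP alias (none of these names are existing ODL APIs):

```java
import java.util.List;

/** Hypothetical services; the real equivalents would live in the OpenFlow plugin and the datastore. */
interface DeviceMastershipService { void becomeMaster(String deviceId); }
interface ShardLeadershipService { void assumeLeadership(String shardName); }
interface NorthboundIpService { void takeOverVirtualIp(); }

/** Sketch: when the Primary fails, the Secondary takes over every role. */
class ActiveStandbyFailoverHandler {
    private final DeviceMastershipService devices;
    private final ShardLeadershipService shards;
    private final NorthboundIpService northbound;

    ActiveStandbyFailoverHandler(DeviceMastershipService devices,
                                 ShardLeadershipService shards,
                                 NorthboundIpService northbound) {
        this.devices = devices;
        this.shards = shards;
        this.northbound = northbound;
    }

    /** Called when the failure detector reports that the Primary is unreachable. */
    void onPrimaryFailed(List<String> allDeviceIds, List<String> allShardNames) {
        allDeviceIds.forEach(devices::becomeMaster);     // become OF master of every device
        allShardNames.forEach(shards::assumeLeadership); // become leader of every shard
        northbound.takeOverVirtualIp();                  // take over the Northbound IP alias
    }
}
```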

Failure of Primary – Scenario 2: Primary Comes Back Online

Recovery sequence:
- Controller A comes back online and its data is replaced by all of Controller B's data
- For the re-assert-leadership configuration:
  - ON: Controller A becomes master of all devices and leader of all shards
  - OFF: Controller B stays master of all devices and maintains leadership of all shards

Scenario 1: During Network Partition

Failover sequence:
- Controller A becomes master of the devices in its network segment and leader of all shards
- Controller B becomes master of the devices in its network segment and leader of all shards

Scenario 2: Network Partition Recovers

Recovery sequence:
- Merge data according to the pluggable merge strategy (default: the Secondary's data is replaced with the Primary's data)
- For the re-assert-leadership configuration:
  - ON: Controller A becomes master of all devices and leader of all shards again
  - OFF: Controller B becomes master of all devices and leader of all shards again
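
A minimal sketch of what the pluggable merge strategy could look like, with the default behavior from the slide (the Primary's data wins). The ShardMergeStrategy and ShardSnapshot types are hypothetical illustrations, not actual ODL interfaces:

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a shard snapshot modeled as a simple key/value map. */
final class ShardSnapshot {
    final Map<String, byte[]> entries = new HashMap<>();
}

/** Pluggable merge strategy applied when the partition heals. */
interface ShardMergeStrategy {
    ShardSnapshot merge(ShardSnapshot primaryData, ShardSnapshot secondaryData);
}

/** Default from the slide: the Secondary's data is replaced with the Primary's data. */
class PrimaryWinsMergeStrategy implements ShardMergeStrategy {
    @Override
    public ShardSnapshot merge(ShardSnapshot primaryData, ShardSnapshot secondaryData) {
        return primaryData; // discard the Secondary's divergent state entirely
    }
}
```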

No-Op Failures: Failures That Do Not Result in Any Role Changes

Scenarios:
- Secondary controller failure
- Any single link failure
- Secondary controller loses network connectivity (but device connections to the Primary are maintained)

Cluster Configuration Options: Global & Granular Configuration

Global:
- Cluster Leader (aka "Primary")
  - Allow this to be changed on a live system, e.g. for maintenance
  - Assigned (2-node case) or Elected (larger-cluster case)
- Cluster Leader Northbound IP
- Reassert Leadership on Failover and Recovery
- Network Partition Detection Algorithm (pluggable)
- Global overrides of the Per Device/Group and Per Shard items (below)

Per Device / Group:
- Master / Slave

Per Shard:
- Shard Leader (Shard Placement Strategy – pluggable)
- Shard Data Merge (Shard Merge Strategy – pluggable)
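
These options could be collected into a single configuration object along the following lines; every class and field name here is hypothetical and chosen only to mirror the bullets above:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the global and granular options listed above. */
class ClusterHaConfig {
    enum LeaderSelection { ASSIGNED, ELECTED }
    enum DeviceRole { MASTER, SLAVE }

    // Global options
    String clusterLeader;                                        // aka "Primary"; changeable on a live system
    LeaderSelection leaderSelection = LeaderSelection.ASSIGNED;  // ASSIGNED for 2-node, ELECTED for larger clusters
    String clusterLeaderNorthboundIp;                            // virtual/alias IP served by the leader
    boolean reassertLeadershipOnRecovery = true;
    String partitionDetectionAlgorithm = "akka-cluster";         // pluggable
    boolean globalOverridesGranularSettings = true;

    // Per device / group overrides
    final Map<String, DeviceRole> deviceRoles = new HashMap<>();

    // Per shard overrides
    final Map<String, String> shardLeaderPlacement = new HashMap<>(); // shard name -> placement strategy
    final Map<String, String> shardMergeStrategy = new HashMap<>();   // shard name -> merge strategy
}
```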

HA Deployment Scenarios: Simplified Global HA Settings

- Can we abstract configurations into admin-defined deployment scenarios?
- e.g. the admin configures 2-Node (Active-Standby): this means the Primary controller is master of all devices and leader of all shards
- Conflicting configurations are overridden by the deployment scenario
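
One way to express the "admin-defined deployment scenario" idea as code: a scenario preset that overrides conflicting granular settings. This sketch reuses the hypothetical ClusterHaConfig from the previous example:

```java
/** Hypothetical scenario presets that override conflicting granular settings. */
enum HaDeploymentScenario {
    TWO_NODE_ACTIVE_STANDBY,
    MULTI_NODE_ELECTED;

    /** Illustrative: apply the preset on top of whatever the admin configured granularly. */
    void applyTo(ClusterHaConfig config, String primaryNode) {
        if (this == TWO_NODE_ACTIVE_STANDBY) {
            config.clusterLeader = primaryNode;
            config.leaderSelection = ClusterHaConfig.LeaderSelection.ASSIGNED;
            config.globalOverridesGranularSettings = true; // Primary owns all devices and all shards
        } else {
            config.leaderSelection = ClusterHaConfig.LeaderSelection.ELECTED;
        }
    }
}
```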

Implementation Dependencies: Potential Changes to Other ODL Projects

Clustering:
- Refactoring of Raft Actor vs. 2-Node Raft Actor code
- Define Cluster Leader
- Define Northbound Cluster Leader IP Alias

OpenFlow Plugin:
- OpenFlow Master/Slave Roles
- Grouping of Master/Slave Roles (aka "Regions")

System:
- Be able to SUSPEND the Secondary controller to support Standby mode

Open Issues: Follow-up Design Discussion Topics

- TBD: Is the Master/Slave definition too tied to OpenFlow? (Generalize?) Should device ownership/mastership be implemented by the OF Plugin?
- How to define the Northbound Cluster Leader IP in a platform-independent way? (Linux/Mac OS X: IP alias; Windows: possible) Gratuitous ARP on leader change.
- When both controllers are active in the network-partition scenario, which controller "owns" the Northbound Cluster Leader IP?
- Define controller-wide SUSPEND behavior (how?)
- On failure a new Primary controller should be elected (in the 2-node case the Secondary is the only option)
- How do we detect management-plane failure, and do we need to? (Heartbeat timeout >> worst-case GC?)

Implementation (DRAFT)

Change Summary

Cluster Primary (OF Master & Shard Leader):
- Northbound IP Address
  - (Config) Define Northbound IP Alias Address
  - (Logic) <Pluggable> Northbound IP Alias Implementation (platform dependent)
- Behavior (Config / Logic) <Pluggable>
  - Define Default Primary Controller
    - Assigned (configuration) – default for 2-node
    - Calculated (election algorithm)
  - Redefine Default Primary Controller on a Running Cluster (Logic)
  - Control OF Master Role (Logic)
  - Control Datastore Shards
    - Global Config (Overridden)
    - Shard Placement (on Primary) <Pluggable>
    - Leadership Determination
      - Match OF Master – default for 2-node
      - Election based (with influence)

Change Summary (Continued)

Cluster Primary (OF Master & Shard Leader) – Behavior (continued):
- Network Partition & Failure Detection (Config / Logic) <Pluggable>
  - Detection algorithm – default: Akka clustering algorithm
- Failover (Config / Logic) <Pluggable>
  - Secondary Controller Behavior
    - (Logic) Suspend (dependent apps, datastore, etc.)
    - (Logic) Resume (become Primary: OF mastership, shard leadership, non-quorum datastore access)
- Failback
  - (Logic) <Pluggable> Data Merge Strategy – default: current Primary overrides Secondary
  - (Config) Primary re-asserts leadership on failback (OF Master & Shard Leader roles – after merge)
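
A sketch of the suspend/resume/failback behavior outlined above, reusing the hypothetical ShardMergeStrategy/ShardSnapshot types from the earlier merge-strategy example; the StandbyController and FailbackCoordinator names are likewise illustrative, not existing ODL APIs:

```java
/** Hypothetical hooks for putting a controller into Standby and bringing it back. */
interface StandbyController {
    /** Suspend dependent applications, northbound access and datastore writes. */
    void suspend();

    /** Resume as Primary: take OF mastership, shard leadership and non-quorum datastore access. */
    void resumeAsPrimary();
}

/** Failback flow from the slide: merge data first, then optionally re-assert leadership. */
class FailbackCoordinator {
    private final ShardMergeStrategy mergeStrategy;    // default: Primary overrides Secondary
    private final boolean primaryReassertsLeadership;  // config flag

    FailbackCoordinator(ShardMergeStrategy mergeStrategy, boolean primaryReassertsLeadership) {
        this.mergeStrategy = mergeStrategy;
        this.primaryReassertsLeadership = primaryReassertsLeadership;
    }

    void onPrimaryRecovered(ShardSnapshot primaryData, ShardSnapshot secondaryData,
                            StandbyController recoveredPrimary, StandbyController currentActive) {
        ShardSnapshot merged = mergeStrategy.merge(primaryData, secondaryData);
        // ... apply 'merged' to the datastore (omitted) ...
        if (primaryReassertsLeadership) {
            currentActive.suspend();            // demote the node that took over
            recoveredPrimary.resumeAsPrimary(); // original Primary takes back OF master & shard leader roles
        }
    }
}
```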

Dependencies

- Southbound
  - Device Ownership & Roles
- System Suspend Behavior
  - How to enforce a system-wide suspend when desired? (Config Subsystem? OSGi?)
- Config Subsystem
  - Resolving app data notifications?
- Measure Failover Times
  - No data exchange
  - Various data-exchange cases (sizes)

RAFT/Sharding Changes (DRAFT)

(Current) Shard Design

- ShardManager is an actor that:
  - Creates all local shard replicas on a given cluster node and maintains the shard information
  - Monitors the cluster members and their status, and stores their addresses
  - Finds local shards
- Shard is an actor (an instance of RaftActor) that represents a sub-tree within the data store; it:
  - Uses an in-memory data store
  - Handles requests from three-phase-commit cohorts
  - Handles data-change-listener requests and notifies the listeners upon state change
  - Is responsible for data replication among the shard (data sub-tree) replicas
- Shard uses RaftActorBehavior for two tasks:
  - Leader election for a given shard
  - Data replication
- RaftActorBehavior can be in any of the following roles at any given point in time:
  - Leader
  - Follower
  - Candidate
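
The role-switching pattern described above can be illustrated with a heavily simplified sketch (one behavior object per role, swapped as messages arrive); this is an illustration of the pattern only, not the actual ShardManager/Shard/RaftActorBehavior code:

```java
/** Simplified stand-ins for Raft messages and behaviors; not the real ODL classes. */
interface RaftMessage {}
final class ElectionTimeout implements RaftMessage {}
final class AppendEntries implements RaftMessage {}

interface RaftBehavior {
    /** Handle one message and return the behavior to use next (possibly itself). */
    RaftBehavior handle(RaftMessage message);
}

class Follower implements RaftBehavior {
    @Override public RaftBehavior handle(RaftMessage message) {
        // No heartbeat from the leader: stand for election.
        return message instanceof ElectionTimeout ? new Candidate() : this;
    }
}

class Candidate implements RaftBehavior {
    @Override public RaftBehavior handle(RaftMessage message) {
        // A real candidate would request votes; here an AppendEntries from a leader demotes it.
        return message instanceof AppendEntries ? new Follower() : this;
    }
}

class Leader implements RaftBehavior {
    @Override public RaftBehavior handle(RaftMessage message) {
        return this; // replicate entries to followers (omitted)
    }
}

/** The shard delegates each message to its current behavior, as Shard/RaftActor does. */
class SimplifiedShard {
    private RaftBehavior currentBehavior = new Follower();

    void onReceive(RaftMessage message) {
        currentBehavior = currentBehavior.handle(message);
    }
}
```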

(Current) Shard Class Diagram

(Proposed) Shard Design

Intent:
- Support a two-node cluster by separating shard data replication from leader election
- Elect one of the ODL nodes as "master" and mark it as "Leader" for all the shards
- Make leader election pluggable
- The current Raft leader-election logic should continue to work for a 3-node deployment

Design idea:
- Minimize the impact on ShardManager and Shard
- Separate the 'leader election' and 'data replication' logic within the RaftActorBehavior classes
  - Create two separate abstract classes and interfaces for 'leader election' and 'data replication'
- The Shard actor will contain a reference to a RaftReplicationActorBehavior instance (currentBehavior)
- RaftReplicationActorBehavior will contain a reference to an ElectionActorBehavior instance
- Both the RaftReplicationActorBehavior and ElectionActorBehavior instances will be in one of the following roles at any given point in time: Leader, Follower, Candidate
- RaftReplicationActorBehavior will update its ElectionActorBehavior instance based on the messages it receives; a message could be sent either by one of the ElectionActorBehavior instances or by a module that implements the '2-node cluster' logic
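
A minimal sketch of the proposed split, with the replication behavior delegating election messages to a swappable election behavior. The type names follow the slide, but the code itself is illustrative rather than the actual proposal:

```java
/** Roles shared by both behaviors, as described above. */
enum RaftRole { LEADER, FOLLOWER, CANDIDATE }

/** Pluggable leader-election half of the former RaftActorBehavior. */
interface ElectionActorBehavior {
    RaftRole role();

    /** Handle an election-related message and return the behavior to continue with. */
    ElectionActorBehavior handleElectionMessage(Object message);
}

/** Data-replication half; holds a reference to the current election behavior. */
abstract class RaftReplicationActorBehavior {
    protected ElectionActorBehavior electionBehavior;

    protected RaftReplicationActorBehavior(ElectionActorBehavior electionBehavior) {
        this.electionBehavior = electionBehavior;
    }

    /** Election messages may come from another ElectionActorBehavior or from a 2-node cluster module. */
    void onElectionMessage(Object message) {
        electionBehavior = electionBehavior.handleElectionMessage(message);
        onRoleChanged(electionBehavior.role()); // e.g. switch replication between Leader/Follower duties
    }

    abstract void onRoleChanged(RaftRole newRole);

    abstract void handleReplicationMessage(Object message);
}
```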

(Proposed) Shard Class Diagram

2-Node Cluster Workflow

Method 1: Run the 2-node cluster protocol outside of ODL
- An external cluster protocol decides which node is 'master' and which node is 'standby'
- Once the master election is complete, the master sends node roles and node-membership information to all the ODL instances
- A 'cluster module' within ODL defines the 'cluster node' model and provides REST APIs to configure the cluster information by modifying the *.conf files
- The 'cluster module' will send RAFT messages to all the other cluster members about the cluster information – membership & shard RAFT state
- The ShardActors in both cluster nodes will handle these messages, instantiate the corresponding 'replication behavior' & 'election behavior' role instances, and switch to the new roles
- The Northbound virtual IP is OS dependent and out of scope here
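
The externally driven role assignment of Method-1 could be modeled roughly as follows: a message carrying the elected master and the membership, which each shard actor applies. The message and actor types are hypothetical, and the sketch reuses the RaftRole enum from the proposed-design example above:

```java
import java.util.Map;

/** Hypothetical message sent by the 'cluster module' after the external protocol elects a master. */
final class ClusterRolesChanged {
    final String masterNode;
    final Map<String, String> memberAddresses; // member name -> Akka address

    ClusterRolesChanged(String masterNode, Map<String, String> memberAddresses) {
        this.masterNode = masterNode;
        this.memberAddresses = memberAddresses;
    }
}

/** Sketch of how a shard actor on each node could react to that message. */
class TwoNodeShardActor {
    private final String localNodeName;
    private RaftRole currentRole = RaftRole.CANDIDATE;

    TwoNodeShardActor(String localNodeName) {
        this.localNodeName = localNodeName;
    }

    void onClusterRolesChanged(ClusterRolesChanged msg) {
        // The externally elected master becomes Leader of every shard; the other node follows.
        currentRole = localNodeName.equals(msg.masterNode) ? RaftRole.LEADER : RaftRole.FOLLOWER;
        // ... instantiate the matching replication/election behavior instances here (omitted) ...
    }

    RaftRole currentRole() {
        return currentRole;
    }
}
```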

Reference diagram for Method-2 (diagram labels: 1a. switch-to-controller connectivity-state polling / cluster protocol – primary path; 1b. cluster protocol – secondary path)

2-Node Cluster Workflow

Method 2: Run the cluster protocol within ODL
- A 'Cluster Module' within each ODL instance talks to the other ODL instance, and together they elect the 'master' and 'standby' nodes
- If the cluster protocol times out, a node will check other factors (e.g. cross-check with the connected OpenFlow switches for 'primary' controller information, or use an alternative path) before the new master election
- The 'Cluster module' will send RAFT messages to all the other cluster members about the 'cluster information' – membership & shard RAFT state
- The ShardActors in both cluster nodes will handle these messages, instantiate the corresponding 'replication behavior' & 'election behavior' role instances, and switch to the new roles
- The Northbound virtual IP is OS dependent and out of scope here
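
A sketch of the timeout-plus-cross-check idea: when the peer stops answering over the primary path, consult a secondary source (for example the connected switches' view of the primary controller) before claiming mastership. All interfaces and names here are hypothetical:

```java
import java.util.Optional;

/** Hypothetical probes for the two paths shown in the reference diagram. */
interface PeerProbe { boolean peerReachable(); }              // cluster protocol, primary path
interface SwitchView { Optional<String> reportedPrimary(); }  // poll switches' connectivity state

/** Decides whether this node may take over as master after the cluster protocol times out. */
class TwoNodeMasterElector {
    private final String localNode;
    private final PeerProbe peerProbe;
    private final SwitchView switchView;

    TwoNodeMasterElector(String localNode, PeerProbe peerProbe, SwitchView switchView) {
        this.localNode = localNode;
        this.peerProbe = peerProbe;
        this.switchView = switchView;
    }

    /** Called when no cluster-protocol message has arrived within the configured timeout. */
    boolean shouldBecomeMaster() {
        if (peerProbe.peerReachable()) {
            return false; // only a slow network, not a dead peer
        }
        // Cross-check: if the switches still see the peer as primary, assume a partition, not a failure.
        Optional<String> primarySeenBySwitches = switchView.reportedPrimary();
        return !primarySeenBySwitches.isPresent() || primarySeenBySwitches.get().equals(localNode);
    }
}
```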

3-Node Cluster Workflow

- The ShardManager will create the local shards based on the shard configuration
- Each shard will start off as a 'Candidate' for role election as well as for 'data replication' messages, by instantiating the ElectionBehavior and ReplicationBehavior classes in the 'Candidate' role
- A candidate node will start sending 'requestForVote' messages to the other members
- The leader is elected based on the Raft leader-election algorithm, and the elected shard replica sets its state to 'Leader' by switching its ElectionBehavior & ReplicationBehavior instances to Leader
- When the remaining candidates receive the leader-assertion messages, they move to the 'Follower' state by switching their ElectionBehavior & ReplicationBehavior instances to 'Follower'

(Working Proposal) ConsensusStrategy

- Provide hooks to influence key RAFT decisions (shard leader election / data replication)
- https://git.opendaylight.org/gerrit/#/c/12588/
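
An illustrative shape for such a hook interface; it is not necessarily what the Gerrit change above implements. In this sketch the 2-node flavor pins elections to the designated Primary and allows single-node commits, with failover assumed to be driven by the 2-node cluster module rather than spontaneous elections:

```java
/** Hypothetical hook points consulted by the RAFT code at its key decision points. */
interface ConsensusStrategy {
    /** May this node start an election for the given shard? */
    boolean canStartElection(String shardName, String localNode);

    /** May this node grant its vote to the given candidate? */
    boolean canVoteFor(String shardName, String candidateNode);

    /** May the leader consider an entry committed with the given number of acknowledgements? */
    boolean isCommitQuorum(String shardName, int ackCount, int clusterSize);
}

/** 2-node flavor: only the designated Primary runs for leadership, and a single ack suffices. */
class TwoNodeConsensusStrategy implements ConsensusStrategy {
    private final String primaryNode;

    TwoNodeConsensusStrategy(String primaryNode) {
        this.primaryNode = primaryNode;
    }

    @Override public boolean canStartElection(String shardName, String localNode) {
        // Failover is handled by the 2-node cluster module, not by spontaneous Raft elections.
        return localNode.equals(primaryNode);
    }

    @Override public boolean canVoteFor(String shardName, String candidateNode) {
        return candidateNode.equals(primaryNode);
    }

    @Override public boolean isCommitQuorum(String shardName, int ackCount, int clusterSize) {
        return ackCount >= 1; // allow single-node operation when the Secondary is down
    }
}
```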

Config Changes (DRAFT)

(Current) Config

- Config files (Karaf: /config/initial)
  - Read once on startup (default settings for new modules)
- (sal-clustering-commons) hosts the Akka & Config Subsystem reader/resolver/validator
  - Currently no Config Subsystem config properties defined?
- Akka/Cluster config (akka.conf):
  - Akka-specific settings (actor spaces data/rpc, mailbox, logging, serializers, etc.)
  - Cluster config (IPs, names, network parameters)
- Shard config (modules.conf, modules-shards.conf):
  - Shard name / namespace
  - Sharding strategies
  - Replication (count and location)
  - Default config
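
For reference, a small sketch of reading a few of those akka.conf settings with the Typesafe Config library (the same library Akka uses; it must be on the classpath). The odl-cluster-data wrapper and the exact key paths are assumptions that vary by release:

```java
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

import java.io.File;
import java.util.List;

/** Sketch: load configuration/initial/akka.conf and pull out a few cluster settings. */
public class AkkaConfReader {
    public static void main(String[] args) {
        Config root = ConfigFactory.parseFile(new File("configuration/initial/akka.conf"));

        // Assumed layout: an odl-cluster-data wrapper around a standard akka section.
        Config akka = root.getConfig("odl-cluster-data").getConfig("akka");

        String hostname = akka.getString("remote.netty.tcp.hostname");
        int port = akka.getInt("remote.netty.tcp.port");
        List<String> seedNodes = akka.getStringList("cluster.seed-nodes");
        List<String> roles = akka.getStringList("cluster.roles"); // e.g. ["member-1"]

        System.out.printf("member %s listening on %s:%d, seeds=%s%n",
                roles, hostname, port, seedNodes);
    }
}
```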

(Proposal) Config

Intent:
- Continue to keep config outside of the Shard/RAFT/DistributedDatastore code
- Provide sensible defaults and validate settings when possible
- Error/warn on any changes that are not allowed on a running system
- Provide REST config access (where appropriate)

Design idea:
- Host configuration settings in the Config Subsystem
- Investigate using Karaf Cellar to distribute common cluster-wide config
- Move the current config processing (org.opendaylight.controller.cluster.common.actor) to the existing sal-clustering-config?
- Akka-specific config:
  - Keep most of the existing akka.conf file as default settings
  - Separate cluster member config (see Cluster Config)
  - Options:
    - Provide specific named APIs, e.g. setTCPPort()
    - Allow Akka <type,value> config to be set directly
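
The "specific named APIs" option could look roughly like the interface below; the service name and methods are hypothetical, and setAkkaSetting corresponds to the "Allow Akka <type,value> config to be set directly" option:

```java
/** Hypothetical config-subsystem service exposing named settings plus a raw Akka escape hatch. */
interface ClusterConfigService {
    // Specific named APIs (validated; rejected with an error/warning if not changeable at runtime)
    void setTcpPort(int port);
    void setNorthboundClusterIp(String ipAlias);
    void setPrimaryNode(String memberName);
    void setReassertLeadershipOnFailback(boolean enabled);

    // Allow an Akka <type,value> setting to be set directly, e.g. ("akka.cluster.roles", "[member-1]")
    void setAkkaSetting(String path, String value);
}
```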

(Proposal) Config – Design Idea (Continued)

- Cluster config:
  - Provide a single point for configuring a cluster; feeds back into the Akka-specific settings, etc.
  - Define Northbound Cluster IP config (alias)
- Shard config:
  - Define shard config (name / namespace / sharding strategy)
  - Will NOT support changing a running shard for now
- 'Other' config:
  - 2-Node: designate the cluster's Primary node or an election algorithm (dynamic)
  - Failback to the Primary node (dynamic)
  - Strategies (influence these in RAFT) – separate bundles?
    - Election
    - Consensus

Northbound IP Alias (DRAFT)