OVN DBs HA with scale test

OVN DBs HA with scale test
Aliasgar Ginwala (aginwala@ebay.com), IRC: aginwala

What components can be improved with scale test?
- ovn-controller on computes/GWs: ongoing discussions and WIP upstream
- ovs-vswitchd on computes/GWs: performance improved with help of the community
- ovn-northd on central nodes: ongoing discussions and WIP upstream

Why scale test?
- To see how OVN behaves when deployed at scale.
- To ensure an entire availability zone can be simulated faithfully, as in big cloud deployments.
- To find bugs as early as possible and improve OVN.

What to use for scale test? OVN Scale test
- When something fails, performs slowly, or doesn't scale, it is really hard to answer the "what", "why", and "where" questions without a solid scalability-testing framework.
- Since OpenStack Rally is a very convenient benchmarking tool, OVN scale test leverages it: it is a plugin for OpenStack Rally.
- It is open source and maintained under the same base project as Open vSwitch.
- It is intended to give the community an OVN control-plane scalability test tool capable of performing specific, complicated, and reproducible test cases on simulated scenarios.
- Rally must be installed, as the workflow is also similar to Rally's.
- Upstream scale test repo: https://github.com/openvswitch/ovn-scale-test
- User guide: http://ovn-scale-test.readthedocs.org/en/latest/
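
A minimal install sketch, assuming the repo ships an install script (the script name is an assumption; the user guide above is authoritative):

    git clone https://github.com/openvswitch/ovn-scale-test.git
    cd ovn-scale-test
    ./install.sh    # assumed helper: installs Rally and registers the OVN plugin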

Rally OVS
To run OVN scale test, you don't need OpenStack installed; you only need Rally installed.
Main keywords:
- Deployment: any cloud deployment consisting of all network and compute components.
- Task: any CRUD operations on compute, farm, and network components such as lports, lswitches, lrouters, etc.
- Farm: a collection of sandboxes.
- Sandbox: a chassis (hypervisor/compute node/OVS sandbox).
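
The end-to-end rally-ovs workflow then boils down to three commands, each shown in detail on the following slides:

    rally-ovs deployment create --file ovn-multihost.json --name ovn-overlay   # install packages/binaries on the BMs
    rally-ovs task start create_sandbox.farm1.json                             # turn a farm BM into sandboxes (chassis)
    rally-ovs task start create_routers_bind_ports.json                        # create lrouters/lswitches/lports and bind them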

Base counters considered for an availability zone
- 8 lrouters
- 5 lswitches per lrouter
- 250 lports per lswitch (8 x 5 x 250 = 10,000)
- Total: 10k lports
- Total chassis: 1k (10 lports/VMs per chassis x 1k chassis = 10k lports)
- Total BMs hosting the chassis: 20 (i.e. 50 sandboxes per BM)
- Total control plane nodes: 3
- OS: Ubuntu 16.04 with 4.4 kernel

OVSDB service models
OVSDB supports three service models for databases:
- Standalone
- Active-backup
- Clustered
The service models offer different trade-offs among consistency, availability, and partition tolerance. They also differ in the number of servers required and in performance. The standalone and active-backup service models share one on-disk format, while clustered databases use a different format [1].
[1] https://github.com/openvswitch/ovs/blob/80c42f7f218fedd5841aa62d7e9774fc1f9e9b32/Documentation/ref/ovsdb.7.rst
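
The on-disk difference shows up when creating the databases; a minimal sketch with ovsdb-tool (file names, schema path, and addresses are illustrative):

    # Standalone / active-backup (shared on-disk format):
    ovsdb-tool create ovnnb.db ovn-nb.ovsschema

    # Clustered format: the first node creates the cluster, the others join it:
    ovsdb-tool create-cluster ovnnb.db ovn-nb.ovsschema tcp:192.168.220.101:6643
    ovsdb-tool join-cluster ovnnb.db OVN_Northbound tcp:192.168.220.102:6643 tcp:192.168.220.101:6643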

OVN DBs Active-standby using pacemaker
[Architecture diagram: CMS/Neutron reach the DBs through an LB VIP; Node1, Node2, and Node3 form a pacemaker cluster, each running NB, northd, and SB, with one active node and the rest standby; hypervisors (HV ...) also connect through the LB VIP. Diagram credits: Han Zhou <hzhou8@ebay.com>]
Alternatively, this LB VIP can be replaced by:
- Option 2: BGP advertising the VIP on each node
- Option 3: putting all 3 nodes on the same rack and using pacemaker to manage the VIP too

Start OVN DBs using pacemaker
- Let pacemaker manage the VIP resource.
- When using an LB VIP: set listen_on_master_ip_only=no. The active node will then listen on 0.0.0.0 so that the LB VIP can connect on the respective NB and SB DB ports.
A pcs sketch follows below.
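
A minimal pcs sketch using the ovndb-servers OCF agent shipped with OVS/OVN (parameter names follow the agent; the VIP address and timings are illustrative, and the IPaddr2/colocation part applies to the pacemaker-managed-VIP variant, option 3):

    pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
        manage_northd=yes master_ip=192.168.220.100 \
        nb_master_port=6641 sb_master_port=6642 listen_on_master_ip_only=no \
        op monitor interval=60s timeout=50s --master
    pcs resource meta ovndb_servers-master notify=true
    # Pacemaker-managed VIP, kept on the same node as the active (master) DB:
    pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.220.100 op monitor interval=30s
    pcs constraint order promote ovndb_servers-master then VirtualIP
    pcs constraint colocation add VirtualIP with master ovndb_servers-master score=INFINITY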

OVN DBs – Raft Clustering
[Architecture diagram: CMS/Neutron reach the DBs through an LB VIP; Node1, Node2, and Node3 each run NB, northd, and SB and form a cluster with one elected cluster leader; northd uses an OVSDB named lock to ensure only one instance is active; hypervisors (HV ...) also connect through the LB VIP. Diagram credits: Han Zhou <hzhou8@ebay.com>]

Starting OVN DBs using clustering
- For an LB VIP: set the Connection table to listen on 0.0.0.0 on all nodes.
- For chassis: point them either at the VIP, e.g. tcp:<vip_ip>:6642, or at all central node IPs, e.g. "tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642".
The corresponding commands are sketched below.
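
A sketch of the commands involved (the cluster flags are from the ovn-ctl script of that era; addresses are the ones used in this deck):

    # First central node bootstraps the NB cluster:
    /usr/share/openvswitch/scripts/ovn-ctl start_nb_ovsdb \
        --db-nb-addr=192.168.220.101 --db-nb-create-insecure-remote=yes \
        --db-nb-cluster-local-addr=192.168.220.101
    # Other nodes join it (same pattern for start_sb_ovsdb with the --db-sb-* flags):
    /usr/share/openvswitch/scripts/ovn-ctl start_nb_ovsdb \
        --db-nb-addr=192.168.220.102 --db-nb-create-insecure-remote=yes \
        --db-nb-cluster-local-addr=192.168.220.102 \
        --db-nb-cluster-remote-addr=192.168.220.101

    # Listen on 0.0.0.0 for the LB VIP:
    ovn-nbctl set-connection ptcp:6641:0.0.0.0
    ovn-sbctl set-connection ptcp:6642:0.0.0.0

    # Point each chassis at all cluster members (or at the VIP):
    ovs-vsctl set open_vswitch . \
        external_ids:ovn-remote="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642"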

How to set up the scale test env?
Create a deployment, which installs the necessary packages/binaries on the BMs:
    rally-ovs deployment create --file ovn-multihost.json --name ovn-overlay
[Diagram: the rally-ovs node reaches the OVN central node and OVN Farm1 ... OVN Farm20 over ssh via a TOR switch.]

How to set up the scale test env?
The create_sandbox task effectively converts a farm BM into compute nodes with OVS installed:
    rally-ovs task start create_sandbox.farm1.json
[Diagram: rally-ovs drives the OVN central node and OVN Farm1, which now hosts sandboxes HV1, HV2, ... HV50, over ssh via a TOR switch.]

How to set up the scale test env?
Finally, create lrouters, lswitches, and lports, and bind the lports to the chassis:
    rally-ovs task start create_routers_bind_ports.json
[Diagram: rally-ovs drives the OVN central node and OVN Farm1 (HV1 ... HV50); lport1 ... lport500 end up bound across the farm's chassis.]
A few quick sanity checks are sketched below.
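
Once the task completes, some hedged sanity checks on the central node (expected counts follow the base counters slide; ovn-sbctl show prints one "Chassis" block per registered chassis):

    ovn-nbctl lr-list | wc -l           # expect 8 logical routers
    ovn-nbctl ls-list | wc -l           # expect 40 logical switches (8 x 5)
    ovn-sbctl show | grep -c 'Chassis'  # expect 1000 registered chassis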

OVN scale test with HA
- OVN scale test by default sets up one active standalone OVN DB; hence, the HA cluster needs to be set up separately. TODO: add support for deploying an HA cluster in ovn-scale-test to avoid the manual setup.
- For testing HA, point the chassis at the HA setup by putting the respective OVN DB HA VIP IP in create_sandbox.json via the parameter below:
    "controller_cidr": "192.168.10.10/16",

Scenarios – Active-standby using pacemaker

Scenario                            | Impact on control plane                                                             | Impact on data plane
Standby node reboot                 | No                                                                                  | No
Active node reboot                  | Yes (~5+ minutes, as the SB DB runs super hot resyncing the data)                   | Only newly created VMs/lports, until the SB DB cools down
All active and standby nodes reboot | Yes (a few minutes, depending on how soon the new node is up and data sync finishes) | No*

* An issue was hit where the entire NB DB data got flushed/lost, causing both control and data plane impact.
  Discussion: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html
  A fix was rolled out with help from upstream (commit ecf44dd3b26904edf480ada1c72a22fadb6b1825) and no issues have been reported since.

OVN DBs HA – Active-backup with pacemaker
Current status:
- Basic functionality tested.
- Scale testing always ongoing, with findings reported and some major issues fixed with help from upstream.
- Detailed scale test scenarios reported and shared on the mailing list: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-September/047405.html
- Feedback and improvements requested from upstream folks.

Scenarios – Clustered DBs

Scenario                | Impact on control plane                                                                                         | Impact on data plane
Any active node reboot  | No                                                                                                              | No
All active nodes reboot | Yes (a few minutes, depending on how soon the new node is up, leader election happens, and data sync completes) | Not fully verified

Raft with scale test summary
Current status:
- Basic functionality tested.
- Scale testing ongoing; problems found when using rally-ovs (ovn-scale-test) with around 2k+ lports (excerpt from the rally output):

    db=\"tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641\" -- wait-until Logical_Switch_Port lport_061655_SKbDHz up=true -- wait-until Logical_Switch_Port lport_061655_zx9LXe up=true -- wait-until Logical_Switch_Port
    Last stderr data: 'ovn-nbctl: tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641: database connection failed (End of file)\n'.", "traceback": "Traceback (most recent call last):\n File \"/ebay/home/aginwala/rally-repo/rally/rally/task/runner.py\", line 66, in _run_scenario_once\n

- Following up with the community to get it fixed; discussion at https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347260.html
- Upstream also has a Raft torture test among the test cases in the OVS repo for testing locally.

Some tunings for both clustered and non-clustered setups

Netfilter TCP params on all central nodes:
The default tcp_max_syn_backlog and net.core.somaxconn values are too small, so increase them to avoid TCP SYN flood messages in syslog:
    net.ipv4.tcp_max_syn_backlog = 4096
    net.core.somaxconn = 4096

Pacemaker configuration:
When the SB DB starts on the new active node, it is very busy syncing data to all HVs. During this time, pacemaker monitoring can time out, which would trigger restarts/failovers forever. Hence, set a big enough "op monitor" timeout for the ovndb-servers resource:
    op monitor interval=60s timeout=50s

Inactivity probe settings on all chassis:
Set the inactivity probe to 3 minutes, so that the central SB DB is not overloaded with probe handling, and so that chassis still notice the change if a failover happens.

Upstart settings on all central nodes when using pacemaker:
Disable the ovn-central and openvswitch-switch startup jobs. Otherwise, when a node reboots, pacemaker gets confused because it sees an already active pid, all nodes end up acting as standalone nodes, and the LB gets confused and sends traffic to a standby node.

The corresponding commands are sketched below.
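
A sketch of these tunings as commands (sysctl keys and monitor timings are taken from this slide; ovn-remote-probe-interval is the ovn-controller knob for the SB inactivity probe, in milliseconds; service names assume the Ubuntu 16.04 packages):

    # Netfilter TCP params on the central nodes:
    sysctl -w net.ipv4.tcp_max_syn_backlog=4096
    sysctl -w net.core.somaxconn=4096

    # Pacemaker monitor settings for the OVN DB resource:
    pcs resource update ovndb_servers op monitor interval=60s timeout=50s

    # On each chassis: 3-minute inactivity probe towards the SB DB:
    ovs-vsctl set open_vswitch . external_ids:ovn-remote-probe-interval=180000

    # On the central nodes: let pacemaker, not the distro scripts, manage the DBs:
    systemctl disable ovn-central openvswitch-switch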

Promising outcome and more to go
- ovs-vswitchd CPU utilization was running super high on the chassis. Performance was improved by making ofproto faster, and the results are impressive: the test completed in 3+ hours vs. 8+ hours before.
- Discussion: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-February/046140.html
- Commit c381bca52f629f3d35f00471dcd10cba1a9a3d99

CPU/Mem stats for active-standby

Active central node:
Component  | CPU  | Mem (RES, bytes)
OVN NB DB  | 0.12 | 97392000
OVN SB DB  | 0.92 | 777028000
OVN northd | 6.78 | 825836000

Chassis:
Component      | CPU  | Mem (RES, bytes)
ovsdb-server   | 0.02 | 11672000
ovs-vswitchd   | 3.75 | 152812000
ovn-controller | 0.94 | 839188000

Notes:
- Mem: resident (RES) memory, normalized to bytes regardless of whether top reports MB, GB, or TB.
- CPU: total CPU time the task has used since it started, converted to hundredths of a second and expressed as a delta (rate per second). E.g. if the total CPU time of the current ovn-controller process is 6:26.90, it converts to 6 * 6000 + 26 * 100 + 90 = 38690 hundredths of a second.
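
A minimal helper for that conversion, assuming top's minutes:seconds.hundredths format for CPU time:

    echo "6:26.90" | awk -F'[:.]' '{ print $1*6000 + $2*100 + $3 }'   # -> 38690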

Stuck? Reach out to the OVS community; it is super interactive and responsive. For any generic OVS queries or tech discussions, use ovs-discuss@openvswitch.org so that a wide variety of engineers can respond.

Thank You