Optimal VM Dimensioning for Data Plane VNFs in Local / Edge Telco Cloud
Shashi Kant Singh, DPDK Summit Bangalore 2018

Agenda
- Telco Cloud performance requirements aligned with deployments
- Data Plane VNF lifecycle requirements
- Typical Data Plane VNF types
- Issues with Data Plane VNF dimensioning
- Possible solutions

Telco DCs in Cloud

[Diagram: Telco DC hierarchy - Edge Cloud (EC), Local Cloud (LC), Regional Cloud (RC), National / Central Cloud]

Typical Telco applications are distributed in nature, primarily driven by performance needs. These applications may need to run in different DCs yet still be chained together to provide a service.

Key Telco requirements:
- Efficient service chaining by removing local performance bottlenecks
- Multi-tenancy support with network slicing
- Separation of Management, Control and Data Planes
- High capacity and high bandwidth
- Ultra reliability and low latency

Per-tier characteristics:
- National Cloud: multi-tenancy, high availability, high capacity, scalability, load balancing, energy saving / resource utilization; non-real-time performance
- Regional Cloud: fault resilience, scalability, high throughput, high capacity, multi-tenancy
- Local Cloud: low latency, high throughput, fault resilience; real-time performance

Typical Telco Cloud

National Cloud:
- Data Plane VNFs are typically L2-L4: routing, firewall, security gateways, application gateways
- Control Plane VNFs are signalling GWs, e.g. IMS servers, MME etc.
- Generic VNF flavours are good enough for Control / Data Plane VNFs, with load balancers to even out the processing requirements
- The majority of data traffic processed is agnostic to subscriber context
- Line-rate traffic handling is easily achieved even at 40G / 100G due to the nature of the applications (switching, routing); a DPDK/VPP-based vSwitch works well without the need for HW binding
- With HW-independent virtualization, fault management is done effectively:
  - Live Migration (LM): an independent high-capacity infrastructure link can be used to support hitless migrations, i.e. without service disruption
  - Link aggregation
  - Pooled VNFs with load balancers
  - ACT-ACT and ACT-SBY configurations possible with hitless / minimally impacting checkpointing

Regional Cloud:
- Data processing is based on subscriber context; data traffic forwarding depends on next-leg conditions, e.g. radio conditions, front-haul / mid-haul link stability etc.
- This leads to controlled processing in the data plane, which must still meet the maximum throughput requirements under good channel conditions
- To handle varying throughput, control code is required in the data plane to manage the subscriber context data, update the channel conditions dynamically, and use the channel-condition data to control the flow of traffic
- Due to the high bandwidth requirement combined with the control code, data plane VNFs perform sub-optimally, so a split of Control and Data VNFs is typically done: Control Plane VNFs tend to be CPU intensive, while Data Plane VNFs become IO intensive

Local / Edge Cloud:
- Data plane processing is further split into:
  - Non-real-time Data Plane VNFs (e.g. BBU-NRT): IO-intensive; latency is not a critical performance parameter; do not typically need HW binding
  - Real-time Data Plane VNFs (e.g. BBU-RT): latency is the most critical performance parameter; mostly need HW binding to meet performance requirements

[Diagram: example placement of BBU-RT, BBU-NRT, SGW-CP/DP, PGW-CP/DP, IMS and MME VNFs across Edge, Local and Regional Clouds, with the PDN behind the National Data Centre]

Data Plane VNF Lifecycle Requirements

Performance:
- Maintain or exceed stringent service availability and real-time performance requirements
- Government and regulatory requirement: must be at least 99.999 percent available (5 nines availability), with the ability to continue operating and maintaining services even through a full nodal failure

Fault Resilience:
- Critical: ACT-ACT / ACT-SBY HA
- Active: Live Migration
- Passive: cloning, snapshots, backup / restore

Scalability:
- Static: manual request by the operator to increase network capacity
- Dynamic (auto-scaling): based on CPU / network / memory utilization, or on monitoring of application KPIs, e.g. throughput, latency

Energy Saving:
- Based on time of day: management request to shut down nodes
- Dynamic: condense scaled-down VNFs onto fewer compute nodes; re-arrange VNFs

"VNFs must be built to handle failures. Fault tolerance must be considered at the top of the list during the design of VNFs, along with performance."

OPNFV directions:
- Active fault management: ACT-SBY configuration (under critical error conditions); preventive action by fault prediction (reading warnings)
- Passive fault management: VM retirement for regular health check-ups
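For scale, a quick worked check of what the five-nines requirement allows (standard availability arithmetic, not from the slides):

```latex
(1 - 0.99999) \times 365 \times 24 \times 60 \;\approx\; 5.26 \text{ minutes of downtime per year}
```

That is roughly 26 seconds per month, which is why the fault-resilience mechanisms above (ACT-SBY switchover, Live Migration) must complete in seconds to stay within budget.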

Typical Data Plane VNF Types / Flavors

General Purpose:
- Generic VMs with no specific asks for HW resources such as CPU, network ports, or accelerators
- Performance requirements well within reach with generic VNF flavours
- Typically use: shared CPU allocation, which enables efficient sharing of HW resources; OVS as the software switch on the host compute; generic virtio as the network interface in the VM
- Designed to meet all the VNF lifecycle requirements; horizontal and vertical scaling takes care of high-capacity needs
- Live Migration fully supported

CPU Intensive:
- Have a specific ask for CPU cores, with low or medium data traffic entering / leaving the VNF; may need dedicated CPU allocation
- Typically could use: OVS / OVS-DPDK at the host; virtio inside the VM
- Fault resiliency provided by HA / redundancy
- Live Migration supported with some limitations: parallel live migrations; how fat the VM is (CPU processing capacity / rate of dirtying pages)
- Without load balancers, scalability options are limited

IO Intensive without HW Binding:
- High rate of data traffic entering / leaving the VNF; latency is not a critical performance requirement
- Need dedicated CPU allocation to meet IO performance requirements
- Typically could use: OVS-DPDK / VPP at the host; DPDK PMD / VPP inside the VM
- Could be: RAN data plane processors for non-real-time traffic; packet processors such as DPI or firewalls
- Scalability not a critical parameter
- Live Migration typically not supported

IO Intensive with HW Binding:
- High rate of data traffic entering and leaving the VNF, and additionally latency sensitivity is a critical parameter
- Specific ask for IO and CPU resources: SR-IOV, PCI-PT, HW accelerators (e.g. crypto)
- Use HW binding: CPU pinning, IRQ remapping, NUMA awareness for IO devices, cache coherence
- NUMA awareness for CPU and RAM alone (without NIC proximity) can also be maintained by the VNFM / Orchestrator
- Live Migration not supported

Issues with VNF Dimensioning

CPU and IO Intensive VNFs may not be able to meet all the VNF lifecycle requirements. Common issues:

CPU allocation - shared vs dedicated:
- Shared allocation gives high multiplexing and overall CPU utilization; dedicated CPUs ensure CPU availability at all times
- With hyper-threaded physical CPUs, the two virtual CPUs may not give 2x performance for heavy data-processing applications, e.g. AI
- OpenStack thread allocation supports Avoid, Separate, Isolate, Prefer; Live Migration may not be possible with some of these allocation types

Trade-off against the minimum required RAM / disk:
- Higher RAM / disk makes the VM fat, impacting Live Migration performance
- Lower RAM / disk may hurt CPU performance through cache misses

Trade-off against the allocated page size (illustrated in the sketch below):
- Smaller pages lead to more TLB misses
- Larger pages increase unused page sections per process (internal fragmentation) and page relocations, hence page faults
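To make the page-size trade-off concrete, here is a minimal sketch using the Linux mmap hugepage flags (my illustration, not from the slides; on older glibc, MAP_HUGE_1GB may require including <linux/mman.h>):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Back a 1 GiB buffer with a single huge page. Assumes 1G huge pages
 * were reserved at boot, e.g.:
 *   default_hugepagesz=1G hugepagesz=1G hugepages=4 */
int main(void)
{
    size_t len = 1UL << 30; /* 1 GiB */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                     -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_HUGE_1GB)");
        return 1;
    }
    /* One TLB entry now covers the whole buffer, instead of the
     * 262,144 entries needed with 4 KiB pages (fewer TLB misses).
     * In exchange, any unused part of the page is wasted (internal
     * fragmentation) and the page cannot be swapped or compacted. */
    memset(buf, 0, len);
    munmap(buf, len);
    return 0;
}
```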

Issues with VNF Dimensioning

CPU Intensive VMs:
- Dedicated CPUs allocated to VMs may not be used as efficiently as with shared CPU allocation: CPUs are typically allocated for the maximum load-handling capacity of the VNF, so under low load they are neither optimally used nor shared with other VNFs
- Limited multiplexing gain is the major concern: OpenStack provides overcommitting of CPU / RAM to increase effective CPU utilization by sharing across instances, but this is not possible with dedicated CPU allocation
- May have issues if multiple Live Migrations are performed at the same time; a dedicated infrastructure link may not be possible in the Regional / Local Cloud

IO Intensive VMs:
- HW binding for high performance makes the VM static: it behaves more like a reconfigurable physical machine than a VM
  - VM dimensions cannot be changed easily
  - Due to the static configuration, network slicing also becomes difficult
  - HW binding makes the VM non-portable across COTS hardware
- The VM has to fit within a single NUMA node, so placement of fat VMs is challenging; for relocation, SR-IOV VFs must be available on the same NUMA node where CPU resources are available for the migrating VM
- The VM cannot be condensed onto a reduced set of compute nodes for energy saving
- Live Migration is mostly not supported, due to the high rate of dirtying pages and because the SR-IOV HW binding does not support Live Migration
- Fault resiliency is difficult without Live Migration; an HA ACT-SBY configuration can be used, at the cost of duplicated HW, which is not cost-effective
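The dirty-page limitation can be stated precisely with the standard pre-copy live-migration model (a common textbook approximation, not from the slides). With VM memory size M, page-dirty rate r, and migration link bandwidth B, each round must retransmit whatever was dirtied during the previous round:

```latex
t_0 = \frac{M}{B}, \qquad
t_{n+1} = \frac{r\, t_n}{B} = \left(\frac{r}{B}\right)^{n+1} \frac{M}{B}
```

Rounds shrink geometrically only when r < B. An IO-intensive VM dirtying buffers at near line rate keeps r close to B, so pre-copy never converges; this is the rationale behind the later solutions of throttling the VM's CPU (reducing r) or splitting fat VMs (reducing M).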

Solution - General
- Customize the guest OS
- Define the upper bound of performance expectation for a VM; splitting a VNF's functionality across multiple VMs is possible
- Find the resources / flavour definition JUST sufficient to meet the performance expectation:
  - Start with the General profile and add HW-independent resources; if performance is met, there is no need for a more sophisticated solution
  - Try to use shared resources before looking for dedicated resources
  - If dedicated resources are the only option, select the ones that support Live Migration
- Identify fault-resilience procedures for the VM to the extent possible without compromising performance
- Define the scalability options / procedures
- Define the energy-saving options

Solution - General

The size of the VM (CPU, RAM, network ports, disk) has a bearing on:
- Live Migration
- System backup / restore
- Instantiation / deletion
- VM operations, e.g. suspend, resume, pause, un-pause

The bigger the VM, the longer these operations take. Smaller VMs tend to be more responsive, but below some range the VM's performance requirements may not be met. The optimal VM dimensions for CPU, RAM, network IO and disk need to be identified.

General solutions:
- Strip the guest OS to keep only the desired services; guest OS resource usage should be kept below 20%
- Separate out the CPU resources used by the guest OS within the VM (see the sketch below):
  - CPU isolation for apps
  - IRQ remapping to specific cores
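A minimal sketch of in-guest CPU isolation, assuming a guest booted with isolcpus=2,3 where vCPU 0 is left to the OS and a data-plane thread is pinned to vCPU 2 (the worker function and core layout are illustrative; IRQ steering itself goes through the kernel's /proc/irq/<N>/smp_affinity interface, not shown here):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical data-plane worker; in a real VNF this would be the
 * packet-processing loop. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;)
        ; /* busy packet loop, runs only on its pinned vCPU */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    cpu_set_t set;

    pthread_create(&tid, NULL, worker, NULL);

    /* Pin the worker to vCPU 2; combined with isolcpus=2,3 the guest
     * scheduler never places OS tasks on that core. */
    CPU_ZERO(&set);
    CPU_SET(2, &set);
    pthread_setaffinity_np(tid, sizeof(set), &set);

    pthread_join(tid, NULL);
    return 0;
}
```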

Solution – CPU Intensive VMs
- Allocate CPUs in mixed mode: shared + dedicated
- Allow for vertical scaling of CPUs
- Avoid PMD mode where the network IO is manageable in interrupt mode (see the sketch below)
- Decide the optimal RAM / disk size based on the VNF application:
  - Disk operations can be reduced by using network IO to a storage node
  - Volume-based VNFs (with dedicated storage nodes) help in recovery from faults, e.g. using LM
- Define the optimal page-size requirement based on the application
- Let the virtualization platform reduce the CPU cycles of a VM during LM; this reduces the rate of dirtying pages and allows multiple LMs to complete with higher probability
- Energy saving is possible with increased usage of shared resources; with multiple VNFs scaled down, it should be possible to put them in a low-CPU-cycle mode and even condense VNFs onto fewer compute nodes
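A hedged sketch of the hybrid poll/interrupt pattern, loosely following DPDK's l3fwd-power example (function name, burst size and queue are illustrative; EAL init, port configuration with intr_conf.rxq = 1, and rte_eth_dev_rx_intr_ctl_q registration are assumed done elsewhere):

```c
#include <rte_ethdev.h>
#include <rte_interrupts.h>
#include <rte_mbuf.h>

/* Busy-poll while traffic flows; arm the RX interrupt and sleep when
 * idle, releasing the vCPU for other shared workloads. */
static void rx_loop(uint16_t port, uint16_t queue)
{
    struct rte_mbuf *pkts[32];
    struct rte_epoll_event ev;

    for (;;) {
        uint16_t n = rte_eth_rx_burst(port, queue, pkts, 32);
        if (n > 0) {
            for (uint16_t i = 0; i < n; i++)
                rte_pktmbuf_free(pkts[i]); /* real code: process here */
            continue;                      /* stay polling under load */
        }
        /* Idle: switch to interrupt mode until the NIC signals RX.
         * (A production loop re-checks the queue after arming to
         * avoid a missed-wakeup race.) */
        rte_eth_dev_rx_intr_enable(port, queue);
        rte_epoll_wait(RTE_EPOLL_PER_THREAD, &ev, 1, 10 /* ms */);
        rte_eth_dev_rx_intr_disable(port, queue);
    }
}
```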

Solution – IO Intensive VMs
- Use 1G huge pages over 2M ones
- Use PMD mode over interrupt mode: it reduces interrupt-handling latency and gives higher packet IO performance
- Use non-blocking ring buffers for a higher packet-processing rate (see the sketch after this slide)
- Use vector packet processing techniques (the concept behind VPP)
- To support LM, use an XVIO-based network interface instead of SR-IOV:
  - Netronome has come up with SmartNIC cards supporting the XVIO standards
  - This allows full VM mobility by presenting standard virtio interfaces in the VM
  - With VM CPUs freed from PMD work (offloaded to XVIO), traffic can be handled at sub-line rate; instead of one FAT VM processing data at line rate with x vCPUs and y memory, it can be split into 3 smaller VMs, which would allow Live Migration to go through
- App-assisted LM: the virtualization infrastructure notifies the app that LM is being initiated, allowing the app to reduce its data-traffic handling and thereby the rate of dirtying pages; the infrastructure can also reduce the VM's CPU cycle rate
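To illustrate the non-blocking ring handoff between a PMD RX core and a worker core, here is a minimal sketch against the DPDK rte_ring API (my illustration; the ring name, burst size of 32, and queue 0 are assumptions):

```c
#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>

/* Single-producer / single-consumer ring: enqueue and dequeue are
 * lock-free, so neither core ever blocks on the other. */
static struct rte_ring *rx_to_worker;

static void setup(void)
{
    rx_to_worker = rte_ring_create("rx_to_worker", 4096,
                                   rte_socket_id(),
                                   RING_F_SP_ENQ | RING_F_SC_DEQ);
}

/* PMD RX core: burst-read from the NIC and hand off to the worker. */
static void rx_core(uint16_t port)
{
    struct rte_mbuf *pkts[32];
    for (;;) {
        uint16_t n = rte_eth_rx_burst(port, 0, pkts, 32);
        if (n == 0)
            continue;
        unsigned sent = rte_ring_enqueue_burst(rx_to_worker,
                                               (void **)pkts, n, NULL);
        while (sent < n)               /* ring full: drop the excess */
            rte_pktmbuf_free(pkts[sent++]);
    }
}

/* Worker core: drain the ring and process packets. */
static void worker_core(void)
{
    struct rte_mbuf *pkts[32];
    for (;;) {
        unsigned n = rte_ring_dequeue_burst(rx_to_worker,
                                            (void **)pkts, 32, NULL);
        for (unsigned i = 0; i < n; i++)
            rte_pktmbuf_free(pkts[i]); /* real code: process here */
    }
}
```

Because the enqueue is non-blocking, a slow worker shows up as controlled drops at the ring rather than back-pressure stalling the RX core, which keeps the PMD loop servicing the NIC at a steady rate.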

THANK YOU (singhshashik1@gmail.com)