NetCloud Hong Kong 2017/12/11 NetCloud Hong Kong 2017/12/11 PA-Flow: Gradual Packet Aggregation at Virtual Network I/O for Efficient Service Chaining Yuki Taguchi†, Ryota Kawashima†, Hiroki Nakayama††, Tsunemasa Hayashi††, and Hiroshi Matsuo† † Nagoya Institute of Technology, Japan †† BOSCO Technologies Inc.
Boosting per-flow performance of service chains The Goal Boosting per-flow performance of service chains VNF NFV-node VM or Container Forwarding performance of NFV-nodes is poor. Virtual Network Function VNF1 VNF2 VNF4 VNF3 VNF1 PA-Flow VNF3 PA-Flow VNF3 VNF4 PA-Flow VNF2 PA-Flow overloaded
Contents Background Related Work Proposal Evaluation Conclusion 1 Background Problems of Service Chaining Related Work Scaling-out Scaling-up Proposal Evaluation Conclusion 2 3 4 5
Current Status of NFV Softwarized devices degrade the performance NFV services have been launched Service chaining is a key concept of NFV Toy-blocking style service chain vFirewall vDPI vGateway Softwarization of devices Softwarized devices degrade the performance Network-Level problems Node-Level problems
Network-Level Problems Server 1st VNF bottleneck 2nd VNF Server 3rd VNF Server Server Server Another network Server Server Server upper-stream The performance of the upper VNFs must be ensured
Node-Level Problems Performance degradation factors VM VNF Bottleneck! vNetwork I/O Host Bottleneck! Dropped! vSwitch too many packets NFV-node Performance degradation factors Flow matching by virtual switches Packet copy and queueing at vNetwork I/O
Performance of NFV-nodes Huge gap ! Small gap “Enlarging packet size” is the key! † R. Kawashima et al., “Evaluation of Forwarding Efficiency in NFV-nodes toward Predictable Service Chain Performance,” IEEE Trans. on Network and Service Management, 2017.
Contents Background Related Work Proposal Evaluation Conclusion 1 Background Problems of Service Chaining Related Work Scaling-out Scaling-up Proposal Evaluation Conclusion 2 3 4 5
Related Work Scaling-Out Scaling-Up 1. Auto Scaling 2. NIC Offloading vSwitch SmartNIC (FPGA inside) On-demand instantiation VNF VNF VNF 3. Packet Aggregation Small packets OpenFlow SW A single large packet
1. Auto-Scaling Physical resources are limited new instance VNF VNF overloaded! VNF healthy healthy healthy instantiation SDN Controller flow rule Physical resources are limited There is no de-facto framework for monitoring † Shoumik Palkar et al., "E2: A Framework for NFV Applications" ACM Symposium on Operating Systems Principles 2015 (SOSP' 15)
Low Latency & High Throughput 2. NIC Offloading Smart NIC† Host Offloading Flow Table Low Latency & High Throughput Smart NIC with FPGA It only works with a specific vSwitch (e.g., Open vSwitch) Performance gain is limited † Agilio OVS Software, combind with Agilio SmartNICs https://www.netronome.com/products/agilio-software/agilio-ovs-software/
3. Aggregation Approach Aggregation between VMs and the Host† VM Host VNF Extra queueing latency is added VM Aggregation Packet Batch Packet Queue No. outgoing packets is not reduced Host Disaggregation DPDK is not supported † M. Bourguiba et al., “Improving Network I/O Virtualization for Cloud Computing“ IEEE Trans. on Parallel and Distributed Systems, 2014
Contents Background Related Work Proposal Evaluation Conclusion 1 Background Problems of Service Chaining Related Work Scaling-out Scaling-up Proposal Evaluation Conclusion 2 3 4 5
(Packet Aggregation Flow) Proposal PA-Flow (Packet Aggregation Flow) Key points 1. Fast and Network-aware Aggregation 2. Gradual Packet Aggregation 3. Next-hop-aware Aggregation
1. Fast and Network-aware Aggregation VNF No extra queueing latency! VM PA-Flow explained later PA-Flow header Outgoing packets are reduced Host PA-Flow header VXLAN DPDK is supported All the existing problems are solved!
2. Gradual Packet Aggregation Gradually aggregated Free space PA-Flow enabled VM Host PA-Flow Full MTU size PA-Flow enabled The average packet size is enlarged!
3. Next-hop Aware Aggregation SDN Controller Flow Table Identifier Next-Hop 3 100 5 200 Next-Hop (ID: 100) VM Host PA-Flow branch 3 Next-Hop (ID: 200) Add/Delete Entries 5 PA-Flow supports branches of service chain
Overview Architecture : disaggregation : aggregation 2. Gradual aggregation PA-Flow Firewall DPI Policer Firewall 3. Next-hop-aware Gateway DPI Firewall 1. Fast and Network-aware aggregation
PA-Flow enabled NFV-node VM VNF Packets sent by VNF packet queue Host PA-Flow header Flow Table Same Next-hop Disaggregation Header format version (8 bit) Packet length 1 (16 bit) type packet num packet length 2 … Aggregation PA-Flow VXLAN header MAC IP UDP PA-Flow header vSwitch vSwitch processing (VXLAN) can be used
There is no extra waiting time for aggregation Implementation (a) Default vhost-user (b) PA-Flow VNF VNF Enqueue at once packet queue packet queue polling … … header Copy at once packet structure packet structure There is no extra waiting time for aggregation
Contents Background Related Work Proposal Evaluation Conclusion 1 Background Problems of Service Chaining Related Work Scaling-out Scaling-up Proposal Evaluation Conclusion 2 3 4 5
Performance Evaluation Exp. 1: Baseline CPU i7-6900K (8 cores, 3.2 GHz) NIC 10 GbE (Intel X540) OS CentOS 7.3 Tester NFV-node DuT Exp. 2: Service Chain VXLAN used Tester … NFV-node Tester NFV-node More realistic NFV-system Tester 2-VNF chain
Exp.1: Baseline (a) Default (b) PA-Flow (c) PA-Flow with VNF support VM A VM B Traffic Generator (MoonGen) VNF (OVS-DPDK) vport 3 vport 4 vport 3 vport 4 Host A Host B OVS-DPDK OVS-DPDK port 1 port 2 port 1 port 2 DuT 10Gb Ethernet
Exp.1: Baseline (a) Default (b) PA-Flow (c) PA-Flow with VNF support VM A VM B Traffic Generator (MoonGen) VNF (OVS-DPDK) PA-Flow vport 3 vport 4 vport 3 vport 4 Host A Host B Disaggr. Aggr. Disaggr. Aggr. OVS-DPDK OVS-DPDK port 1 port 2 port 1 port 2 DuT 10Gb Ethernet
Exp.1: Baseline (a) Default (b) PA-Flow (c) PA-Flow with VNF support VM A VM B Traffic Generator (MoonGen) VNF (OVS-DPDK) PA-Flow vport 3 vport 4 vport 3 vport 4 This VNF handles PA-Flow packet PA-Flow header Host A Host B Disaggr. Aggr. OVS-DPDK OVS-DPDK port 1 port 2 port 1 port 2 DuT 10Gb Ethernet
The performance is improved to nearly 10 Gbps Exp.1: Throughput 175% 75% The performance is improved to nearly 10 Gbps
Exp.1: Latency / Jitter 200 ns increased 2 µs reduced Tx rate: 1 Mpps (1% aggregated) Tx rate: 3 Mpps (25% aggregated) 200 ns increased 2 µs reduced The pure overhead is negligible The latency reduced under high-rate communications
Exp.2: Service Chain 2-VNF chain Tester1 Tester3 VNF1 VNF2 Tester2 Double Tx rate Tester1 host1 Traffic Generator ( Tx only ) Port Tester3 host3 Port Traffic Generator ( Tx only ) VXLAN used host4 VNF1 Port OVS-DPDK VNF2 host5 Port OVS-DPDK Tester2 host2 Traffic Generator ( Tx and Rx) Port 2-VNF chain
The performance of service chain is drastically boosted Exp.2: Throughput 370% Overloaded at 3.2 Mpps 170% Ideal throughput! The performance of service chain is drastically boosted
Contents Background Related Work Proposal Evaluation Conclusion 1 Background Problems of Service Chaining Related Work Scaling-out Scaling-up Proposal Evaluation Conclusion 2 3 4 5
Conclusion Performance limitation of Service Chain Software-based scale-up approach is needed. Proposal: PA-Flow (Packet Aggregation Flow) Gradual aggregation reduces forwarding cost of upper VNFs. Flexible operation is realized with next-hop aware aggregation. Performance improved by 170% on a 2-VNF service chain. Future work Further performance improvement Porting PA-Flow to a VM-side component Evaluation of container