AppSwitch Application-layer Load Balancing within a Software Switch

AppSwitch: Application-layer Load Balancing within a Software Switch. Eyal Cidon, Sean Choi, Sachin Katti and Nick McKeown
Today, I will be giving a talk on AppSwitch, application-layer load balancing within a software switch. Unfortunately, my co-worker Eyal could not make it out to Hong Kong, so I will be giving this talk on his behalf. This work is a small prototype we built in our lab at Stanford, advised by Professors Katti and McKeown. As the first speaker of the programmable data plane section, I think this work is a good introduction for anyone looking to start building simple applications using programmable data planes. So let's begin.

Fixed-Function Switch Chip: Fixed Set of Protocols (TCP, UDP, IPv4, IPv6, Ethernet, BGP, HTTP, TLS)
First of all, let me give a brief and basic overview of what a programmable data plane is. Most networks today, such as those found in datacenters, are built from fixed-function Ethernet switch chips. Commercial switches built using these chips support only a fixed set of protocols that are hard-wired into the chip when it is designed and manufactured. Therefore, we typically cannot add or remove protocols after the chip has been built. To address this shortcoming of fixed-function switches, a lot of work has gone into creating programmable switches.

Programmable Switch: Standard Protocols (Ethernet, TCP, HTTP, IPv4, BGP, IPv6, TLS) plus Custom Protocols (CUSTOM_P)
The basic idea of a programmable switch is that if you can program the switch yourself and tell it how it should process packets, then you can easily add new protocols or remove unnecessary ones. In recent years, we've seen efforts in hardware switches such as Intel's FlexPipe and Barefoot's Tofino.

Software Switch: PISCES[1], PVPP[2] (Ethernet, TCP, HTTP, IPv4, BGP, IPv6, plus Custom Protocols such as CUSTOM_P)
Here are some widely available P4-programmable software switches. The first is PISCES, which was presented at SIGCOMM last year. It adds P4 support to Open vSwitch and implements a subset of the functionality defined in the P4 language. The second is PVPP, which adds P4 support to Cisco's VPP. My colleague will give a talk on PVPP following this talk, so stay tuned. So what is P4? It looks something like...
[1] PISCES. ACM SIGCOMM 2016.
[2] PVPP. ACM SIGCOMM APNet 2017.

Main Programming Language: P4 (Header Definition, Parser Definition, Table Definition)
You can define a header, a parser, a match-action table and a control flow. Here you can see an example of an Ethernet header, an IP packet parser and a table that matches on the IP destination address to set the next-hop port. As you can see, the language is very flexible, and you can define any header, parser and match-action table within the constraints of the language.
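To make the header / parser / table split concrete, here is a minimal Python model of the same three pieces. It is only a sketch of the idea, not P4 and not the code from the slide: the names `parse`, `LpmTable` and the port numbers are hypothetical, and the packet is assumed to be untagged Ethernet carrying IPv4.

```python
import ipaddress
import struct

# "Header definition": Ethernet is dst MAC (6 bytes), src MAC (6 bytes), EtherType (2 bytes).
ETH_HEADER = struct.Struct("!6s6sH")
ETHERTYPE_IPV4 = 0x0800


def parse(packet: bytes) -> dict:
    """'Parser': extract Ethernet, then the IPv4 destination if the EtherType says IPv4."""
    _dst_mac, _src_mac, ethertype = ETH_HEADER.unpack_from(packet, 0)
    headers = {"ethertype": ethertype}
    if ethertype == ETHERTYPE_IPV4:
        ip_off = ETH_HEADER.size
        # The destination address occupies bytes 16..19 of the IPv4 header.
        headers["ipv4_dst"] = ipaddress.IPv4Address(packet[ip_off + 16:ip_off + 20])
    return headers


class LpmTable:
    """'Table': longest-prefix match on the IPv4 destination, action = set output port."""

    def __init__(self):
        self.entries = []  # list of (network, port)

    def add(self, prefix: str, port: int) -> None:
        self.entries.append((ipaddress.ip_network(prefix), port))

    def lookup(self, addr, default_port=None):
        matches = [(net.prefixlen, port) for net, port in self.entries if addr in net]
        return max(matches)[1] if matches else default_port


# Hypothetical usage: two overlapping routes, the more specific one should win.
table = LpmTable()
table.add("10.0.0.0/8", 1)
table.add("10.0.1.0/24", 2)
packet = ETH_HEADER.pack(bytes(6), bytes(6), ETHERTYPE_IPV4) + bytes(16) + bytes([10, 0, 1, 5])
out_port = table.lookup(parse(packet)["ipv4_dst"])
```

In a real P4 program the control flow would apply the table only when the IPv4 header is valid; here that check is the `if` inside `parse`.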

Could We Use Programmable Switches to Accelerate Applications? (Data Parallel Computation, Signal Processing, Graphics, ML, Networking, ???; DSP, GPU, TPU, Programmable Switches)
So, what can we do with this besides fancy routing? I think this question has already been asked and answered in the GPU community. Once people realized GPUs were great at parallel computation, they came up with CUDA. Now it is used all over the world for things like neural networks. Similarly, we know that switches are great at match-action. So we asked: what can we do with that strength?

AppSwitch: Using Programmable Switches to Accelerate Key-Value Load Balancing
So we decided to do a small experiment, using programmable switches to load balance key-value traffic. We believe this is a great use case, mainly because switches excel at redirecting packets based on their contents.

Key-Value Stores are Everywhere! Basic operations: Get <Key>, Set <Key, Value>, Delete <Key>. Most commonly used for memory caches.
Some might ask what key-value stores are, so let me give a brief overview. You can imagine one as a set of machines dedicated to functioning as a distributed hash table. It has three main operations, get, set and delete, and these can be extended for various fancier operations. It can store data in memory for fast access, or on disk with lower complexity than a relational database. Some examples of key-value stores are Memcached, Redis and Amazon ElastiCache, and they are heavily used across large software companies.

Proxy Server used as Load Balancer
    Key Hash    Destination
    A*****      192.168.1.1
After meeting with various companies like Facebook, we learned that their multi-node key-value cache setup looks something like this. There are four entities in this picture: a client that sends the key-value request, a cache that serves the request, a regular network switch, and finally a proxy that routes client requests to the appropriate cache. The process of sending a request is as follows. [talk other stuff...] Once the request reaches the proxy, it reroutes the packet to the correct cache. Notice that there are 4 hops in this picture and all traffic goes through the switch.

AppSwitch Benefit: Network Hop Reduction
    Key Hash    Destination
    A*****      192.168.1.1
AppSwitch's main goal is simply to reduce the number of network hops and thus increase performance. Instead of a proxy, we have an AppSwitch controller that installs the forwarding rules for a particular key-value store. Then, forwarding a request is equivalent to any other path rewrite. Notice that there are now only two hops to the cache.
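The rule shown on the slide (A***** to 192.168.1.1) can be read as a wildcard match on the key. The Python sketch below models such a table under that reading; it is an illustration, not the AppSwitch implementation. The class name `KeyForwardingTable` is hypothetical, and treating the trailing asterisks as "any remaining characters" is an assumption about the slide's notation.

```python
class KeyForwardingTable:
    """Toy model of a switch table that maps a key pattern to a cache server."""

    def __init__(self):
        self.rules = []  # list of (prefix, destination), longest prefix first

    def install(self, pattern: str, destination: str) -> None:
        # Assumption: trailing '*' characters are wildcards, so only the
        # leading literal part of the pattern has to match.
        prefix = pattern.rstrip("*")
        self.rules.append((prefix, destination))
        self.rules.sort(key=lambda rule: len(rule[0]), reverse=True)

    def match(self, key: str):
        """Return the destination for the most specific matching rule, else None."""
        for prefix, destination in self.rules:
            if key.startswith(prefix):
                return destination
        return None


# Usage with the rule from the slide.
table = KeyForwardingTable()
table.install("A*****", "192.168.1.1")
destination = table.match("Apple1")
```

With the rule in the switch, the destination rewrite happens on the path the packet already takes, which is where the reduction from four hops to two comes from.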

Memcached Headers in P4
P4 Code:
    header_type memcached_binary_t {
        fields {
            magic : 8;
            opcode : 8;
            keyLen : 16;
            extraLen : 8;
            dataType : 8;
            status : 16;
            totalBodyLen : 32;
            opaque : 32;
            CAS : 64;
        }
    }
So, how can the switch look into the packet? Memcached sends packets that embed the header information shown here, encapsulated within a TCP or UDP packet. We wrote a P4 program that replicates this header format exactly.
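The field widths in that header add up to 192 bits, that is 24 bytes. As a cross-check, the same layout can be unpacked in Python with the standard `struct` module. This is a sketch for illustration only: `parse_memcached` and the tuple field names are hypothetical, and it assumes the payload starts directly at the memcached binary header, with any extras preceding the key.

```python
import struct
from collections import namedtuple

# Same field order and widths as the P4 header: 8, 8, 16, 8, 8, 16, 32, 32, 64 bits.
MEMCACHED_HEADER = struct.Struct("!BBHBBHIIQ")

MemcachedHeader = namedtuple(
    "MemcachedHeader",
    "magic opcode key_len extra_len data_type status total_body_len opaque cas",
)


def parse_memcached(payload: bytes):
    """Return (header, key) from a payload that begins with the binary header."""
    header = MemcachedHeader._make(MEMCACHED_HEADER.unpack_from(payload, 0))
    # The body is laid out as extras, then key, then value.
    key_start = MEMCACHED_HEADER.size + header.extra_len
    key = payload[key_start:key_start + header.key_len]
    return header, key


# Usage: a request with magic 0x80, opcode 0x00 (Get), no extras, key b"hello".
request = MEMCACHED_HEADER.pack(0x80, 0x00, 5, 0, 0, 0, 5, 0, 0) + b"hello"
header, key = parse_memcached(request)
```

The key is the part a switch would match on; the header tells it where the key begins and how long it is.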

Memcached Load Balancer Merged with an SDN Controller. The AppSwitch controller still collects data from the servers. Other operations such as fault detection remain unchanged. State management for the load balancing becomes equivalent to SDN switch state management. (Diagram: AppSwitch's Match-Action Chain)
Given that the switch can understand the header, the control flow of the packet is as follows. Note that the AppSwitch controller still needs to collect new writes from the servers in order to install the forwarding rules in the switch. Given that this is a key-value lookup, any missing key can be treated as a cache miss without impacting the application. Finally, managing state across all of the switches and controllers is equivalent to SDN switch state management.
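The division of labour described here, where the controller learns about writes and installs rules while the switch only looks keys up, can be sketched in a few lines of Python. Everything in this block is hypothetical (the `Switch` and `Controller` classes, the `on_server_write` callback, the exact-match rule format); it is not the Ryu-based controller from the talk, only a model of the behaviour described.

```python
class Switch:
    """Data plane: an exact-match table from key to cache server address."""

    def __init__(self):
        self.rules = {}

    def forward(self, key: bytes):
        # No rule installed means the request is simply treated as a cache miss.
        return self.rules.get(key)


class Controller:
    """Control plane: learns about writes from the servers and installs rules."""

    def __init__(self, switch: Switch):
        self.switch = switch

    def on_server_write(self, key: bytes, server_ip: str) -> None:
        # Hypothetical callback: a server reports that it now holds this key.
        self.switch.rules[key] = server_ip


# Usage: the key is unknown until the controller hears about the write.
switch = Switch()
controller = Controller(switch)
before = switch.forward(b"user:42")
controller.on_server_write(b"user:42", "192.168.1.1")
after = switch.forward(b"user:42")
```

Because a lookup without a rule degrades to an ordinary cache miss, a rule that has not been installed yet costs performance but not correctness.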

AppSwitch Experimental Platform
    Load Generator: Mutilate
    Switch: PISCES (P4-Enabled OVS)
    Controller: Ryu-based custom controller
    Application: Memcached
    Proxy Server: McRouter
To show the benefits of AppSwitch, we built the following experimental platform. ... McRouter is a proxy server that is currently in production use at Facebook.

We Run Two Experiments: Experiment 1, Experiment 2
So we ran the following two experiments. The first is a traditional setup with a simple switch, with McRouter as the proxy. The second is the AppSwitch-enabled setup with P4-enabled OVS as the data plane.

Results show Promising Performance Gains
    Experiment            Avg. QPS   Avg. latency (ms)   95th latency (ms)   99th latency (ms)
    Baseline (McRouter)   631.0      1696.7              2025.4              4122.5
    AppSwitch             1156.8     967.8               1368.2              1547.7
Experiments done with commonly used key-value sizes. The results motivate the potential benefits of AppSwitch.
The numbers are quite promising. We can see that average QPS almost doubles, while latency drops across the board: the average falls by roughly 43% and the 99th percentile by more than 60%. We expect that if the proxy server lies far from the servers, the gains will be even larger. The experiment was done with a commonly used key-value size, which is 80 bytes.
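For reference, the relative gains can be worked out directly from the table. The short Python snippet below only does that arithmetic on the four reported numbers per setup; the dictionary layout and the helper name `ratio` are just for illustration.

```python
# Numbers copied from the results table.
results = {
    "Baseline (McRouter)": {"qps": 631.0, "avg_ms": 1696.7, "p95_ms": 2025.4, "p99_ms": 4122.5},
    "AppSwitch": {"qps": 1156.8, "avg_ms": 967.8, "p95_ms": 1368.2, "p99_ms": 1547.7},
}


def ratio(metric: str) -> float:
    """AppSwitch value divided by the baseline value for one metric."""
    return results["AppSwitch"][metric] / results["Baseline (McRouter)"][metric]


for metric in ("qps", "avg_ms", "p95_ms", "p99_ms"):
    print(f"{metric}: {ratio(metric):.2f}x of baseline")
```

This gives about 1.83x the throughput, with average, 95th and 99th percentile latency at roughly 0.57x, 0.68x and 0.38x of the baseline, so the biggest improvement is in the tail.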

Generalizing AppSwitch to Other Applications
    header_type key_location {
        fields {
            preamble : 8;
            key_start_ptr : 32;
            key_end_ptr : 32;
        }
    }
    header_type general_key {
        fields {
            byte_1 : 8;
            byte_2 : 8;
            …
            byte_N : 8;
        }
    }
Now, we can see that this approach can be extended to other applications. Notice that we are simply looking at a portion of the packet to use as a match value and rerouting the packet based on that data. I believe this opens up a big space for anyone to look into, and I encourage everyone to find unique applications to start accelerating!
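One way to read the key_location header is as a small preamble that tells the switch where the match key sits in the packet. The Python sketch below follows that reading. The slides do not say how the pointers are interpreted, so treating them as byte offsets from the start of the header, with an exclusive end, is purely an assumption, and `extract_key` is a hypothetical helper.

```python
import struct

# preamble (8 bits), key_start_ptr (32 bits), key_end_ptr (32 bits): 9 bytes in total.
KEY_LOCATION = struct.Struct("!BII")


def extract_key(packet: bytes) -> bytes:
    """Return the bytes the switch would match on, as located by the header.

    Assumption: both pointers are byte offsets from the start of `packet`,
    and key_end_ptr is exclusive.
    """
    _preamble, start, end = KEY_LOCATION.unpack_from(packet, 0)
    if not (KEY_LOCATION.size <= start <= end <= len(packet)):
        raise ValueError("key pointers fall outside the packet")
    return packet[start:end]


# Usage: the key "user:42" sits 4 bytes into the application payload.
payload = b"GET user:42 extra"
packet = KEY_LOCATION.pack(0xAB, KEY_LOCATION.size + 4, KEY_LOCATION.size + 11) + payload
key = extract_key(packet)
```

Once the key bytes are extracted, they can be matched against a forwarding table in the same way as the memcached key.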

Summary. AppSwitch accelerates key-value load balancing by utilizing programmable switches. AppSwitch demonstrates the performance gains achievable by utilizing programmable switches for application-layer support. AppSwitch's approach can be generalized to other application domains.
So, in summary...

Questions?