
1 PISCES: A Programmable, Protocol-Independent Software Switch [SIGCOMM’16]
Muhammad Shahbaz Nick Feamster Jennifer Rexford Sean Choi Nick McKeown Ben Pfaff Changhoon Kim Hi, my name is Muhammad Shahbaz. Today, I am going to talk about our work on PISCES: A Programmable, Protocol-Independent Software Switch. It's joint work with folks at Princeton, Stanford, VMware, and Barefoot.

2 Importance of Software Switches
[Diagram: racks of VMs, each hypervisor running an OVS instance, connected to ToR switches and the network core]
When we think of Ethernet switches and routers, we tend to think of physical boxes residing either inside the core of a network or at the top of a rack of servers. [click] However, in today's virtualized data centers like Microsoft Azure, Amazon EC2, and Rackspace, each hypervisor [click] contains a software switch (also called a virtual switch), e.g., Open vSwitch (OVS), for routing traffic to and from virtual machines and the outside world.

3 Importance of Software Switches
[Diagram: same topology, highlighting the OVS instances inside each hypervisor]
Because these switches are written in software, they can be upgraded more easily than hardware switches.

4 Importance of Software Switches
Adding new protocol headers: STT, Geneve, NVGRE, NSH, and more
Removing existing protocol headers
Adding better visibility: In-band Network Telemetry (INT)
Adding entirely new features: CONGA, HULA, RCP, VL2, AIP, and more
For example, [click] OVS can be modified to add support for new protocol headers like STT, Geneve, and more; [click] remove existing protocol headers; [click] add better visibility, like INT; [click] or add new features like CONGA, RCP, and more.

5 Rapid Development and Deployment?
Enable Rapid Development and Deployment of Network Features! OVS ? That's why it's a common assumption that software switches enable rapid development and deployment of network features. [click] But is this really the case? [click] No, it isn't, and let me tell you why.

6 Internals of a Software Switch
OVS Kernel Fast Packet IO (or Forwarding) DPDK A software switch is built on top of [click] a large body of complex code, like the [click] kernel, [click] DPDK, and more, that is needed to set up the machinery for fast packet I/O (or forwarding).

7 Internals of a Software Switch
Requires domain expertise in:
Network protocol design
Software development
Build, test, deploy large codebases
Very slow release cycle (adding a new feature can take years)
Maintaining changes across releases
OVS Parser Packet Processing Logic Match-Action Pipeline Arcane APIs Kernel DPDK
And at the same time, one has to specify the logic for packet processing, [click] like the parser and match-action pipeline, using [click] arcane methods and interfaces exposed by these complex codebases. [click] Thus, making changes in these switches is a formidable undertaking that requires [click] expertise in network protocol design and software development, with the ability to develop, test, and deploy large, complex codebases. [click] It can take up to 3-6 months to push a new feature into the mainstream code. (A usual kernel release cycle is 2 months, and then it can take years to push the changes into Linux distributions like Ubuntu and Fedora.) Furthermore, customizations require not only incorporating changes into switch code, but also maintaining these customizations across different versions of the switch.

8 Needs of a Protocol Designer
OVS Parser Match-Action Pipeline Kernel DPDK However, there is a conflict here. As protocol designers, we are interested in specifying how to parse packet headers and the structure of the match-action tables (i.e., which header fields to match and which actions to perform on matching headers). [click] We do not need to understand the complexities of the underlying codebase and the arcane APIs that it exposes to enable fast packet I/O.

9 Needs of a Protocol Designer
OVS Parser Match-Action Pipeline Kernel DPDK So what should we do about this? How should we address this conflict?

10 Road to Protocol Independence
OVS Parser Match-Action Pipeline Kernel DPDK

11 Road to Protocol Independence
? Domain-Specific Language Parser Match-Action Pipeline Compile OVS Parser Match-Action Pipeline Kernel DPDK How about we separate the packet processing logic from the switch and [click] specify it using a high-level domain-specific language (DSL)? [click] We can then compile it down to the underlying switch, letting the compilation process take care of the arcane APIs provided by these various large and complex codebases. [click] A natural question that now arises is: which domain-specific language should we use for specifying the packet processing logic?

12 Road to Protocol Independence
P4 is an open-source language.[1] It describes different aspects of a packet processor:
Packet headers and fields
Metadata
Parser
Actions
Match-Action Tables (MATs)
Control Flow
Parser Match-Action Pipeline Compile OVS Parser Match-Action Pipeline Kernel DPDK For this work, we chose P4: a high-level language for programming protocol-independent packet processors. [click] It's an open-source language [click] that lets a programmer describe different aspects of a packet processor, e.g., packet headers and fields, parser, actions, match-action tables, and control flow, as the minimal sketch below illustrates. [1]
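To make these building blocks concrete, here is a minimal sketch in P4-14 syntax (the dialect used in this work) of a header definition and a parser; the names are illustrative, not taken from the talk's actual programs.

    /* Header type: declares the fields and their bit widths. */
    header_type ethernet_t {
        fields {
            dstAddr   : 48;
            srcAddr   : 48;
            etherType : 16;
        }
    }

    /* Header instance the parser extracts into. */
    header ethernet_t ethernet;

    /* Parser: every P4-14 program begins parsing at start. */
    parser start {
        extract(ethernet);
        return ingress;
    }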

13 Road to Protocol Independence
341 lines of code Parser Match-Action Pipeline Native OVS Compile OVS Parser Match-Action Pipeline 14,535 lines of code Kernel DPDK Now, using P4, [click] a protocol designer can specify the packet processing logic of native OVS in roughly [click] 341 lines of code, whereas [click] it takes roughly forty times more code, around fourteen thousand lines, to write the same packet processing logic using the APIs exposed by the underlying codebases. So, for a protocol designer, separating the packet processing logic from the switch clearly has its benefits. We will quantify these benefits in more detail later in this presentation. [1]

14 Road to Protocol Independence
Parser Match-Action Pipeline Compile OVS Parser Match-Action Pipeline Kernel DPDK [1]

15 Road to Protocol Independence
Parser Match-Action Pipeline Compile Performance Overhead! OVS Parser Match-Action Pipeline Kernel DPDK Also, because we are compiling a P4 program to the low-level APIs provided by the kernel or DPDK in OVS, the compilation can incur some performance overhead. We measure this overhead by comparing the performance of native OVS (a hand-written software switch) to a modified version of OVS that is programmed via P4. [1]

16 Primary Goals
Quantifying the Reduction in Complexity of expressing custom protocols in P4 vs. direct modifications to OVS source code.
Performance Optimizations to reduce the overhead of compiling a P4 program to OVS.
In this work, [click] our first goal is to quantify the reduction in complexity, i.e., the ease with which a programmer can express new protocols in P4 vs. directly modifying the OVS source code. [click] Our second goal is to implement performance optimizations to reduce the additional performance overhead of compiling a P4 program to OVS.

17 PISCES: A Protocol-Independent Switch
vSwitch OVS To study these goals, we implement PISCES, a software switch that allows custom protocol specification using P4, without requiring direct modifications to OVS source code.

18 PISCES: A Protocol-Independent Switch
Compiler parse match action OVS Executable
In PISCES, a P4-to-OVS compiler compiles the P4 code to OVS. [click] It generates the parse, match, and action code needed [click] to build the corresponding switch executable.

19 Primary Goals
Quantifying the Reduction in Complexity of expressing custom protocols in P4 vs. direct modifications to OVS source code.
Performance Optimizations to reduce the overhead of compiling a P4 program to OVS.
Let us now look at our first goal, i.e., quantifying the complexity of developing and maintaining custom protocols in P4 vs. directly modifying OVS source code.

20 Goal: Quantifying the Reduction in Complexity
We evaluate two categories of complexity:
Development Complexity of developing baseline packet processing logic for a software switch.
Change Complexity of making changes to and maintaining an existing software switch.
[click] We evaluate two categories of complexity. The first one is development complexity, i.e., the amount of effort needed to build the baseline packet processing logic for a software switch (in other words, to build a switch from scratch). [click] The second one is change complexity, i.e., the effort needed to make and maintain a change in an existing switch.

21 Goal: Quantifying the Reduction in Complexity
OVS Parser Match-Action Pipeline Kernel DPDK As I have mentioned earlier, [click] building a switch requires specifying the packet processing logic along with the underlying machinery.

22 Goal: Quantifying the Reduction in Complexity
L2fwd[1] application specifies:
Header files and libraries
Initialize env. and memory
Cores
Queues
Ports
Launch packet processing loop
Packet processing/fwding logic
DPDK For example, in DPDK, this means specifying the [click] header files and libraries containing DPDK-specific methods and interfaces … [click] code for initializing the DPDK environment and allocating memory … [click] code for initializing CPU cores … [click] code for initializing queues and binding them to ports and cores … [click] code for initializing ports. [click] After initialization, the next step is to write code for launching the packet processing loop. [click] Finally, after all this is done, the programmer specifies the packet processing logic. [1]

23 Goal: Quantifying the Reduction in Complexity
L2fwd[1] application specifies:
Header files and libraries
Initialize env. and memory
Cores
Queues
Ports
Launch packet processing loop
Packet processing/fwding logic
DPDK However, as protocol designers, we are only interested in specifying this last part: the packet processing logic. [1]

24 Goal: Quantifying the Reduction in Complexity
OVS Parser Match-Action Pipeline Kernel DPDK Also …

25 Goal: Quantifying the Reduction in Complexity
Also need to specify:
Command-Line Utilities (adding/removing flows; querying flows and stats)
Control-Plane Interfaces, e.g., OpenFlow
OVS Slow Path Match-Action Pipeline Parser Match-Action Cache Fast Path Kernel DPDK In OVS, the match-action pipeline is specified in user space (i.e., the slow path), and kernel space (i.e., the fast path) implements a cache for fast packet processing. A programmer will have to write code for both the match-action pipeline and the cache in OVS. Along with that, we will have to write code for the different command-line utilities for adding/removing flows and querying flows and stats. We will also have to write code for the interface between the controller and the switch, e.g., OpenFlow.

26 Goal: Quantifying the Reduction in Complexity
L2-Switch[1] application specifies:
Packet and metadata headers
Parser
Match-Action Tables
Control Flow
P4 PISCES OVS Instead of this bottom-up approach, with P4 we follow a top-down approach: we only write the code needed to specify how a packet should be processed, e.g., [click] packet and metadata headers, [click] the parser, [click] match-action tables, and [click] control flow, as the sketch below illustrates. [click] Finally, PISCES compiles the P4 program to OVS. [1]
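As a sketch of the remaining two pieces (match-action table and control flow), an L2 switch's forwarding table might look like this in P4-14; the table and action names are assumptions for illustration, not the actual L2-Switch source.

    /* Send the packet out of a given port. */
    action forward(port) {
        modify_field(standard_metadata.egress_spec, port);
    }

    action _drop() {
        drop();
    }

    /* Match on the destination MAC address. */
    table dmac {
        reads {
            ethernet.dstAddr : exact;
        }
        actions {
            forward;
            _drop;
        }
    }

    /* Control flow: apply the table once on ingress. */
    control ingress {
        apply(dmac);
    }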

27 Goal: Quantifying the Reduction in Complexity
Development Complexity

Switch Type | LoC    | Methods | Avg. Method Size
OVS         | 14,535 | 106     | 137.13
PISCES      | 341    | 40      | 8.53

First, we measure the development complexity. We compare native OVS with the equivalent baseline functionality implemented in PISCES, using three metrics: lines of code, method count, and average method size. Note that these measurements only cover the code responsible for parse, match, and action.

28 Goal: Quantifying the Reduction in Complexity
Development Complexity

Switch Type | LoC    | Methods | Avg. Method Size
OVS         | 14,535 | 106     | 137.13
PISCES      | 341    | 40      | 8.53   (40x fewer LoC)

We see that PISCES reduces the lines of code by about a factor of 40 and the average method size by about a factor of 20.

29 Goal: Quantifying the Reduction in Complexity
Development Complexity

Switch Type | LoC    | Methods | Avg. Method Size
OVS         | 14,535 | 106     | 137.13
PISCES      | 341    | 40      | 8.53   (40x fewer LoC, 20x smaller methods)

… and the average method size by about a factor of 20.

30 Goal: Quantifying the Reduction in Complexity
Change Complexity

Change           | Switch | Files Changed | Lines Changed
Connection Label | OVS    | 28            | 411
                 | PISCES | 1             | 5
Tunnel OAM Flag  | OVS    | 18            | 170
                 | PISCES | 1             | 6
TCP Flags        | OVS    | 20            | 370
                 | PISCES | 1             | 4

Second, we evaluate the change complexity. We compare the effort required to add support for a new header field in a protocol that is otherwise already supported in OVS and in PISCES. We add support for three fields and measure how many lines and files need to be changed to add them.

31 Goal: Quantifying the Reduction in Complexity
Change Complexity

Change           | Switch | Files Changed | Lines Changed
Connection Label | OVS    | 28            | 411
                 | PISCES | 1             | 5
Tunnel OAM Flag  | OVS    | 18            | 170
                 | PISCES | 1             | 6
TCP Flags        | OVS    | 20            | 370
                 | PISCES | 1             | 4

We see that modifying just a few lines of code in a single P4 file in PISCES is sufficient to support a new field, whereas in OVS, the corresponding change often requires hundreds of lines of changes across tens of files.

32 Goal: Quantifying the Reduction in Complexity
So, does PISCES reduce complexity? Yes!
The resulting code simplicity makes it easier to implement, deploy, and maintain custom software switches.
So, does PISCES reduce complexity? [click] Yes, it does. [click] The resulting code simplicity should make it easier for protocol designers to implement, deploy, and maintain custom software switches.

33 Primary Goals
Quantifying the Reduction in Complexity of expressing custom protocols in P4 vs. direct modifications to OVS source code.
Performance Optimizations to reduce the overhead of compiling a P4 program to OVS.
Our second goal is performance optimizations.

34 Goal: Performance Optimizations
P4 and OVS packet forwarding model
Performance overhead of a naïve mapping from P4 to OVS
PISCES compiler optimizations to reduce the performance overhead
I will first talk about [click] the forwarding model that both P4 and OVS support, [click] then I will show how a naïve mapping from P4 to OVS can lead to performance overhead. [click] And finally, I will discuss the different optimizations that we have implemented to reduce this performance overhead.

35 P4 Forwarding Model (Post-Pipeline Editing)
[Diagram: Ingress → Packet Parser → Checksum Verify → Match-Action Tables → Checksum Update → Packet Deparser → Egress, all operating on extracted header fields]
In the P4 packet forwarding model, [click] a packet parser [click] identifies the headers and [click] extracts them as packet header fields (essentially a copy of the contents of the packet). [click] The checksum verify block then [click] verifies the checksum based on the header fields specified in the P4 program. [click] The match-action tables [click] operate on these header fields. [click] A checksum update block [click] updates the checksum. [click] Finally, a packet deparser [click] writes the changes from these header fields back onto the packet. [click] We call this mode of operating on header fields "post-pipeline editing."

36 OVS Forwarding Model
[Diagram: Ingress → Packet Parser → Match-Action Cache; on a miss, the packet goes to the Match-Action Tables, which install a flow rule in the cache → Egress]
Whereas, in the OVS packet forwarding model, [click] a packet parser only [click] identifies the headers. The [click] packet is then looked up in a match-action cache. If there is a [click] miss, the packet is sent to the match-action tables (which form the actual switch pipeline). [click] A new flow rule is computed and installed in the match-action cache. [click] And the original packet, as processed by the match-action pipeline, is sent to the egress.

37 OVS Forwarding Model
[Diagram: Ingress → Packet Parser → Match-Action Cache (hit) → Egress]
Next time, when another packet belonging to the same flow enters the switch, [click] the parser identifies the headers as before. [click] This time the cache lookup results in a hit, and [click] the packet is processed and sent to the egress without traversing the match-action pipeline.

38 OVS Forwarding Model (Inline Editing)
[Diagram: Ingress → Packet Parser → Match-Action Cache / Match-Action Tables → Egress]
In OVS, tables directly operate on the headers inside the packet (i.e., no copy is maintained). We call this mode of operating on packet header fields "inline editing."

39 PISCES Forwarding Model (Modified OVS)
Supports both editing modes: Inline Editing and Post-Pipeline Editing
[Diagram: Ingress → Packet Parser → Checksum Verify → Match-Action Cache / Match-Action Tables → Checksum Update → Packet Deparser → Egress]
So, in order to map P4 to OVS, we modified OVS to support both of these editing modes. We call this modified OVS model the PISCES forwarding model.

40 PISCES: Compiling P4 to OVS
Packet Parser Match-Action Tables Deparser Ingress Egress Checksum Update Verify P4 Packet Parser Ingress Match-Action Cache Tables Deparser Egress Checksum Verify Checksum Update modified OVS So, now the problem is how to efficiently compile the P4 forwarding model [click] to this modified OVS forwarding model.

41 Naïve Compilation from P4 to OVS
A naïve compilation of the L2L3-ACL benchmark application
Performance overhead of ~40%
We see that a naïve compilation of our benchmark application (essentially a router with an access-control list) shows [click] that PISCES has a performance overhead of about 40% compared to native OVS.

42 Causes of Performance Overhead
Packet Parser Ingress Match-Action Cache Tables Deparser Egress Checksum Verify Checksum Update Cache Misses CPU Cycles per Packet We observe that there are two main aspects that significantly affect the performance of PISCES. [click] The first one is the number of CPU cycles consumed in processing a single packet. And the second one is the number of cache misses.

43 Cause: CPU Cycles per Packet
A naïve compilation of the L2L3-ACL benchmark application

Switch | Optimization          | Parser (cycles) | Match (cycles) | Actions (cycles) | Throughput (Mbps)
PISCES | Without optimizations | 76.5            | 209.5          | 379.5            | 7590.7
OVS    | Native                | 43.6            | 197.5          | 132.5            |

(Match and Actions cycles are measured in the match-action cache.)
In terms of the number of CPU cycles, for L2L3-ACL …

44 Cause: CPU Cycles per Packet
A naïve compilation of the L2L3-ACL benchmark application

Switch | Optimization          | Parser (cycles) | Match (cycles) | Actions (cycles) | Throughput (Mbps)
PISCES | Without optimizations | 76.5            | 209.5          | 379.5            | 7590.7
OVS    | Native                | 43.6            | 197.5          | 132.5            |

When compiled without optimizations, PISCES consumes about two times more cycles than native OVS in parsing and actions, and its throughput is about half that of native OVS.

45 Factors affecting CPU Cycles per Packet
Extra copy of headers Fully-specified Checksum Parsing unused header fields and more … We studied different factors that affected the CPU cycles per-packet [click] like …. I will now talk about each of them in detail and how we optimized them in PISCES.

46 Factor: Extra Copy of Headers
Editing Mode  | Pros                     | Cons
Post-Pipeline |                          | Extra copy of headers
Inline        | No extra copy of headers |

Post-pipeline editing consumes 2x more cycles than inline editing when parsing the VXLAN protocol.
P4, by default, assumes post-pipeline editing and thus requires keeping an extra copy of the headers, as shown earlier. [click] The post-pipeline editing mode requires maintaining an extra copy of headers, whereas the inline editing mode doesn't require any extra copy. We found that maintaining these extra copies has a significant overhead. For example, while using a tunneling protocol (like VXLAN), the post-pipeline editing mode consumed two times more cycles than the inline editing mode.

47 Factor: Extra Copy of Headers
Editing Mode  | Pros                     | Cons
Post-Pipeline | Packet is adjusted once  | Extra copy of headers
Inline        | No extra copy of headers | Multiple adjustments to packet

[Graph: cycles per packet for post-pipeline vs. inline editing as the number of packet adjustments grows]
But, on the other hand, in the case of post-pipeline editing, the packet is adjusted only once, i.e., changes are applied only once at the end of the match-action pipeline. Whereas, in the case of inline editing, the packet is adjusted (and changes are applied) on every action. [click] Adjusting packets takes a very high toll on performance, as can be seen in the graph here. Based on our micro-benchmarks, we observe that if the number of adjustments is less than six, we should use inline editing; otherwise, [click] we should use the post-pipeline editing mode.

48 Optimization: Inline vs. Post-Pipeline Editing
If the number of adjustments is less than the threshold (i.e., six), use inline editing; otherwise, use post-pipeline editing.
Thus, we came up with the following optimization, which we call "inline vs. post-pipeline editing." [click] At compile time, we analyze the P4 program to find out whether the number of packet adjustments (e.g., add-header and remove-header actions) is less than the specified threshold. If it is, we use inline editing; otherwise, we use post-pipeline editing.

49 Optimization: Inline vs. Post-Pipeline Editing
Our L2L3-ACL application adjusts packets at most twice, i.e., for VLAN encapsulation and decapsulation.
Thus, the compiler selects the inline editing mode.
[Diagram: Ingress → Packet Parser → Checksum Verify → Match-Action Pipeline → Checksum Update → Egress; no extra copy of headers, no packet deparsing]
In our L2L3-ACL application, the number of adjustments is less than the threshold: in the worst case it only performs a VLAN encapsulation and a decapsulation (see the sketch below). Thus, the compiler selects inline editing. [click] Based on that, the compiler generates code in which the parser doesn't make a copy of the headers and, thus, [click] no packet deparsing stage is needed.
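For illustration, the two adjustments could come from P4-14 actions like the following sketch (header and action names are assumed); each add_header/remove_header call adjusts the packet once, so the compiler counts at most two adjustments on any path and picks inline editing.

    header_type vlan_t {
        fields {
            pcp       : 3;
            cfi       : 1;
            vid       : 12;
            etherType : 16;
        }
    }
    header vlan_t vlan;

    /* VLAN encapsulation: one packet adjustment (add_header). */
    action push_vlan(vid) {
        add_header(vlan);
        modify_field(vlan.etherType, ethernet.etherType);
        modify_field(vlan.vid, vid);
        modify_field(ethernet.etherType, 0x8100);
    }

    /* VLAN decapsulation: one packet adjustment (remove_header). */
    action pop_vlan() {
        modify_field(ethernet.etherType, vlan.etherType);
        remove_header(vlan);
    }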

50 Optimization: Inline vs. Post-Pipeline Editing
After optimization, the L2L3-ACL switch pipeline looks like this:
[Diagram: Ingress → Packet Parser → Checksum Verify → Match-Action Pipeline → Checksum Update → Egress]
After the optimization, this is how our L2L3-ACL switch pipeline looks.

51 Optimization: Inline vs. Post-Pipeline Editing
Cycles per packet and throughput of the L2L3-ACL application

Switch | Optimization          | Parser (cycles) | Match (cycles) | Actions (cycles) | Throughput (Mbps)
PISCES | Without optimizations | 76.5            | 209.5          | 379.5            | 7590.7
PISCES | + Inline Editing      | -42.6           |                | +7.5             | +281.0

(Rows with +/- show the change contributed by each optimization.)

52 Factor: Fully-Specified Checksum
Checksum Verify (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
Checksum Update (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
[Diagram: Ingress → Packet Parser → Checksum Verify → Match-Action Pipeline → Checksum Update → Egress]
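In P4-14, such a fully-specified checksum is declared over an explicit field list; the snippet below is the standard idiom for the IPv4 header checksum (note that the checksum field itself is excluded from the input list). Verifying and updating it naively means summing all of these fields on every packet.

    field_list ipv4_checksum_list {
        ipv4.version;
        ipv4.ihl;
        ipv4.diffserv;
        ipv4.totalLen;
        ipv4.identification;
        ipv4.flags;
        ipv4.fragOffset;
        ipv4.ttl;
        ipv4.protocol;
        ipv4.srcAddr;
        ipv4.dstAddr;
    }

    field_list_calculation ipv4_checksum {
        input { ipv4_checksum_list; }
        algorithm : csum16;
        output_width : 16;
    }

    /* Verify on parse, update on deparse -- over all listed fields. */
    calculated_field ipv4.hdrChecksum {
        verify ipv4_checksum;
        update ipv4_checksum;
    }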

53 Optimization: Incremental Checksum
Provided annotations to let the programmer inform the compiler whether the switch:
Is a part of an "end-host stack"
Is running as a "transit switch"
Or neither of the two modes
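The talk does not show the exact annotation syntax, so the following is only a hypothetical sketch of how such a hint might be attached to the checksum declaration; the pragma name is a placeholder, not PISCES's actual interface.

    /* Hypothetical annotation (placeholder name). "transit" tells the
       compiler the switch never originates these headers, so the full
       update below may be rewritten as an incremental one. */
    @pragma checksum_role transit
    calculated_field ipv4.hdrChecksum {
        verify ipv4_checksum;   /* may be dropped in transit mode   */
        update ipv4_checksum;   /* may become an incremental update */
    }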

54 Optimization: Incremental Checksum
If part of an "end-host stack":
Checksum Verify (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
Checksum Update (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
[Diagram: Ingress → Packet Parser → Checksum Verify → Match-Action Pipeline → Checksum Update → Egress]

55 Optimization: Incremental Checksum
If part of an "end-host stack":
Checksum Verify (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
[Diagram: the Checksum Update block is removed from the pipeline]

56 Optimization: Incremental Checksum
If running as a "transit switch":
Checksum Verify (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
Checksum Update (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
[Diagram: Ingress → Packet Parser → Checksum Verify → Match-Action Pipeline → Checksum Update → Egress]

57 Optimization: Incremental Checksum
If running as a "transit switch":
Checksum Verify (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
Incremental Checksum Update (…)
[Diagram: Ingress → Packet Parser → Checksum Verify → Match-Action Pipeline → Checksum Update → Egress]

58 Optimization: Incremental Checksum
If running as a "transit switch":
Checksum Verify (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
Incremental Checksum Update (…)
dec(ttl)
[Diagram: the pipeline's only write to the IPv4 header is dec(ttl)]

59 Optimization: Incremental Checksum
If running as a "transit switch":
Checksum Verify (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
Incremental Checksum Update (ttl)
dec(ttl)

60 Optimization: Incremental Checksum
If running as a "transit switch":
Incremental Checksum Update (ttl)
dec(ttl)
[Diagram: the Checksum Verify block is removed; only the incremental update over ttl remains]
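If decrementing the TTL is the only write to the IPv4 header, the checksum can be patched incrementally in the RFC 1624 sense, HC' = ~(~HC + ~m + m'), folding in only the old and new ttl values instead of re-summing all the header fields. A sketch of the corresponding P4-14 action (names assumed):

    /* The only IPv4 field this pipeline modifies. The compiler can
       therefore patch ipv4.hdrChecksum incrementally from the old and
       new ttl, rather than recomputing csum16 over the whole header. */
    action decrement_ttl() {
        add_to_field(ipv4.ttl, -1);
    }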

61 Optimization: Incremental Checksum
Or neither of the two modes (i.e., "end-host stack" or "transit switch"):
Incremental Checksum Update (ttl)
dec(ttl)

62 Optimization: Incremental Checksum
Or neither of the two modes (i.e., "end-host stack" or "transit switch"):
Checksum Verify (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr)
Incremental Checksum Update (ttl)
dec(ttl)

63 Optimization: Incremental Checksum
Cycles per packet and throughput of the L2L3-ACL application

Switch | Optimization                   | Parser (cycles) | Match (cycles) | Actions (cycles) | Throughput (Mbps)
PISCES | Without optimizations          | 76.5            | 209.5          | 379.5            | 7590.7
PISCES | + Inline Editing               | -42.6           |                | +7.5             | +281.0
PISCES | + Inc. Checksum (Transit Mode) |                 |                | -231.3           |

(Rows with +/- show the change contributed by each optimization.)

64 Factor: Parsing Unused Header Fields
[Diagram: the packet parser extracts L2, L3, and L4 headers ahead of a match-action pipeline that does not use all of them]

65 Optimization: Parser Specialization
Cycles per packet and throughput of the L2L3-ACL application

Switch | Optimization                   | Parser (cycles) | Match (cycles) | Actions (cycles) | Throughput (Mbps)
PISCES | Without optimizations          | 76.5            | 209.5          | 379.5            | 7590.7
PISCES | + Inline Editing               | -42.6           |                | +7.5             | +281.0
PISCES | + Inc. Checksum (Transit Mode) |                 |                | -231.3           |
PISCES | + Parser Specialization        | -4.6            |                |                  | +282.3

(Rows with +/- show the change contributed by each optimization.)

66 Optimizations for Improving Actions
Cycles per packet and throughput of the L2L3-ACL application

Switch | Optimization                   | Parser (cycles) | Match (cycles) | Actions (cycles) | Throughput (Mbps)
PISCES | Without optimizations          | 76.5            | 209.5          | 379.5            | 7590.7
PISCES | + Inline Editing               | -42.6           |                | +7.5             | +281.0
PISCES | + Inc. Checksum (Transit Mode) |                 |                | -231.3           |
PISCES | + Parser Specialization        | -4.6            |                |                  | +282.3
PISCES | + Action Specialization        |                 |                | -10.3            | +191.2
PISCES | + Action Coalescing            |                 |                | -14.6            | +293.0

(Rows with +/- show the change contributed by each optimization.)

67 All Optimizations together for L2L3-ACL
Switch | Optimization                   | Parser (cycles) | Match (cycles) | Actions (cycles) | Throughput (Mbps)
PISCES | Without optimizations          | 76.5            | 209.5          | 379.5            | 7590.7
PISCES | + Inline Editing               | -42.6           |                | +7.5             | +281.0
PISCES | + Inc. Checksum (Transit Mode) |                 |                | -231.3           |
PISCES | + Parser Specialization        | -4.6            |                |                  | +282.3
PISCES | + Action Specialization        |                 |                | -10.3            | +191.2
PISCES | + Action Coalescing            |                 |                | -14.6            | +293.0
PISCES | All optimizations              | 29.7            | 209.0          | 147.6            |
OVS    | Native                         | 43.6            | 197.5          | 132.5            |

(Rows with +/- show the change contributed by each optimization; Match and Actions cycles are measured in the match-action cache.)

68 Optimized Compilation from P4 to OVS
An optimized compilation of the L2L3-ACL benchmark application
Performance overhead of < 2%

69 Cause: Cache Misses
[Diagram: Ingress → Packet Parser → Match-Action Cache → Tables → Deparser → Egress, with Checksum Verify/Update]
A cache miss costs 3500+ cycles (about 50x a cache hit), and throughput drops below 1 Mpps.

70 Factors affecting Cache Misses
Entropy of packet header fields
Stateful operations in the match-action cache (or fast path)

71 Factor: Entropy of Packet Header Fields
We loosely define "high entropy" header fields as those which are likely to have differing values from packet to packet flowing through a switch[1].
Layer-2 MAC addresses are similar across packets from the same host.
Layer-4 ports vary from connection to connection, e.g., HTTP (80), HTTPS (443), SSH (22), etc.
Layer-4 fields therefore have higher entropy than Layer-2 fields.
[click] We loosely define high-entropy packet fields as those which are likely to have differing values from packet to packet flowing through a switch. [click] For example, all traffic originating from a particular host [click] will likely have the same source and destination MAC fields, [click] but the source and destination L4 port fields are likely to change from connection to connection. [click] Thus, we say the L4 port fields have higher entropy than the L2 address fields. [1] N. Shelly et al. Flow caching for high entropy packet fields. In HotSDN '14.

72 Factor: Entropy of Packet Header Fields
The match-action cache is highly sensitive to the entropy of header fields.
Cache rules matching on high-entropy fields would result in more misses and, thus, poor performance.
Match-Action Cache Egress Ingress

73 Factor: Entropy of Packet Header Fields
Objective: “generate cache rules that match on low-entropy header fields whenever possible.” Match-Action Cache Egress Ingress

74 Factor: Entropy of Packet Header Fields
OVS Match-Action Tables Match-Action Cache Egress Ingress

75 Factor: Entropy of Packet Header Fields
OVS Match-Action Tables Match: L2, L3, L4, Metadata Actions Match-Action Cache Egress Ingress

76 Factor: Entropy of Packet Header Fields
OVS Match-Action Tables “Staged Lookups[1]” Actions Match: Metadata Match: Metadata, L2 Match: Metadata, L2,L3 Match: Metadata, L2,L3,L4 Entropy Match-Action Cache Egress Ingress [1] B. Pfaff et al. The design and implementation of Open vSwitch. In NSDI '15.

77 Factor: Entropy of Packet Header Fields
OVS Match-Action Tables “Staged Lookups[1]” Actions Match: Metadata Match: Metadata, L2 Match: Metadata, L2,L3 Match: Metadata, L2,L3,L4 Miss Match: Metadata,L2,L3 Match-Action Cache Egress Ingress [1] B. Pfaff et al. The design and implementation of Open vSwitch. In NSDI '15.

78 Factor: Entropy of Packet Header Fields
PISCES Match-Action Tables Match: H1, H2, H3, H4, … Actions No information about the “entropy” of headers (H1, H2, ...) Match-Action Cache Egress Ingress

79 Factor: Entropy of Packet Header Fields
PISCES Match-Action Tables Match: H1, H2, H3, H4, … Actions Match: H?, H?, H?, … Match-Action Cache Egress Ingress

80 Optimization: Stage Assignment
PISCES P4 Program Header H1 {...} Header H2 {...} Header H3 {...} Header H4 {...}

81 Optimization: Stage Assignment
PISCES P4 Program Entropy Header H1 {...} Header H2 {...} Header H3 {...} Header H4 {...}

82 Optimization: Stage Assignment
PISCES Entropy Header H1 {...} Header H3 {...} Header H2 {...} Header H4 {...}

83 Optimization: Stage Assignment
PISCES Match-Action Tables Actions Match: H1 Match: H1,H3 Match: H1,H3,H2 Match: H1,H3,H2, H4 Match-Action Cache Egress Ingress

84 Optimization: Stage Assignment
PISCES Match-Action Tables Actions Match: H1 Match: H1,H3 Match: H1,H3,H2 Match: H1,H3,H2, H4 Miss Match: H1,H3,H2 Match-Action Cache Egress Ingress

85 Goal: Performance Optimizations
With appropriate compiler optimizations, PISCES performs comparably to native software switches (e.g., OVS).


87 PISCES Summary
[Diagram: a P4 program compiled by PISCES into an OVS-based vSwitch]
Programs written in P4 for PISCES are about 40 times more concise than native switch code, with hardly any performance overhead!

88 Learn more and try PISCES here:
Questions? Learn more and try PISCES here: Muhammad Shahbaz

