High-Fidelity Switch Models for SDN Emulation


Presentation on theme: "High-Fidelity Switch Models for SDN Emulation"— Presentation transcript:

1 High-Fidelity Switch Models for SDN Emulation
High-Fidelity Switch Models for SDN Emulation. Danny Y. Huang, Kenneth Yocum, Alex C. Snoeren. University of California, San Diego

2 Buying OpenFlow Switches
Suppose you manage the network of a small company. On the network runs an application that uses a key-value store like Redis. Your boss wants you to build an SDN cluster for the app. # Three companies approach you with free sample switches. # Which one should you buy? Naturally, you want to pick the one that works best for your application. Clients HP Procurve Fulcrum Monaco Quanta LB4G Which one to buy?

3 Buying OpenFlow Switches
You run your app on one of the sample switches. Details of the workload come later, but this graph shows how fast your Redis clients complete their queries on the HP switch. # 30% of the queries never completed. Worried, you try another switch. 30% of queries timed out. Clients

4 Buying OpenFlow Switches
Swap in the Monaco switch. Faster clients, fewer losses. The Monaco is doing pretty well. Maybe the next switch will do better? Clients

5 Buying OpenFlow Switches
Swap in the Quanta switch: drastically different performance. # Confusing. Same topology, same app, same workload. Why is the performance different? Clients Same topology. Same workload. Different performance!

6 Buying OpenFlow Switches
You show the performance graphs to your boss. Like Monaco. # Scale Monaco. # In one-switch case, Monaco is doing better for our workload. Can we still make the same conclusion for 1000 Monaco switches? What about a combination of vendors? Which vendor or vendors should you choose to maximize performance? Performance Variations Clients x 1,000 = How to predict the performance? Which one to buy?

7 Emulating OpenFlow Networks
You think of using emulation to predict the performance. # There are great tools out there that do a great job simulating and emulating the data plane, such as link delays, queue lengths and topologies, in software. # But this is OpenFlow. The control plane is involved, and control-plane performance is coupled with the data plane. Existing options are not sufficient to capture the complexities of the control plane. # You can use popular options like Mininet, which leverages Open vSwitch to emulate the control plane. # But OVS itself is just another OpenFlow switch. It also introduces performance variations, like its hardware counterparts. How OVS performs doesn’t tell you much about the hardware. OVS itself would also introduce performance variations. Data Plane Simulators / Emulators Data Traffic Control Plane Open vSwitch Mininet Controller OpenFlow

8 Problem and Goal
Problem: Hard to predict an OpenFlow network’s performance. OpenFlow switches are different. Existing emulation frameworks are not good enough. Goal: To predict performance with realism. To design an emulator that captures vendor variations. To measure these variations in the control plane. Problem: It is hard to predict the performance of an OpenFlow network. Vendors introduce variations. Existing emulation techniques are not good enough. Either they emulate only the data plane, like Emulab or ModelNet, or they do not account for the control-plane variations across vendors. # Our goal is to predict the performance of an OpenFlow network with realism. Given vendor variations, we need to design an emulator that captures these differences. But first we need to know: what are some of the vendor variations?

9 Variations across Vendors
Examples include flow table size. A switch with a smaller flow table can hold fewer active flows, so more flows have to go through the controller, which slows performance. # Also, flow management policies can vary across switches. Some switches may put new rules into the software flow table if new rules arrive at the hardware table too fast. Some switches don’t even have software tables and may simply drop new rules that arrive too fast. # CPUs may differ, too. They handle all communication between switch and controller; if the switch CPU is slow, this communication is slow. # This is a long list, with complex interactions. # For this talk, we focus on the switch CPU’s effect on the control path. Other differences are highlighted in our paper. Differences in Control Plane: Flow table size, Flow management policies, Switch CPU, etc. Controller OpenFlow Protocol Focus on CPU’s effect on control-path delays. Data Traffic

10 Control-Path Delays
How do switch CPUs affect control-path latency? Recall how OpenFlow switches work. There are three components: the workload, the switch and the controller. Suppose the controller runs the L2-learning module, which installs a rule for every new flow. # A new flow arrives and is not matched in the flow table. The switch generates a packet-in. The controller installs a rule by issuing a flow-mod. The packet exits. # Subsequent packets are matched in the flow table and do not involve the control plane. Data-plane packets are matched in the TCAM, which is very fast. # Control-plane traffic is slower because it is handled by a wimpy CPU. # This adds delay at ingress and at egress. # For short flows, this extra overhead can be significant. To help us predict performance across vendors, we need to characterize the ingress and egress delays for a given workload. We design an experiment to measure just these two delays. Controller (POX) Disproportionately affects short flows. Packet-in Events Flow-mod Events Hardware OpenFlow Switch Ingress Delay Egress Delay CPU TCAM Ingress Egress Data plane traffic Application Workload
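To illustrate why the control-path overhead weighs more on short flows, here is a minimal toy model, not the authors' code. The class name `ToySwitch`, the helper `mean_overhead_per_packet`, and the 5 ms delay values are all hypothetical; the model assumes only the first packet of a flow misses the flow table and pays the ingress (packet-in) plus egress (flow-mod) delay.

```python
from dataclasses import dataclass, field


@dataclass
class ToySwitch:
    """Toy control-path model; the delay values are hypothetical, not measured."""
    ingress_delay_ms: float  # packet-in path: switch CPU to controller
    egress_delay_ms: float   # flow-mod path: controller to switch flow table
    flow_table: set = field(default_factory=set)

    def forward(self, flow_id):
        """Return the control-path delay (ms) added to one packet of a flow."""
        if flow_id in self.flow_table:
            return 0.0  # matched in the flow table: data plane only
        # Table miss: packet-in to the controller, then a flow-mod installs the rule.
        self.flow_table.add(flow_id)
        return self.ingress_delay_ms + self.egress_delay_ms


def mean_overhead_per_packet(switch, flow_id, n_packets):
    """Average control-path delay per packet over a flow of n_packets."""
    total = sum(switch.forward(flow_id) for _ in range(n_packets))
    return total / n_packets


# A 2-packet flow and a 1000-packet flow pay the same one-time cost,
# but the short flow amortizes it over far fewer packets.
short = mean_overhead_per_packet(ToySwitch(5.0, 5.0), "short", 2)
long_ = mean_overhead_per_packet(ToySwitch(5.0, 5.0), "long", 1000)
```

In this sketch the one-time 10 ms cost averages to 5 ms per packet for the short flow and 0.01 ms per packet for the long one.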

11 Measure Control-Path Delays
We built a test harness to help us characterize the control-path delays with respect to a real application workload. A measurement server is connected to the switch. # The controller runs the default L2-learning switch, which installs a rule for every new flow. # The client creates a short flow every 50 ms. Each short flow is a Redis query for a 64-byte value from the server. We measure the time taken for each client query to complete, as well as the ingress and egress delays at the switch. Note: we run the controller, client and server on the same machine to facilitate time measurement. We make sure that none of the applications is CPU bound. # With this setup, we hope to generate the workload and observe how different switches inject different control-path delays. To do this, we simply swap the appropriate switches in and out. Installs rules for every new flow Server Control Plane HP Procurve Fulcrum Monaco Quanta LB4G Open vSwitch (OVS) Controller Eth 0 Eth 1 Eth 2 OF Switch Data Plane Clients Queries for a 64-byte value every 50 ms. We measure the query time, ingress & egress delays.
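The client side of such a harness could be sketched roughly as follows. This is an assumption-laden illustration, not the harness from the talk: `measure_query_times` and `fake_query` are hypothetical names, the Redis query is replaced by a stand-in that just sleeps, and only query completion time is recorded (the ingress and egress delays at the switch would need a separate measurement).

```python
import time


def measure_query_times(query, n_queries, interval_s=0.05):
    """Issue one query per interval and return completion times in ms.

    `query` is any zero-argument callable; in a real harness it would
    open a new connection and fetch a 64-byte value from the server.
    """
    times_ms = []
    next_start = time.perf_counter()
    for _ in range(n_queries):
        start = time.perf_counter()
        query()
        times_ms.append((time.perf_counter() - start) * 1000.0)
        # Pace queries on a fixed schedule so a slow query does not shift later ones.
        next_start += interval_s
        sleep_for = next_start - time.perf_counter()
        if sleep_for > 0:
            time.sleep(sleep_for)
    return times_ms


def fake_query():
    """Hypothetical stand-in for a Redis GET; sleeps for about 2 ms."""
    time.sleep(0.002)


samples = measure_query_times(fake_query, n_queries=3, interval_s=0.01)
```

Scheduling against `next_start` rather than sleeping a fixed 50 ms after each query keeps the offered load steady even when some queries are slow.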

12 Measure Redis Query Times
This is the same graph shown earlier. Different switches, different performance, and we didn’t know why. Thanks to the harness, we can break the measurement down into ingress and egress delays to see why such differences arise. Query completion times for Redis clients (ms)

13 Measure Ingress Delay
First, focus on the ingress delay. # Performance differs among the hardware switches. OVS is faster. This is an important observation, and we will come back to it later. OVS faster than the others.

14 Measure Egress Delay # Now that we understand why Redis behaves differently across switches, we can start designing an emulator that captures the ingress and egress delays. In both cases, recall that OVS is faster than the hardware on the control path. This is a convenient fact. # We can slow down OVS’s ingress and egress times to match the hardware: transform the dotted line (OVS) into the colored lines (hardware switches). OVS: almost no delays! Slow down the ingress and egress delays on OVS to emulate the hardware

15 Implementing the Emulator
Slow down control traffic. No change to the controller and OVS. Controller (POX) To slow down control traffic Packet-in Events Flow-mod Events Open vSwitch (OVS) Ingress Egress Data plane traffic Application Workload

16 Emulator Proxy # Messages are delayed using Inverse Transform Sampling. Details are in the paper. The key idea is that if the hardware delays by some amount with some probability, then the proxy also delays by the same amount with the same probability. # The same technique is used for egress. # This approximates the control-path characteristics of the hardware. Controller (POX) Packet-in Events (Delayed) Flow-mod Events Physical OF Switch Emulator Proxy Flow-mod Events (Delayed) Packet-in Events Open vSwitch (OVS) Ingress Egress Data plane traffic Application Workload

17 Evaluation To evaluate, we incorporated the delay measurements of the HP switch into the emulator and plugged the emulator into the same test harness. # This compares the performance of the emulator against the hardware HP switch. Red: hardware. Blue: emulator. We slow down OVS from the dotted line to the blue line. It is a reasonable approximation. Query completion time for Redis clients (ms) CDF HP OVS Hardware Emulated

18 Evaluation
We emulate the ingress and egress delays only. Reasonable approximation. # Similarly, we measure the ingress and egress delays on the Monaco and incorporate the measurements into the emulator, so OVS slows down like the Monaco. # There is still a gap between the emulator and the hardware. But remember: we emulated ingress and egress delays only. It is a reasonable approximation. [CDF panels: HP, Monaco, Quanta. x-axis: query completion time for Redis clients (ms). Lines: OVS, Hardware, Emulated.]

19 Summary and Future Work. Summary: Hard to predict performance due to vendor variations. We designed an emulator for control-path delays. Simple, but achieves a reasonable approximation. Future work: Increase realism. Capture more artifacts. Expand workload coverage. Automate switch measurements. Capture interactions among multiple switches. # Recall the long list of vendor variations, such as flow table sizes and flow management policies. We want to capture those. # Given a new switch, we have to start some workload, measure the switch, and put the measurements into the emulator. It would be great to automate this process so the emulator works for other workloads. # We built the emulator for one switch. What about multiple switches? We want to capture the complexities in the interactions of switches and emulate a larger cluster. Acknowledgements: Marco Canini and Dan Levin (TU Berlin), George Porter (UC San Diego)

20 Thank you!

21 Inverse Transform Sampling
Goal: Emulate switch X, which introduces ingress delay t with probability p.
Algorithm:
1. Measure the delay distributions of OVS and X. Make them into tables.
2. Measure how much OVS has delayed. Call this t_OVS.
3. Look up t_OVS in the OVS table. This returns probability p.
4. Look up p in X’s table. This returns delay t_X.
5. Introduce delay (t_X - t_OVS).
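The lookup steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the "tables" are assumed to be sorted lists of measured delays used as empirical CDFs, the function names are hypothetical, and clamping the result at zero is an added assumption for the case where the emulated switch would be faster than OVS.

```python
import bisect
import math


def make_table(samples):
    """A sorted list of delay samples serves as an empirical CDF table."""
    return sorted(samples)


def delay_to_prob(table, t):
    """Empirical CDF: fraction of samples with delay <= t."""
    return bisect.bisect_right(table, t) / len(table)


def prob_to_delay(table, p):
    """Empirical quantile: smallest sample whose CDF value is >= p."""
    if p <= 0:
        return table[0]
    # The small epsilon guards against floating-point error in p * len(table).
    idx = math.ceil(p * len(table) - 1e-9) - 1
    return table[min(len(table) - 1, max(0, idx))]


def extra_delay(ovs_table, x_table, t_ovs):
    """Delay to add so an OVS event looks like it came from switch X."""
    p = delay_to_prob(ovs_table, t_ovs)   # look up t_OVS in the OVS table
    t_x = prob_to_delay(x_table, p)       # look up p in X's table
    return max(0.0, t_x - t_ovs)          # assumption: never delay by a negative amount


# Hypothetical measurements in milliseconds.
ovs = make_table([1, 2, 3, 4])
hw = make_table([10, 20, 30, 40])
added = extra_delay(ovs, hw, 2)  # p = 0.5, t_X = 20, so 18 ms is added
```

Because the same probability indexes both tables, delays that are rare on OVS map to delays that are equally rare on switch X, which is what lets the proxy reproduce the shape of the hardware distribution.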

22 Evaluation (QQ Plots)
[QQ plots for HP, Monaco and Quanta. x-axis: query completion time on hardware switches (ms). y-axis: time on emulator (ms).]
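For readers unfamiliar with QQ plots, the points on such a plot pair matching quantiles of two samples. Below is a minimal sketch with hypothetical helper names and made-up numbers; it uses a simple closest-index quantile rather than any particular interpolation scheme.

```python
def quantile(sorted_samples, q):
    """Closest-index quantile of pre-sorted samples, with q in [0, 1]."""
    idx = round(q * (len(sorted_samples) - 1))
    return sorted_samples[idx]


def qq_points(hardware_ms, emulated_ms, n_points=5):
    """Pair equal quantiles of hardware and emulated completion times."""
    hw = sorted(hardware_ms)
    em = sorted(emulated_ms)
    qs = [i / (n_points - 1) for i in range(n_points)]
    return [(quantile(hw, q), quantile(em, q)) for q in qs]


# Made-up numbers: if the emulator matched the hardware exactly,
# every point would lie on the line y = x.
points = qq_points([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Here the emulated times are twice the hardware times at every quantile, so the points fall on the line y = 2x instead of y = x.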

23 Time Dilation

