Faithful Reproduction of Network Experiments
Dimosthenis Pediaditakis, Charalampos Rotsos, Andrew W. Moore
Computer Laboratory, Systems Research Group, University of Cambridge, UK
ANCS 2014, Marina del Rey, California, USA
Research on networked systems: Yesterday
(topology figure: 100 Mbps and 1 GbE links)
Research on networked systems: Modern era
(topology figure: 1 GbE and 10 GbE links, 40+ Gbps WAN links)
How do we evaluate new ideas?
Simulation (ns-3): Too much abstraction
Setup: fat-tree topology, 8 clients, 12 switches, 1 GbE links, 8 Gbps aggregate
ns-3 uses a flat model and yields 2.75x lower throughput
Emulation (MiniNet): Poor scalability
Identical experiment setup
MiniNet runs out of CPU cycles: 4.5x lower throughput, performance artifacts
Everything is a trade-off
The three properties in tension: Fidelity, Scalability, Reproducibility
– Emulation: sacrifices scalability
– Simulation: sacrifices fidelity
– Reproducibility: natural for simulation; for emulation, MiniNet is the pioneer, but how do we maintain it across different platforms?
SELENA: Standing on the shoulders of giants
(comparison table: Simulation / Emulation / SELENA / Hybrid / Testbeds, rated on reproducibility, real network stacks, unmodified applications, hardware requirements, scalability, fidelity, and execution speed)
– Fidelity: emulation on Xen, real OS components
– Reproducibility: the MiniNet approach
– Scalability: time dilation (the DieCast approach)
– Full user control: trade execution speed for fidelity and scalability
API and experimental workflow
Experiment description (Python API) → SELENA compiler → deployment
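A hedged sketch of what such an experiment description might look like; the module, class, and method names below are illustrative assumptions, not SELENA's documented API:

    # Hypothetical SELENA-style experiment description (names are assumed).
    from selena import Experiment, Host, Switch, Link  # assumed module layout

    exp = Experiment(name="two-host-demo", tdf=5)      # run 5x dilated

    h1 = exp.add(Host("h1"))
    h2 = exp.add(Host("h2"))
    s1 = exp.add(Switch("s1"))

    exp.add(Link(h1, s1, bandwidth="1Gbps", delay="0.1ms"))
    exp.add(Link(h2, s1, bandwidth="1Gbps", delay="0.1ms"))

    # The compiler step maps this description onto Xen guests and OVS
    # wiring; deployment then boots the experiment.
    exp.compile()
    exp.deploy()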
SELENA's emulation model over Xen
(architecture figure: guest domains interconnected through OVS bridges)
The concept of Time-Dilation
"I command you to slow down"
– One clock tick lasts 1/C_Hz seconds.
– Real time: transferring 10 Mbit takes 6 ticks, so rate_real = 10 / (6 / C_Hz) = 10 · C_Hz / 6 Mbps.
– Dilating time by TDF = 2 means the guest perceives ticks at half the rate: either deliver ticks at (tick rate)/2 with the clock at C_Hz, or keep the tick rate and drive the clock at 2 · C_Hz.
– Virtual time: the same 10 Mbit transfer now spans only 3 perceived ticks, so rate_virt = 10 · C_Hz / 3 Mbps = 2 · rate_real.
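A quick numeric check of the arithmetic above, assuming a 100 Hz guest timer:

    # Worked example of the dilation arithmetic (illustrative numbers).
    C_HZ = 100          # guest clock ticks per second (assumption)
    DATA_MBIT = 10      # data transferred, in Mbit
    REAL_TICKS = 6      # real-time duration of the transfer, in ticks
    TDF = 2             # time dilation factor

    rate_real = DATA_MBIT * C_HZ / REAL_TICKS          # ~166.7 Mbps
    # The dilated guest sees only REAL_TICKS / TDF ticks elapse:
    rate_virt = DATA_MBIT * C_HZ / (REAL_TICKS / TDF)  # ~333.3 Mbps

    assert rate_virt == TDF * rate_real   # perceived rate doubles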
Scaling resources via Time Dilation
Step 1: Create a scenario.
Step 2: Choose a time dilation factor (TDF): linear, symmetric scaling of all resources (network, CPU, RAM bandwidth, disk I/O).
Step 3: Control the "perceived" available resources independently, via SELENA's API (see the sketch below):
– CPU (Xen Credit2 scheduler)
– Network (Xen VIF QoS, netem)
– Disk I/O (inside guests, via cgroups)
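A minimal sketch of the kind of per-resource throttling involved, using standard Linux tools (netem and the v1 blkio cgroup); the exact mechanisms SELENA drives through its API may differ:

    import subprocess

    def cap_vif_bandwidth(vif: str, rate: str) -> None:
        # Shape a Xen virtual interface in Dom-0 with a netem rate limit.
        subprocess.run(["tc", "qdisc", "replace", "dev", vif,
                        "root", "netem", "rate", rate], check=True)

    def cap_disk_read_bps(device: str, bps: int) -> None:
        # Throttle block reads via the blkio cgroup, from inside a guest.
        path = "/sys/fs/cgroup/blkio/blkio.throttle.read_bps_device"
        with open(path, "w") as f:
            f.write(f"{device} {bps}\n")

    # With TDF = 10, a physical 100 Mbps cap is perceived as 1 Gbps,
    # so links are shaped at (target perceived rate) / TDF.
    cap_vif_bandwidth("vif1.0", "100mbit")
    cap_disk_read_bps("8:0", 50 * 1024 * 1024)  # 50 MB/s on device 8:0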
Xen PV-guest time-keeping
(figure: guest/hypervisor interactions: rdtsc returning the TSC value, VIRQ_TIMER delivery, HYPERVISOR_set_timer_op setting the next event, the Xen clock source)
Time exposed to a PV guest:
– Wall clock time (since epoch)
– System time (since boot)
– Independent mode
rdtsc modes of operation: native, emulated
Consumers: scheduled timers, periodic timers, loop delays
Implementing Time-Dilation (Linux guest on the Xen hypervisor)
– Native rdtsc (constant, invariant TSC): trap, emulate, and scale the returned TSC value
– Start-of-day: report a dilated wallclock time
– Scale the per-VCPU time records:
  _u.tsc_timestamp = tsc_stamp;
  _u.system_time = system_time;
  _u.tsc_to_system_mul = tsc_to_system_mul;
– VCPUOP_set_singleshot_timer: set_timer(&v->singleshot_timer, dilatedTimeout);
– The periodic VIRQ_TIMER is implemented but not really used
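The arithmetic behind these hooks, modelled in Python for clarity (the real implementation is C inside Xen; the function names here are illustrative):

    TDF = 10  # time dilation factor

    def dilated_tsc(native_tsc: int, boot_tsc: int) -> int:
        # TSC exposed to the guest: elapsed cycles shrunk by TDF, so
        # guest time advances TDF times slower than real time.
        return boot_tsc + (native_tsc - boot_tsc) // TDF

    def dilated_timeout_ns(now_ns: int, guest_delta_ns: int) -> int:
        # A guest asking for a one-shot timer in delta nanoseconds of
        # *its* time must actually sleep TDF times longer in real time.
        return now_ns + guest_delta_ns * TDF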
Summarizing the elements of fidelity
– Resource scaling via time dilation
– Real network stacks and other OS components
– Real applications, including SDN controllers
– Realistic SDN switch models: why are they important, and how do they affect performance?
OpenFlow switch X-ray
(figure: control apps on a network OS, OF agent, ASIC, control channel)
– Control-channel capacity and synchronicity are limited
– PCI bus capacity is small in comparison to the data plane
– The ASIC driver affects how fast policy is configured in the ASIC
– Co-processor resources are scarce; switch OS scheduling is non-trivial
– Control application complexity matters
Control-plane performance is critical for the data plane
Building an OpenFlow switch model
Pica8 P-3290 switch:
– Measure message-processing performance (OFLOPS)
– Extract latency characteristics of flow-table management, the packet interception/injection mechanism, and counter extraction
Configurable switch model:
– Replicates the measured latency and loss characteristics
– Implementation: a Mirage-OS-based switch (flexible, functional, non-bloated code; unikernel performance; small footprint for scalable emulations)
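A minimal sketch of such a configurable model: replay empirically measured flow_mod latencies and losses (the numbers below are placeholders, not actual Pica8 measurements):

    import random

    class SwitchControlModel:
        def __init__(self, latency_samples_ms, loss_rate):
            self.samples = latency_samples_ms  # e.g. measured with OFLOPS
            self.loss_rate = loss_rate

        def flow_mod_delay_ms(self):
            # None means the control message was dropped; otherwise a
            # latency resampled from the empirical distribution.
            if random.random() < self.loss_rate:
                return None
            return random.choice(self.samples)

    model = SwitchControlModel([1.8, 2.1, 2.4, 3.0, 7.5], loss_rate=0.01)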
Evaluation methodology
1. Run the experiment on real hardware
2. Reproduce the results in: MiniNet, ns-3, SELENA (for various TDFs)
3. Compare each against the "real" results
Throughput fidelity
– MiniNet and ns-3 top out at … Gbps and 5.2 Gbps respectively
– SELENA with 10x dilation: 99.5% accuracy, and executes 9x faster than ns-3
Latency fidelity
Setup: 18 nodes, 1 Gbps links, … flows
– MiniNet and ns-3 accuracy: 32% and 44%
– SELENA accuracy: 71% with 5x dilation, 98.7% with 20x dilation
SDN control-plane fidelity
Workload: completion times of 1 Mb TCP flows with exponential arrivals (λ = 0.02)
– Stepping behaviour: lost TCP SYN and SYN-ACK segments are only retried after whole retransmission timeouts, which produces the steps
– MiniNet's switch model does not capture this throttling effect, and cannot capture transient switch-OS scheduling effects
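For reference, the workload implied above: flow start times drawn from a Poisson process with rate λ = 0.02 (mean inter-arrival time of 50 time units); a sketch, not SELENA code:

    import random

    LAMBDA = 0.02                # arrival rate from the slide
    FLOW_SIZE_BITS = 1_000_000   # 1 Mb per flow

    def arrival_times(n_flows: int):
        # Exponential inter-arrival gaps yield Poisson flow arrivals.
        t = 0.0
        for _ in range(n_flows):
            t += random.expovariate(LAMBDA)
            yield t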
Application fidelity (LAMP)
Fat-tree (Clos) topology:
– 1 Gbps links, 10 switches, 4 clients
– 4 web servers: Apache2, PHP, MySQL, Redis, WordPress
A layered SDN controller hierarchy
Setup: 4-pod fat-tree topology, 1 GbE links, 32 Gbps aggregate traffic
Question: how does a layered controller hierarchy affect performance?
(figure: 1st-layer and 2nd-layer controllers)
More layers mean:
– Control decisions are taken higher in the hierarchy
– Flow-setup latency increases (network hops, request pipelining, CPU load); see the sketch below
– Resilience
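A back-of-the-envelope model of that latency growth, assuming each escalation adds one round trip plus one controller's processing time (the numbers are assumptions, not measurements):

    RTT_MS = 0.5    # round trip to the next controller layer (assumed)
    PROC_MS = 1.0   # per-controller processing time (assumed)

    def flow_setup_latency_ms(layers: int) -> float:
        # A packet-in escalated through `layers` controllers.
        return layers * (RTT_MS + PROC_MS)

    print([flow_setup_latency_ms(n) for n in (1, 2, 3)])  # grows linearly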
Scalability analysis
Setup: fat-tree topology, 1 GbE links, multi-Gbit sink link
– Dom-0 is allocated 4 cores: why does CPU utilisation top out at 250%?
– Near-linear scalability
(figure: OVS bridge layout and utilisation plot)
How to (not) use SELENA
SELENA is primarily a NETWORK emulation framework:
– Perfect match: network-bound applications
– Tuning knobs for the relative performance of CPU, disk I/O, and network
– Real applications, SDN controllers, and network stacks
Time dilation is not a panacea:
– Device-specific disk I/O performance
– Cache thrashing and data locality
– Multi-core effects (e.g. per-core lock contention)
– Hardware features (e.g. Intel DDIO)
– Xen scheduling effects at scale (hundreds of VMs)
Rule of thumb for choosing a TDF (see the sketch below):
– Keep Dom-0 and Dom-U utilisation low
– Observation time-scales matter
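One way to operationalise that rule of thumb, sketched with hypothetical hooks (run_experiment and peak_cpu_utilisation are illustrative, not part of any real SELENA API):

    def choose_tdf(run_experiment, peak_cpu_utilisation,
                   max_tdf: int = 64, threshold: float = 0.7) -> int:
        # Raise the TDF until Dom-0/Dom-U CPU utilisation stays low,
        # i.e. until the platform has headroom to emulate faithfully.
        tdf = 1
        while tdf <= max_tdf:
            run_experiment(tdf)
            if peak_cpu_utilisation() < threshold:
                return tdf
            tdf *= 2
        raise RuntimeError("experiment too heavy even at max TDF")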
Work in progress
API compatibility with MiniNet
Further improving scalability:
– Multi-machine emulation
– Optimised guest-to-guest Xen communications
Features and use cases:
– SDN coupling with workload consolidation
– Emulation of live VM migration
– Incorporating energy models
SELENA is free and open. Give it a try:
Research on networked systems: past, present, future
Animation: three example networks, showing the evolution of the network characteristics on which research is conducted:
– Past: 2-3 layers, hierarchical, ToR switches, 100 Mbps, bare-metal OS
– Present: fat-tree, 1 Gbps links, virtualization, WAN links
– Near future: flexible architectures, 10 Gbps, elastic resource management, SDN controllers, OF switches, large scale (DC)
The point of this slide: real-world systems progress at a fast pace (complexity, size), but common tools have not kept up with this pace.
I will challenge the audience to think:
– Which of the three illustrated networks do they believe they can model with existing tools?
– At what level of fidelity (incl. protocols, SDN, apps, network emulation)?
– What are the common sizes and link speeds they can model?
A simple example with NS-3
Here I will assume a simple star topology: 10 clients, 1 server, 1 switch (10 Gbps aggregate).
I will provide the throughput plot and explain why performance sucks.
Point out that NS-3 is not appropriate for faster networks: simplicity of models plus non-real applications.
Using DCE: even slower, and not fully POSIX-compliant.
A simple example with MiniNet
Same setup as before; throughput plot.
Better fidelity in terms of protocols, applications, etc., but with a penalty in performance.
Explain what the bottleneck is, especially in relation to MiniNet's implementation.
Everything is a trade-off
Nothing comes for free when it comes to modelling and the three key experimentation properties: Fidelity, Scalability, Reproducibility.
MiniNet aims for fidelity and sacrifices scalability.
NS-3 aims for scalability (many abstractions) and sacrifices fidelity, yet still has scalability limitations.
The importance of reproducibility:
– MiniNet is a pioneer
– Difficult to maintain from machine to machine
– MiniNet can only guarantee it at the level of configuration, not at the level of performance
SELENA: Standing on the shoulders of giants
Fidelity: use emulation
– Unmodified apps and protocols: fidelity + usability
– Xen: support for common OSes, good scalability, great control over resources
Reproducible experiments
– MiniNet approach: high-level experiment descriptions, automation
Maintain fidelity under scale
– DieCast approach: time dilation (will talk more later on that)
The user is the MASTER:
– Tuning knob: experiment execution speed
SELENA Architecture
Animation here: three steps show how an experiment is specified (Python API), compiled, and deployed.
Explain the mapping of network entities and features to Xen emulation components.
Give hints of the optimization tweaks we use under the hood.
Time Dilation and Reproducibility
Explain how time dilation also FACILITATES reproducibility across different platforms.
Reproducibility as replication of configuration:
– Network architecture, links, protocols; applications; traffic/workloads
– How we do it in SELENA: Python API, Xen API
Reproducibility as reproduction of results and observed performance:
– Each platform must have enough resources to run the experiment faithfully
– How we do it in SELENA: time dilation
– An older platform/hardware will require a different minimum TDF to reproduce the same results
Demystifying Time-Dilation 1/3
Explain the concept in high-level terms, with a solid example on a timeline.
Similar to slide 8 of http://sysnet.ucsd.edu/projects/time-dilation/nsdi06-tdf-talk.pdf
Explain that everything happens at the hypervisor level:
– Guest time sandboxing (experiment VMs)
– Common time for kernel + user space
– No modifications for PV guests: Linux, FreeBSD, ClickOS, OSv, Mirage
Demystifying Time-Dilation 2/3
Here we explain the low-level stuff.
Give credit to DieCast, but also explain the incremental work we did.
Best shown and explained with an animation.
Demystifying Time-Dilation 3/3
Resource scaling:
– Linear and symmetric scaling of network, CPU, RAM bandwidth, and disk I/O
– The TDF only increases the perceived performance headroom of the above
– SELENA allows configuring the perceived speeds independently: CPU, network, disk I/O (from within the guests at the moment, via cgroups)
Typical workflow:
1. Create a scenario
2. Decide the minimum TDF necessary to support the desired fidelity (will see more later on that)
3. Independently scale resources, based on the requirements of the users and the focus of their studies
Summarizing the elements of fidelity
– Resource scaling via time dilation (already covered)
– Real network stacks and other OS components
– Real applications, including SDN controllers
– Realistic SDN switch models: why are they important, and how much can they affect observed behaviours?
Inside an OF switch
Present a model of an OF switch's internals:
– Show components
– Show the paths and interactions that affect performance: data plane (we do not currently model that) and control plane
(Random image from the web, just a placeholder)
Building a realistic OF switch model
Methodology for constructing an empirical model:
– Pica8 P-3290, OFLOPS measurements
– Collect, analyze, extract trends; build a stochastic model
– Use a Mirage switch to implement the model: flexible, functional, non-bloated code; performant (unikernel, no context switches); small footprint for scalable emulations
Evaluation methodology
1. Run the experiment on real hardware
2. Reproduce the results in: MiniNet, NS-3, SELENA (for various TDFs)
3. Compare each one against the "real" results
We evaluate multiple aspects of fidelity: data-plane, flow-level, SDN control, application.
Data-plane fidelity
Figure from the paper; explain the star topology.
Show the comparison with MiniNet + NS-3: same figures as slides 2+3, but now compared against SELENA + real hardware.
Point out how increasing the TDF affects fidelity.
Flow-level fidelity
Figure from the paper; explain the fat-tree topology.
Execution speed
Compare against NS-3 and MiniNet.
Point out that SELENA executes faster than NS-3, and that NS-3 replicates the network at only half speed, so the difference is even bigger.
SDN control-plane fidelity
Figure from the paper; explain the experiment setup.
Point out the shortcomings of MiniNet: it is only as good as OVS.
Point out NS-3's terrible support for SDN.
Application-level fidelity
Figure from the paper; explain the experiment setup and the latency aspect.
Show how CPU utilisation matters for fidelity.
Open the dialogue on the performance bottlenecks and limitations, and make a smooth transition to the next slide.
Near-linear scalability
Figure from the paper.
Explain how scalability is determined for a given TDF.
Limitations discussion
Explain the effects of running on Xen, and what happens if the TDF is low and utilisation is high.
Explain that insufficient CPU compromises:
– Emulated network speeds
– The guests' ability to utilise the available bandwidth
– The performance of networked applications (skew)
– Latency (excessive added latency; scheduling also contributes)
A more complicated example
Showcase the power of SELENA :P
Use the MRC2 experiment.