Published by Cecil Davidson. Modified over 9 years ago.
1
Faithful Reproduction of Network Experiments
Dimosthenis Pediaditakis, Charalampos Rotsos, Andrew W. Moore
firstname.lastname@cl.cam.ac.uk
Computer Laboratory, Systems Research Group, University of Cambridge, UK
http://selena-project.github.io
2
Research on networked systems: Yesterday
Figure labels: 100 Mbps, 1 GbE
http://selena-project.github.io/ ANCS 2014, Marina del Rey, California, USA
3
Research on networked systems: Modern era
Figure labels: 1 GbE, 10 GbE, WAN link: 40++ Gbps
How do we evaluate new ideas?
4
Simulation (ns3): Too much abstraction
Fat-Tree
- 8x clients
- 12x switches
- 1 GbE links
- 8 Gbps aggregate
Ns3
- Flat model
- 2.75x lower throughput
5
Emulation (MiniNet): Poor scalability
Identical experiment setup
MiniNet
- Out of CPU cycles
- 4.5x lower throughput
- Performance artifacts
6
Everything is a trade-off
Fidelity, Scalability, Reproducibility
- Emulation: Sacrifice scalability
- Simulation: Sacrifice fidelity
- Reproducibility
  - Natural for simulation
  - Emulation: MiniNet is the pioneer
  - How to maintain it across different platforms?
7
SELENA: Standing on the shoulders of giants
Comparison table (cell values were graphical and are not recoverable)
- Approaches: Simulation, Emulation, SELENA, Hybrid, Testbeds
- Criteria: Reproducibility, Real Net Stacks, Unmodified App, Hardware Req., Scalability, Fidelity, Exec. speed
How SELENA combines existing approaches
- Fidelity: Emulation, Xen, real OS components
- Reproducibility: MiniNet approach
- Scalability: Time dilation (DieCast approach)
Full user control: Trade execution speed for fidelity and scalability
8
API and experimental workflow
Figure labels: Experiment description, Python API, Selena compiler
9
SELENA’s Emulation model over Xen
Figure label: OVS Bridge
10
The concept of Time-Dilation
“I command you to slow down”
1 tick = (1/C_Hz) seconds
Real time
- 10 Mbits of data are transferred in 6 ticks
- rate_REAL = 10 / (6 * (1/C_Hz)) Mbps
2x dilated time (TDF = 2)
- Either halve the tick rate and keep C_Hz, or keep the tick rate and report 2*C_Hz
Virtual time
- The same 10 Mbits of data appear to take 3 ticks
- rate_VIRT = 10 / (3 * (1/C_Hz)) Mbps = 2 * rate_REAL
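The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch and not SELENA code: the function name `perceived_rate_mbps` and the 1000 Hz tick rate are assumptions made for illustration, and the 6-tick transfer is read off the 10 / 6 and 10 / 3 ratios on the slide.

```python
def perceived_rate_mbps(data_mbits, real_ticks, tick_hz, tdf=1):
    """Rate a guest observes when its clock runs `tdf` times slower than real time."""
    if tdf < 1:
        raise ValueError("TDF must be >= 1")
    real_seconds = real_ticks / tick_hz    # 1 tick = 1 / tick_hz seconds
    virtual_seconds = real_seconds / tdf   # the guest sees less time elapse
    return data_mbits / virtual_seconds


# 10 Mbits moved in 6 real ticks, observed without and with 2x dilation.
rate_real = perceived_rate_mbps(10, real_ticks=6, tick_hz=1000)
rate_virt = perceived_rate_mbps(10, real_ticks=6, tick_hz=1000, tdf=2)
print(rate_real, rate_virt)
```

The dilated guest measures twice the rate because the same amount of data arrives in half as much virtual time.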
11
Scaling resources via Time Dilation
STEP 1: Create a scenario
STEP 2: Choose a time dilation factor (TDF)
- Linear and symmetric scaling of all resources: network, CPU, RAM bandwidth, disk I/O
STEP 3: Independently control the “perceived” available resources
- Configure each one independently via SELENA’s API:
  - CPU (Xen Credit2)
  - Network (Xen VIF QoS, netem)
  - Disk I/O (in guests via cgroups)
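The relationship between steps 2 and 3 can be sketched as simple arithmetic. The sketch assumes that a physical capacity P appears to a dilated guest as P * TDF, so presenting a perceived capacity Q requires a physical limit of Q / TDF. The function names and the host figures are hypothetical; the actual SELENA API calls that apply such limits are not shown in the slides.

```python
def perceived_capacity(physical, tdf):
    """Capacity a dilated guest believes it has (linear, symmetric scaling)."""
    return physical * tdf


def physical_limit(target_perceived, tdf):
    """Physical cap to configure so the guest perceives `target_perceived`."""
    return target_perceived / tdf


tdf = 10
host = {"link_gbps": 1.0, "cpu_cores": 4.0}   # hypothetical host resources

# Step 2: TDF scales everything by the same factor.
headroom = {name: perceived_capacity(value, tdf) for name, value in host.items()}

# Step 3: keep the 10x faster perceived link, but hold perceived CPU at 4 cores.
limits = {
    "link_gbps": physical_limit(10.0, tdf),
    "cpu_cores": physical_limit(4.0, tdf),
}
print(headroom, limits)
```

With these numbers the guest sees a 10 Gbps link backed by 1 Gbps of real capacity, while its CPU is capped at 0.4 physical cores so that it still perceives 4 cores.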
12
Xen PV-guest Time-Keeping
Figure labels: XEN Hypervisor, rdtsc, VIRQ_TIMER, Hypervisor_set_timer_op, XEN Clock Source, TSC value, XEN VIRQ, set next event
Time
- Wall clock time (epoch)
- System time (boot)
- Independent mode
rdtsc modes of operation
- Native
- Emulated
Scheduled timers, periodic timers, loop delays
13
Implementing Time-Dilation
Figure labels: Linux Guest, Xen Hypervisor, TSC value
- Trap, emulate and scale “rdtsc”
- Native “rdtsc” (constant, invariant)
- Start-of-day: dilated wallclock time
- VCPU time:
  _u.tsc_timestamp = tsc_stamp;
  _u.system_time = system_time;
  _u.tsc_to_system_mul = tsc_to_system_mul;
- VCPUOP_set_singleshot_timer:
  set_timer(&v->singleshot_timer, dilatedTimeout);
- Periodic VIRQ_TIMER implemented (but is not really used)
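The effect of the fields above can be illustrated with a small Python model. The conversion in `guest_system_time_ns` follows the way a Xen PV guest derives system time from `tsc_timestamp`, `system_time`, `tsc_to_system_mul` and the companion `tsc_shift` field. Dividing the multiplier by the TDF, and multiplying guest timeouts by the TDF, is an assumption about one plausible way to dilate time; the slides do not confirm that SELENA does exactly this, and the helper names are hypothetical.

```python
def guest_system_time_ns(tsc, tsc_timestamp, system_time_ns,
                         tsc_to_system_mul, tsc_shift=0):
    """System time as a PV guest computes it from the shared time info."""
    delta = tsc - tsc_timestamp
    delta = delta << tsc_shift if tsc_shift >= 0 else delta >> -tsc_shift
    # tsc_to_system_mul is a 32.32 fixed-point "nanoseconds per cycle" factor.
    return system_time_ns + ((delta * tsc_to_system_mul) >> 32)


def dilated_mul(tsc_to_system_mul, tdf):
    """Assumed dilation step: make each TSC cycle worth 1/TDF as many ns."""
    return tsc_to_system_mul // tdf


def real_timeout_ns(now_real_ns, virtual_delta_ns, tdf):
    """Assumed timer step: a virtual delay lasts TDF times longer in real time."""
    return now_real_ns + virtual_delta_ns * tdf


mul_1ghz = 1 << 32            # 1 GHz TSC: exactly 1 ns per cycle
cycles = 10 * 10**9           # 10 real seconds of TSC cycles
undilated = guest_system_time_ns(cycles, 0, 0, mul_1ghz)
dilated = guest_system_time_ns(cycles, 0, 0, dilated_mul(mul_1ghz, 10))
print(undilated, dilated)
```

After 10 real seconds the undilated guest reads 10 s of system time, while the guest with TDF = 10 reads about 1 s; the small shortfall comes from truncating the fixed-point multiplier.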
14
Summarizing the elements of Fidelity
- Resource scaling via time dilation
- Real stacks and other OS components
- Real applications
  - Including SDN controllers
- Realistic SDN switch models
  - Why is it important?
  - How does it affect performance?
15
OpenFlow Switch X-Ray
Figure labels: Network OS, ASIC, OF Agent, Control App (x4), Control Channel
- Available capacity, synchronicity
- PCI bus capacity is limited in comparison to the data plane
- The ASIC driver affects how fast the policy is configured in the ASIC
- Scarce co-processor resources
- Switch OS scheduling is non-trivial
- Control application complexity
Control plane performance is critical for the data plane
16
Building an OpenFlow switch model
Pica8 P-3290 switch
- Measure message processing performance (OFLOPS)
- Extract latency characteristics of:
  - flow table management
  - the packet interception / injection mechanism
  - counters extraction
Configurable switch model
- Replicate latency and loss characteristics
- Implementation: Mirage-OS based switch
  - Flexible, functional, non-bloated code
  - Performance: uni-kernel
  - Small footprint: scalable emulations
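A configurable model of this kind can be sketched as a per-message-type latency distribution plus a loss probability. The sketch below is in Python for illustration only; the real model is implemented as a Mirage-OS switch. The class name, the message types, the Gaussian shape and all numbers are placeholders, not values measured on the Pica8 P-3290.

```python
import random


class SwitchControlModel:
    """Toy control-plane model: per-message latency and random loss."""

    def __init__(self, latency_ms, loss_prob, seed=None):
        self.latency_ms = latency_ms   # message type -> (mean, stddev) in ms
        self.loss_prob = loss_prob     # probability that a message is dropped
        self.rng = random.Random(seed)

    def process(self, msg_type):
        """Return the processing delay in ms, or None if the message is lost."""
        if self.rng.random() < self.loss_prob:
            return None
        mean, stddev = self.latency_ms[msg_type]
        return max(0.0, self.rng.gauss(mean, stddev))


# Placeholder parameters; real values would come from OFLOPS measurements.
model = SwitchControlModel(
    latency_ms={"flow_mod": (3.0, 1.0), "packet_in": (1.0, 0.3), "stats": (5.0, 2.0)},
    loss_prob=0.01,
    seed=42,
)
delays = [model.process("flow_mod") for _ in range(5)]
print(delays)
```

Keeping the parameters in plain data makes it straightforward to swap in distributions extracted from a different switch.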
17
Evaluation methodology
1. Run experiment on real hardware
2. Reproduce results in:
   1. MiniNet
   2. NS3
   3. SELENA (for various TDF)
3. Compare against “real”
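The comparison in step 3 needs a metric. The slides report accuracy percentages without defining them, so the sketch below assumes one common definition, one minus the relative error against the real-hardware measurement. The function name and the sample values are hypothetical and are not results from the paper.

```python
def accuracy(measured, reference):
    """Assumed metric: 1 - relative error, clamped to the range [0, 1]."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return max(0.0, 1.0 - abs(measured - reference) / reference)


# Hypothetical throughputs in Gbps: a real testbed and two reproductions.
real = 10.0
for name, value in [("tool A", 9.95), ("tool B", 4.0)]:
    print(name, round(100 * accuracy(value, real), 1), "%")
```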
18
Throughput fidelity
MiniNet and Ns3
- 2.7 Gbps and 5.2 Gbps
SELENA
- 10x dilation: 99.5% accuracy
- Executes 9x faster than Ns3
19
Latency fidelity
Setup
- 18 nodes, 1 Gbps links
- 10000 flows
MiniNet and Ns3 accuracy: 32% and 44%
SELENA accuracy
- 71% with 5x dilation
- 98.7% with 20x dilation
20
SDN Control plane Fidelity
Completion time of 1Mb TCP flows, exponential arrival λ = 0.02
Stepping behavior:
- TCP SYN & SYNACK loss
Mininet switch model:
- Does not capture this throttling effect
The model cannot capture transient switch OS scheduling effects.
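A workload with exponential arrivals can be generated as follows. This is a minimal sketch: the slide gives λ = 0.02 without a unit, so the time unit here is left abstract, and the function name and seed are assumptions.

```python
import random


def flow_arrival_times(rate, n_flows, seed=None):
    """Start times of flows whose inter-arrival gaps are exponential with `rate`."""
    rng = random.Random(seed)
    now = 0.0
    arrivals = []
    for _ in range(n_flows):
        now += rng.expovariate(rate)   # mean gap is 1 / rate time units
        arrivals.append(now)
    return arrivals


arrivals = flow_arrival_times(rate=0.02, n_flows=20000, seed=1)
mean_gap = arrivals[-1] / len(arrivals)
print(round(mean_gap, 1))
```

With λ = 0.02 the mean gap between flow starts is 1 / 0.02 = 50 time units, which the sample mean approaches as the number of flows grows.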
21
Application fidelity (LAMP)
Fat-Tree CLOS
- 1 Gbps links
- 10x switches
- 4x clients
- 4x web servers: Apache2, PHP, MySQL, Redis, Wordpress
22
A layered SDN controller hierarchy
- 4-pod Fat-Tree topology, 1 GbE links
- 32 Gbps aggregate traffic
The layered control-plane architecture
Figure labels: 1st Layer Controller, 2nd Layer Controller
Question: How does a layered controller hierarchy affect performance?
More layers
- Control decisions taken higher in the hierarchy
- Flow setup latency increases
  - Network, request pipelining, CPU load
- Resilience
23
Scalability analysis
- Fat-Tree topology, 1 GbE links, multi-Gbit sink link
- Domain-0 is allocated 4 cores
  - Why does it top out at 250% CPU utilisation?
- Near-linear scalability
Figure label: OVS Bridge
24
How to (not) use SELENA
SELENA is primarily a NETWORK emulation framework
- Perfect match: network-bound applications
- Provides tuning knobs to experiment with:
  - CPU, disk I/O and network relative performance
  - Real applications / SDN controllers / network stacks
Time dilation is not a panacea
- Device-specific disk I/O performance
- Cache thrashing and data locality
- Multi-core effects (e.g. per-core lock contention)
- Hardware features (e.g. Intel DDIO)
- Scheduling effects of Xen at scale (100s of VMs)
Rule of thumb for choosing TDF
- Low Dom-0 and Dom-U utilisation
- Observation time-scales matter
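The utilisation rule of thumb can be turned into a rough starting estimate. The sketch assumes that CPU utilisation falls inversely with the TDF, in line with the linear resource scaling described earlier, and it uses a 50% utilisation target that is a hypothetical threshold rather than a figure from the slides. In practice the estimate would be checked against measured Dom-0 and Dom-U utilisation.

```python
import math


def min_tdf(cpu_demand, target_utilisation=0.5):
    """Smallest integer TDF that keeps utilisation at or below the target.

    cpu_demand is the fraction of the available CPU the experiment would need
    at real-time speed (TDF = 1); values above 1.0 mean the host is too slow.
    """
    if cpu_demand < 0 or not 0 < target_utilisation <= 1:
        raise ValueError("invalid demand or target")
    return max(1, math.ceil(cpu_demand / target_utilisation))


# An experiment that would need 4x the available CPU at real-time speed.
print(min_tdf(4.0))
```

Because the demand is expressed relative to the host's own CPU, an older platform yields a higher demand and therefore a higher minimum TDF for the same scenario.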
25
Work in progress
- API compatibility with MiniNet
- Further improve scalability
  - Multi-machine emulation
  - Optimize guest-2-guest Xen communications
- Features and use cases
  - SDN coupling with workload consolidation
  - Emulation of live VM migration
  - Incorporate energy models
26
SELENA is free and open. Give it a try: http://selena-project.github.io
32
Research on networked systems: past, present, future
Animation: 3 examples of networks. The examples will show the evolution of the “network characteristics” on which research is conducted:
- Past: 2-3 layers, hierarchical, TOR, 100 Mbps, bare-metal OS
- Present: Fat-tree, 1 Gbps links, virtualization, WAN links
- Near future: Flexible architectures, 10 Gbps, elastic resource management, SDN controllers, OF switches, large scale (DC)
The point of this slide is that real-world systems progress at a fast pace (complexity, size), but common tools have not kept up.
I will challenge the audience to think about:
- Which of the 3 illustrated networks they believe they can model with existing tools
- At what level of fidelity (incl. protocols, SDN, apps, network emulation)
- What common sizes and link speeds they can model
33
A simple example with NS-3
- Here I will assume a simple star topology: 10x clients, 1x server, 1x switch (10 Gbps aggregate)
- I will provide the throughput plot and explain why performance sucks
- Point out that NS3 is not appropriate for faster networks
  - Simplicity of models + non-real applications
  - Using DCE: even slower, not fully POSIX-compliant
34
A simple example with MiniNet
- Same as before
- Throughput plot
- Better fidelity in terms of protocols, applications etc.
  - Penalty in performance
- Explain what the bottleneck is, especially in relation to MiniNet’s implementation
35
Everything is a trade-off
Fidelity, Scalability, Reproducibility
Nothing comes for free when it comes to modelling and the 3 key experimentation properties
- MiniNet aims for fidelity
  - Sacrifices scalability
- NS-3 aims for scalability (many abstractions)
  - Sacrifices fidelity, plus scalability limitations
- The importance of reproducibility
  - MiniNet is a pioneer
  - Difficult to maintain from machine to machine
  - MiniNet cannot guarantee it at the level of performance, only at the level of configuration
36
SELENA: Standing on the shoulders of giants
- Fidelity: use emulation
  - Unmodified apps and protocols: fidelity + usability
  - XEN: support for common OSes, good scalability, great control over resources
- Reproducible experiments
  - MiniNet approach, high-level experiment descriptions, automation
- Maintain fidelity under scale
  - DieCast approach: time dilation (will talk more about that later)
- The user is the MASTER:
  - Tuning knob: experiment execution speed
37
SELENA Architecture
Animation here: 3 steps show how an experiment is
- Specified (Python API)
- Compiled
- Deployed
Explain the mappings of network entities and features to Xen emulation components
Give hints of the optimization tweaks we use under the hood
Figure labels: Experiment description, Python API, Selena compiler
38
Time Dilation and Reproducibility
Explain how time dilation also FACILITATES reproducibility across different platforms
Reproducibility
- Replication of configuration
  - Network architecture, links, protocols
  - Applications
  - Traffic / workloads
  - How we do it in SELENA: Python API, XEN API
- Reproduction of results and observed performance
  - Each platform should have enough resources to run the experiment faithfully
  - How we do it in SELENA: time dilation
  - An older platform/hardware will require a different minimum TDF to reproduce the same results
39
Demystifying Time-Dilation 1/3
Explain the concept in high-level terms
- Give a solid example with a timeline
  - Similar to slide 8: http://sysnet.ucsd.edu/projects/time-dilation/nsdi06-tdf-talk.pdf
Explain that everything happens at the H/V level
- Guest time sandboxing (experiment VMs)
- Common time for kernel + user space
- No modifications for PV guests
  - Linux, FreeBSD, ClickOS, OSv, Mirage
40
Demystifying Time-Dilation 2/3
- Here we explain the low-level stuff
- Give credit to DieCast, but also explain the incremental work we did
- Best to show/explain with an animation
41
Demystifying Time-Dilation 3/3
Resource scaling
- Linear and symmetric scaling for network, CPU, RAM bandwidth, disk I/O
- TDF only increases the perceived performance headroom of the above
- SELENA allows the perceived speeds of the following to be configured independently:
  - CPU
  - Network
  - Disk I/O (from within the guests at the moment, via cgroups)
Typical workflow
1. Create a scenario
2. Decide the minimum necessary TDF for supporting the desired (will see more on that later)
3. Independently scale resources, based on the requirements of the users and the focus of their studies
42
Summarizing the elements of Fidelity
- Resource scaling via time dilation (already covered)
- Real stacks and other OS components
- Real applications
  - Including SDN controllers
- Realistic SDN switch models
  - Why is it important
  - How much can it affect observed behaviours
43
Inside an OF switch
Present a model of an OF switch's internals
- Show components
- Show paths / interactions which affect performance
  - Data plane (we do not model that currently)
  - Control plane
Image: random image from the web, just a placeholder
44
Building a realistic OF switch model
Methodology for constructing an empirical model
- PICA-8
- OFLOPS measurements
  - Collect, analyze, extract trends
- Stochastic model
- Use a mirage-switch to implement the model
  - Flexible, functional, non-bloated code
  - Performant: uni-kernel, no context switches
  - Small footprint: scalable emulations
45
Evaluation methodology
1. Run experiment on real hardware
2. Reproduce results in:
   1. MiniNet
   2. NS3
   3. SELENA (for various TDF)
3. Compare each one against “real”
We evaluate multiple aspects of fidelity:
- Data-Plane
- Flow-level
- SDN Control
- Application
46
Data-Plane fidelity
- Figure from paper
- Explain star topology
- Show comparison of MiniNet + NS3
  - Same figures from slides 2+3 but now compared against SELENA + real
- Point out how increasing TDF affects fidelity
47
Flow-Level fidelity
- Figure from paper
- Explain Fat-tree topology
48
Execution Speed
- Compare against NS3, MiniNet
- Point out that SELENA executes faster than NS3
  - NS3, however, replicates only a half-speed network
  - Therefore the difference is even bigger
49
SDN Control plane Fidelity
- Figure from paper
- Explain experiment setup
- Point out shortcomings of MiniNet
  - As good as OVS is
- Point out terrible support for SDN by NS3
50
Application level fidelity
- Figure from paper
- Explain the experiment setup
- Latency aspect
- Show how CPU utilisation matters for fidelity
  - Open the dialogue on performance bottlenecks and limitations, and make a smooth transition to the next slide
51
Near-linear Scalability
- Figure from paper
- Explain how scalability is determined for a given TDF
52
Limitations discussion
- Explain the effects of running on Xen
- Explain what happens if TDF is low and utilisation is high
- Explain that insufficient CPU compromises
  - Emulated network speeds
  - Capability of guests to utilise the available bandwidth
  - Skews the performance of networked applications
  - Adds excessive latency
- Scheduling also contributes
53
A more complicated example
- Showcase the power of SELENA :P
- Use the MRC2 experiment