CS 4700 / CS 5700 Network Fundamentals Lecture 17: Network Modeling (Not Everyone has a Datacenter)


1 CS 4700 / CS 5700 Network Fundamentals Lecture 17: Network Modeling (Not Everyone has a Datacenter)

2 Wide-Area Network Research
- Most research is now focused on large-scale systems
- Challenge: testing and evaluation
  - How to perform wide-area tests in a repeatable, reliable manner
  - ModelNet, Emulab
- Challenge: understanding/capturing Internet topologies
  - Graph characterization: the dK-series

3 Outline
- ModelNet
- dK

4 A Case for Network Emulation
- Need a way to test large-scale Internet services
  - Peer-to-peer, overlay networks, novel protocols
- Testing in the real world (PlanetLab…)
  - Results not reproducible or predictable
  - Difficult to deploy and administer research software
- Simulation tools
  - Allow control over the test environment
  - May miss important system interactions
- Emulation
  - Emulators subject application traffic to the end-to-end bandwidth constraints, latency, and loss rate of a user-specified topology
  - Previous implementations were not scalable

5 ModelNet
- A scalable, cluster-based, comprehensive network emulation environment

6 Design
- Users run a configurable number of application instances on edge nodes within the cluster
  - Each instance is a Virtual Edge Node (VN)
  - Each VN has a unique IP address
- Edge nodes route traffic through a cluster of core routers
  - Core routers are equipped with large memories and modified FreeBSD kernels
- Core routers route traffic through emulated links, or “pipes”
  - Each pipe has its own packet queue and queuing discipline

7 ModelNet Phases
- Create
  - Generates a network topology as a graph
  - From Internet traces, BGP dumps, synthetic topology generators, etc.
  - Annotates the graph with loss rates, failure distributions, …
- Distillation
  - Transforms the GML graph into a pipe topology
- Assignment
  - Maps the pipe topology to core nodes, distributing emulation load across them
  - Finding the ideal mapping is NP-complete
  - ModelNet uses a greedy k-clusters assignment: for k core nodes, randomly select k nodes in the distilled topology, then greedily select links from each connected component in round-robin order
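The greedy k-clusters heuristic above can be sketched as follows. This is an illustrative reading of the slide, not ModelNet's actual code: seed k clusters at random nodes, then grow them round-robin, each claiming one unassigned link adjacent to its component (falling back to any remaining link when a cluster's frontier is empty).

```python
import random
from collections import defaultdict

def greedy_k_clusters(edges, k, seed=0):
    """Partition the links (pipes) of a distilled topology among k core
    nodes: seed k clusters at random nodes, then grow them round-robin by
    claiming unassigned links adjacent to each cluster's node set."""
    rng = random.Random(seed)
    adj = defaultdict(set)                        # node -> incident links
    for e in edges:
        adj[e[0]].add(e)
        adj[e[1]].add(e)
    unassigned = set(edges)
    clusters = [{"nodes": {n}, "links": set()}
                for n in rng.sample(sorted(adj), k)]
    while unassigned:
        for c in clusters:                        # round robin over cores
            if not unassigned:
                break
            frontier = {e for n in c["nodes"]
                        for e in adj[n] if e in unassigned}
            e = min(frontier) if frontier else min(unassigned)
            unassigned.discard(e)
            c["links"].add(e)
            c["nodes"].update(e)
    return [c["links"] for c in clusters]
```

Each core node then emulates only the pipes in its own link set, which is why a balanced assignment matters for throughput.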

8 ModelNet Phases
- Binding
  - Multiplex multiple VNs onto each physical edge node
  - Bind each physical edge node to a core router
  - Generate shortest-path routes between all VNs and install them in the core routing tables
- Run
  - Execute the target application code on the edge nodes

9 Inside the Core
- Route traffic through emulated “pipes”
  - Each route is an ordered list of pipes
  - Packets move through pipes by reference
  - The routing table requires O(n²) space
- Packet scheduling
  - When a packet arrives, put it at the tail of the first pipe in its route
  - The scheduler keeps a heap of pipes sorted by earliest deadline, i.e. the exit time of the first packet in each pipe’s queue
  - Once every clock tick:
    - Traverse the pipes in the heap for packets that are ready to exit
    - Move those packets to the tail of the next pipe, or schedule them for delivery
    - Calculate new deadlines
- Multi-core configuration
  - The next pipe in a route may be on a different machine
  - If so, the core node tunnels the packet descriptor to the next node
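A toy version of the deadline-driven scheduling idea, assuming a latency-only pipe model (the real core also enforces bandwidth and queue limits). Each packet carries a route of pipe names; the heap is popped in deadline order, mimicking "move packets to the tail of the next pipe or schedule for delivery":

```python
import heapq

def emulate(pipes, packets):
    """pipes: pipe name -> one-way latency.
    packets: list of (route, start_time), route = ordered list of pipe names.
    Returns (packet id, end-to-end delay) pairs in delivery order."""
    heap, delivered = [], []
    for pid, (route, start) in enumerate(packets):
        # enter the first pipe: deadline = time the packet can exit it
        heapq.heappush(heap, (start + pipes[route[0]], pid, route, start, 0))
    while heap:
        deadline, pid, route, start, hop = heapq.heappop(heap)
        if hop + 1 == len(route):
            delivered.append((pid, deadline - start))   # last pipe: deliver
        else:
            # move to the tail of the next pipe, compute its new deadline
            heapq.heappush(
                heap, (deadline + pipes[route[hop + 1]], pid, route, start, hop + 1))
    return delivered
```

The pid in the heap tuple doubles as a tie-breaker so equal deadlines never compare the route lists.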

10 Scalability Issues
- Traffic traversing the core is limited by the cluster’s internal physical bandwidth
- ModelNet must buffer up to the full bandwidth-delay product of the target network: 250 MB of packet buffer space to carry flows at an aggregate bandwidth of 10 Gb/s with 200 ms round-trip latency
- Assumes a perfect routing protocol
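The 250 MB figure is just the bandwidth-delay product (note the bandwidth must be in bits per second for the arithmetic to work out):

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: bits in flight on the path, converted to
    bytes; an emulator must be able to buffer this much to be transparent."""
    return bandwidth_bps * rtt_seconds / 8

# The slide's numbers: 10 Gb/s aggregate bandwidth, 200 ms round trip
buffer_mb = bdp_bytes(10e9, 0.200) / 1e6   # ~250 MB of packet buffers
```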

11 Baseline Accuracy
- Want to ensure that, under load, packets are subject to the correct end-to-end delays
- Used kernel logging to track ModelNet performance and accuracy
- Results show that, by running the ModelNet scheduler at the highest kernel priority:
  - Packets are delivered within 1 ms of the target end-to-end value
  - Accuracy is maintained up to 100% CPU usage

12 Scalability
- Additional cores
  - Adding core routers allows ModelNet to deliver higher throughput
  - Communication between core routers introduces overhead: more cross-core communication means less throughput benefit
- VN multiplexing
  - Higher degrees of multiplexing enable emulation of larger networks
  - Inaccuracies are introduced by context switching, scheduling, resource contention, etc.

13 Accuracy vs. Scalability
- Reduce overhead by deviating from the target network’s requirements
- Changes should minimally impact application behavior
- Ideally, the system reports the degree and nature of emulation inaccuracy

14 Scalability via Distillation
- Pure hop-by-hop emulation
  - The distilled topology is isomorphic to the target network
  - High per-packet overhead
- End-to-end distillation
  - Remove all interior network nodes
  - Collapse each path into a single pipe
    - Latency = sum of the latencies along the path
    - Reliability = product of the link reliabilities along the path
  - Low per-packet overhead
  - Does not emulate link contention along the path
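The end-to-end collapse rules translate directly into code (an illustrative helper, not ModelNet's implementation):

```python
def collapse_path(links):
    """End-to-end distillation of one path: replace the hop-by-hop pipes
    with a single pipe whose latency is the sum of the link latencies and
    whose reliability is the product of the link reliabilities.
    `links` is a list of (latency_ms, reliability) pairs."""
    latency = sum(lat for lat, _ in links)
    reliability = 1.0
    for _, rel in links:
        reliability *= rel
    return latency, reliability
```

This is exactly what the slide warns about: the single pipe preserves delay and loss but has no notion of per-link queues, so contention on interior links disappears.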

15 Time Dilation on ModelNet
- The challenge
  - Need to emulate networks with more resources than we have
  - E.g. fast CPUs (20 GHz), high-bandwidth networks (TB/s)
  - But only commodity machines are available
- The solution
  - ModelNet + time dilation via virtual machines
  - Run each application instance inside a VM
  - Slow down time inside the VM
  - Result: everything looks faster/bigger/fatter
    - More CPU cycles, packets, and disk I/O per unit of (virtual) time

16 How It’s Done
- Must isolate the VM from outside measures of time
  - Time is based on a shared data structure provided by the VMM
  - Scale that data structure by a Time Dilation Factor (TDF)
  - Also scale the hardware timer by the TDF
- How do we scale only some resources?
  - Slow the others back down!
  - Example: speed up the network by TDF = 10
    - Bandwidth increases by 10x, but delay decreases by 10x, so increase the emulated delay by 10x to compensate
- [Figure: a Virtual Machine Monitor (VMM) hosting VM 1 … VM n, plus a Node Manager and Local Admin]
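The scaling arithmetic in the example can be checked directly (helper name and units are made up for illustration):

```python
def perceived(real_rate_per_s, real_delay_s, tdf):
    """With time inside the VM slowed by TDF, anything measured per second
    of *virtual* time appears TDF times larger, while a fixed real delay
    appears TDF times smaller.  So rates (CPU, bandwidth) scale up for
    free, but delay must first be stretched by TDF in the emulator to
    appear unchanged to the VM."""
    return real_rate_per_s * tdf, real_delay_s / tdf

# 1 Gb/s link with 50 ms delay, TDF = 10:
rate, delay = perceived(1e9, 0.050, 10)
# The VM perceives a 10 Gb/s link but only 5 ms of delay, so ModelNet
# emulates a 500 ms delay to make the perceived delay 50 ms again.
compensated_delay = perceived(1e9, 0.050 * 10, 10)[1]
```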

17 ModelNet Summary
- ModelNet: the antithesis of PlanetLab
  - Testing of unmodified applications
  - Reproducible results
  - Experimentation using a broad range of network topologies and characteristics
  - Large-scale experiments (thousands of nodes and gigabits of cross traffic)
  - Can scale to emulate non-existent resource levels
- But what if you want real deployment on demand?
  - Emulab / NetBed

18 Emulab / NetBed
- A shared, configurable, on-demand testbed
  - What if you don’t have your own cluster?
  - What if you need to test specific environments/hardware?
  - What if you need this in 5 minutes?
- Emulab / NetBed
  - Hardware: 328 PCs, high-speed gigabit Cisco switches
  - Software: OS loader and manager driven via a web interface
    - Wipes all disks, loads OS images, and configures routers in under 2 minutes
    - Reboots and gives ssh access

19 Emulab Web Interface

20 Outline
- ModelNet
- dK

21 Importance of Network Topology
- Access to real-world network topologies is vital for research
  - Design: developing and testing new routing and other protocols
  - Analysis: the performance of a routing algorithm strongly depends on topology
  - Generation: empirical estimation of scalability
  - Network robustness, resilience under attack, worm spreading, etc.

22 Network Topology Research
- [Figure: the research space divided into static topologies and dynamic topologies]

23 Trade Secrets
- Unfortunately, large-scale network topologies are often proprietary
  - Think about BGP: ISPs want to hide their internal topology
- Real datasets are rare
  - Small scale
  - Out of date
  - Static (i.e. not dynamic)

24 Towards Synthetic Topologies
- Question: can we use graph models to capture real network topologies?
  - Fit a model to a real topology
  - Use a generator to produce synthetic topologies that are similar, but not identical, to the real one
- Benefits
  - Privacy: synthetic graphs are not proprietary
  - Randomization: produce an infinite number of stochastic snapshots
  - Scalability: the generator can produce similar topologies of any size

25 Important Topology Metrics
- Degree distribution
- Clustering
- Assortativity
- Distance distribution
- Betweenness distribution
- Problems
  - No way to reproduce most of the important metrics
  - No guarantee that some other, newly discovered metric won’t turn out to be important

26 The Approach
- Look at inter-dependencies among topology characteristics
- See if, by reproducing the most basic, simple characteristics, we can also reproduce all the others, including the practically important ones
- Try to find the characteristic(s) that define all others

27 Definition of dK-distributions
- dK-distributions are degree correlations within simple connected subgraphs of size d
- For example:
  - The 1K distribution is the node degree distribution
  - The 2K distribution is the joint node degree distribution
  - The 3K distribution captures the clustering coefficient

28 An Example of dK
- The dK distribution is the distribution of d-sized subgraphs with particular node degrees
  - dK-1 describes the node degree distribution
  - dK-2 describes the joint node degree distribution
  - dK-3 captures the clustering coefficient
- Example graph (4 nodes, 4 edges):
  - dK-0: average degree = 2
  - dK-1: P(1)=1, P(2)=2, P(3)=1
  - dK-2: P(1,3)=1, P(2,2)=1, P(2,3)=2
  - dK-3: P(1,3,2)=2, P(2,2,3)=1
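The 1K and 2K values in the example can be computed mechanically; the sketch below (illustrative code, not from the dK paper) reproduces the slide's numbers on a 4-node graph with degrees 1, 2, 2, 3:

```python
from collections import Counter

def dk1_dk2(edges):
    """Compute the 1K distribution (degree -> number of nodes) and the 2K
    distribution ((degree_i, degree_j) -> number of edges joining a
    degree-i node to a degree-j node) of a simple undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    dk1 = Counter(deg.values())
    dk2 = Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)
    return dk1, dk2
```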

29 Nice Properties of dK-series
- Constructability: we can construct graphs having property P_d (dK-graphs)
- Inclusion: if a graph has property P_d, then it also has all properties P_i with i < d (dK-graphs are also iK-graphs)
- Convergence: the set of graphs having property P_n consists of only one element, G itself (dK-graphs converge to G)
- This guarantees that all (even not-yet-defined!) graph metrics can be captured by a sufficiently high d

30 Inclusion and dK-randomness
- [Figure: nested sets of 0K-, 1K-, 2K-, …, nK-random graphs, shrinking toward the given graph G as d increases]

31 How Do We Generate Graphs?
- A number of different approaches:
  - Stochastic
  - Pseudograph
  - Matching
  - Rewiring
- Some are extensible to d = 3, others are not
- Newer research proposed d = 2.5 to make generation tractable

32 Stochastic Approach
- Classical (Erdos-Renyi) random graphs are the 0K-random graphs of the stochastic approach
- Easily generalizable to any d:
  - Reproduce the expected value of the dK-distributions by connecting random d-plets of nodes with (conditional) probabilities extracted from G
- Best for theory, worst in practice
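For d = 0 the stochastic construction is just the classical G(n, p) model: every node pair is connected independently with a probability chosen to reproduce the expected average degree. A minimal sketch:

```python
import random

def gnp_random_graph(n, p, seed=0):
    """Erdos-Renyi G(n, p): the stochastic 0K-random construction.
    Connect each of the n*(n-1)/2 node pairs independently with
    probability p, reproducing only the expected average degree."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]
```

To match a target average degree k-bar on n nodes, one would set p = k-bar / (n - 1); everything beyond the average degree (clustering, correlations) is left to chance, which is why this approach is "worst in practice".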

33 Pseudograph Approach
- Reproduces the dK-distributions exactly
- Constructs pseudographs that are not necessarily connected
- Extended for d = 2
- Fails to generalize for d > 2: d-sized subgraphs start to overlap over edges at d = 3

34 Pseudograph Details
- 1K: dissolve the graph into a random soup of nodes (each node keeping its degree-many stubs), then crystallize it back
- 2K: dissolve the graph into a random soup of edges (each edge end labeled by its endpoint’s degree), then crystallize it back
- [Figure: nodes labeled with degrees k1, k2, k3, k4 and edge ends grouped by degree label]

35 dK-Randomizing Rewiring
- Can generate random graphs from the original
- Generalizes to any d
- But cannot generate the desired graph from the dK-distributions alone
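A minimal sketch of the d = 2 case: a double-edge swap restricted to degree-matched endpoints, so the joint degree distribution is preserved. This is one standard way to realize 2K-preserving rewiring; the paper's exact algorithm may differ in details:

```python
import random
from collections import Counter

def dk2_preserving_rewire(edges, swaps, seed=0):
    """Repeatedly pick two edges (u1,v1), (u2,v2) and rewire them to
    (u1,v2), (u2,v1).  The 2K distribution is preserved when the swapped
    endpoints v1 and v2 have equal degree; swaps that would create a
    self-loop or a duplicate edge are rejected, keeping the graph simple."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    rng = random.Random(seed)
    es = [tuple(e) for e in edges]
    present = {frozenset(e) for e in es}
    for _ in range(swaps):
        i, j = rng.sample(range(len(es)), 2)
        (u1, v1), (u2, v2) = es[i], es[j]
        if deg[v1] != deg[v2]:           # would change the 2K distribution
            continue
        a, b = (u1, v2), (u2, v1)
        if u1 == v2 or u2 == v1:         # self-loop
            continue
        if frozenset(a) in present or frozenset(b) in present:
            continue                     # duplicate edge
        present -= {frozenset(es[i]), frozenset(es[j])}
        es[i], es[j] = a, b
        present |= {frozenset(a), frozenset(b)}
    return es
```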

36 Algorithms
- All algorithms deliver consistent results for d = 0
- All algorithms, except the stochastic one(!), deliver consistent results for d = 1 and d = 2
- Both rewiring algorithms deliver consistent results for d = 3
- Eventual choice:
  - Use the pseudograph approach to construct 1K graphs
  - Use targeted rewiring to build higher-d graphs

37 Skitter Scalar Metrics
- [Table: scalar graph metrics (assortativity r, average distance d, and others) for 0K-, 1K-, 2K-, and 3K-generated graphs versus the real skitter topology; the values converge toward skitter’s as d grows, e.g. average distance 5.17 (0K), 3.11 (1K), 3.08 (2K), 3.09 (3K) vs. 3.12 (skitter)]

38 HOT Scalar Metrics
- [Table: the same scalar metrics for 0K–3K reconstructions of the HOT topology versus the real HOT graph, e.g. average distance 8.48 (0K), 4.41 (1K), 6.32 (2K), 6.55 (3K) vs. 6.81 (HOT)]

39 HOT 0K
- [Figure: the true HOT graph vs. its 0K reconstruction]

40 HOT 1K
- [Figure: the true HOT graph vs. its 1K reconstruction]

41 HOT 2K
- [Figure: the true HOT graph vs. its 2K reconstruction]

42 HOT 3K
- [Figure: the true HOT graph vs. its 3K reconstruction]

