Infrastructure adapted from Mark Crovella and Balachander Krishnamurthy.


1 Infrastructure adapted from Mark Crovella and Balachander Krishnamurthy

2 Properties to Measure Physical Properties – Links: wired; wireless (radio spectrum); propagation delay, capacity, packet delay, packet loss, jitter – Devices: NAT boxes, firewalls, switches, routers – Routers: FIFO queue, active queue management; network measurements typically try to elicit responses from routers, and internal mechanisms affect those responses; IP aliases, geo-location

3 Properties to Measure Topology Properties – Autonomous System (AS) – Point of Presence (PoP) – Router – Interface

4 Properties to Measure Traffic Properties – Delays: transmission (s/t), propagation (d/v), routing (queuing + processing + other) – Losses: $l_n = L_n / C_n$ – Throughput: $(C_n - L_n)/T$ – Jitter: depends on inter-arrival time – End-to-end connections: goodput
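A small sketch of the expressions on this slide, with made-up numbers. The slide does not define its symbols, so the readings used here are assumptions: s/t is taken as packet size over link rate, d/v as link length over signal speed, L_n as packets lost and C_n as packets sent during an interval of length T.

```python
def transmission_delay(size_bits, rate_bps):
    """Time to push the packet onto the link (assumed reading of s/t)."""
    return size_bits / rate_bps


def propagation_delay(distance_m, speed_mps):
    """Time for the signal to travel the link (d/v)."""
    return distance_m / speed_mps


def loss_rate(lost, sent):
    """l_n = L_n / C_n, reading L_n as lost and C_n as sent packets."""
    return lost / sent


def throughput(sent, lost, interval_s):
    """(C_n - L_n) / T: packets delivered per second."""
    return (sent - lost) / interval_s


# Hypothetical values: a 1500-byte packet on a 10 Mb/s, 200 km link.
print(transmission_delay(12_000, 10_000_000))
print(propagation_delay(200_000, 2e8))
print(loss_rate(5, 1000), throughput(1000, 5, 10.0))
```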

5 Challenges

6 Challenges “Poor Observability” Reasons for this: – Core simplicity – Layered architecture – Hidden pieces – Administrative barriers

7 Core Simplicity Keep It Simple Stupid (KISS) design principle End-to-End argument Stateless nature w.r.t connections/flows – no support for measurement – SNMP: considerable overhead even with per-packet/byte counters As network elements do not track packets individually, interaction of traffic with the network is hard to observe

8 Layered Architecture IP hourglass model hides details of lower level layers While this provides abstraction improving interoperability, it impedes detailed visibility of lower layers Even detailed measurements such as packet capture cannot detect differences between link types

9 Hidden Pieces - Middleboxes Firewalls – provide security Traffic Shapers – assist in traffic management Proxies – improve performance NAT boxes – utilize IP address space efficiently Each of these impedes visibility of network components. – firewalls may block active probing requests – NATs hide away the number of hosts and the structure of the network on the other side

10 Administrative Barriers ISPs actively seek to hide details from outside discovery owing to the competition-sensitive nature of the data required – topology, traffic etc. Information that they do provide is often simplified – No cross-ISP SNMP – Instead of publishing router-level topologies, ISPs often publish PoP-level topologies

11 Tools

12 Tools Classification Active Measurement Passive Measurement Fused/Combined Measurement Bandwidth Measurement Latency Measurement Geolocation Others

13 Active Measurement Tools Methods that involve adding traffic to the network for the purposes of measurement Ping: Sends ICMP ECHO_REQUEST and captures ECHO_REPLY – Useful for measuring RTTs – Only sender needs to be under experiment control OWAMP: A daemon running on the target which listens for and records probe packets sent by the sender – Useful for measuring one-way delay – Requires both sender and receiver to be under experiment control – Requires synchronized clocks or a method to remove clock offset

14 Traceroute Useful for determining path from a source to a destination Uses the TTL (Time To Live) field in the IP header in a clever but distorted way Large scale measurement systems use traceroute to discover network topology

15 IP Header and the TTL field [Figure: IPv4 datagram format, 32 bits wide] Fields: ver: IP protocol version number; head. len: header length (bytes); type of service: “type” of data; length: total datagram length (bytes); 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly; time to live: max number of remaining hops (decremented at each router); upper layer: upper layer protocol to deliver payload to; Internet checksum; 32 bit source IP address; 32 bit destination IP address; Options (if any): e.g. timestamp, record route taken, specify list of routers to visit; data: variable length, typically a TCP or UDP segment

16 TTL normal usage TTL is initialized by the sender and decremented by one each time the packet passes through a router If it reaches zero before reaching the destination, IP protocol requires that the packet be discarded and an error message be sent back to the sender Error message is an ICMP “time exceeded” packet

17 Traceroute Probe packets are carefully constructed to elicit the intended response from a probe destination traceroute probes all nodes on a path towards a given destination – TTL-scoped probes obtain ICMP error messages from routers on the path – Each ICMP message carries the IP address of an intermediate router as its source Merging end-to-end path traces yields the network map [Figure: vantage point S probes the path A, B, C, D towards the destination; probes with TTL=1, 2, 3, 4 return IP A, IP B, IP C, IP D]

18 Traceroute Problem Suppose the path between A and D is to be determined using traceroute A XY D BC

19 Traceroute Process A XY D BC Dest = D TTL = 1 B: “time exceeded”

20 Traceroute Process A XY D BC Dest = D TTL = 2 C: “time exceeded”

21 Traceroute Process A XY D BC Dest = D TTL = 3 D: “echo reply”
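The three probing steps above can be mimicked with a toy model. This is only a sketch: `send_probe` and `traceroute` are hypothetical helpers that simulate the path A, B, C, D from the slides, not a tool that sends real packets.

```python
def send_probe(path, ttl):
    """Simulate one probe along `path` (hops after the source, destination last).

    Each router decrements the TTL; the router where it reaches zero answers
    with "time exceeded". If the TTL survives to the destination, the
    destination answers with "echo reply".
    """
    if ttl < 1:
        raise ValueError("TTL must be at least 1")
    if ttl < len(path):
        return path[ttl - 1], "time exceeded"
    return path[-1], "echo reply"


def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, ... until the destination replies."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, message = send_probe(path, ttl)
        hops.append((ttl, responder, message))
        if message == "echo reply":
            break
    return hops


# Source A probing destination D over the path B, C, D.
for ttl, responder, message in traceroute(["B", "C", "D"]):
    print(f"TTL = {ttl}  {responder}: {message}")
```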

22 Traceroute issues Path Asymmetry – Destination -> Source need not retrace Source -> Destination Unstable Paths and False Edges Aliases Measurement Load

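23 Unstable Paths and False Edges If the route changes between probes, consecutive replies can come from different paths: Dest = D, TTL = 1 returns B: “time exceeded”; Dest = D, TTL = 2 returns Y: “time exceeded” Inferred path: A -> B -> Y, which contains the false edge B -> Y [Figure: nodes A, B, C, D, X, Y]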

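24 Probing Direct probing: the vantage point A addresses each probe to the node it wants a reply from (Dest = IP B, TTL=64, answered by IP B; Dest = IP D, TTL=64, answered by IP D) Indirect probing: the vantage point A addresses probes to IP D with a limited TTL (TTL=1, TTL=2), and the intermediate routers (IP B, IP C) answer [Figure: path A, B, C, D shown once for each probing style]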

25 Aliases IP addresses are for interfaces and not routers Routers typically have many interfaces, each with its own IP address IP addresses of all the router interfaces are aliases Traceroute results require resolution of aliases if they are to be used for topology building

26 Measurement Load Traceroute inserts considerable load on network links if attempting a large-scale topology discovery Optimizations reduce this load considerably – If single source is used, instead of going from source to destination, a better approach is to retrace from destination to source – If multiple sources and multiple destinations are used, sharing information among these would bring down load considerably

27 Other Methods Multicast – Replied by routers along the path Synchronize measurements – Packet loss – Queue size

28 System Support Efficient packet injection and accurate measurement of arrival and departure times are best done at kernel level Unrestricted access to the network interface raises security concerns Using Scriptroute, unprivileged users can inject and capture packets Periscope’s API helps define new probing structures and inference techniques for extracting results from arrival patterns of responses

29 Passive Measurement Methods that capture traffic generated by other users and applications to build the topology Routeview repository collects BGP views (routing tables) from a large set of ASes Similarly, OSPF LSAs can be captured and processed to generate router graphs within an AS

30

31 Passive Measurement: Advantages and Disadvantages Large set of AS-AS, router-router connections can be learned by simply processing captured tables However, especially using BGP views, there could be potential loss of cross-connections between ASes which are along the path Secondly, route aggregation and filtering tends to hide some connections Also, multiple connections between ASes will be shown as a single connection in the graph

32 Internet Topology Mapping

33 Challenges Infrastructural Issues Sampling – Vantage Points and Destination List Probing Overhead – Inter- and Intra-monitor Redundancy Responsiveness of Routers – ICMP, UDP, TCP Load Balancing Routers – Per destination, per flow, per packet [Figure: intra-monitor and inter-monitor redundancy]

34 Topology Collection Internet2 backbone [Figure: routers S, L, U, K, C, H, A, W, N; hosts d, e, f] Traces: d - H - L - S - e; d - H - A - W - N - f; e - S - L - H - d; e - S - U - K - C - N - f; f - N - C - K - H - d; f - N - C - K - U - S - e
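Merging the six traces on this slide into one map amounts to taking the union of their consecutive hop pairs. A minimal sketch, where `build_graph` is a hypothetical helper and links are treated as undirected:

```python
TRACES = [
    "d - H - L - S - e",
    "d - H - A - W - N - f",
    "e - S - L - H - d",
    "e - S - U - K - C - N - f",
    "f - N - C - K - H - d",
    "f - N - C - K - U - S - e",
]


def build_graph(traces):
    """Return (nodes, edges) obtained by merging path traces."""
    nodes, edges = set(), set()
    for trace in traces:
        hops = [hop.strip() for hop in trace.split("-")]
        nodes.update(hops)
        # Every pair of consecutive hops is one observed link.
        for a, b in zip(hops, hops[1:]):
            edges.add(frozenset((a, b)))
    return nodes, edges


nodes, edges = build_graph(TRACES)
print(len(nodes), "nodes,", len(edges), "links")
print(sorted("-".join(sorted(edge)) for edge in edges))
```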

35 Topology Sampling: Issues Sampling to discover networks – Infer characteristics of the topology Different studies considered – Effect of sample size [Barford 01] – Sampling bias [Lakhina 03] – Path accuracy [Augustin 06] – Sampling approach [Gunes 07] – Utilized protocol [Gunes 08]: ICMP echo request, TCP SYN, UDP port unreachable ~ 10% of routers are unresponsive
Protocol   Responsiveness
ICMP       81.9 %
TCP        67.3 %
UDP        59.9 %

36 Unresponsive Routers Unresponsive routers do not respond to traceroute probes and appear as * in traceroute output – The same router may appear as * in multiple traces. Actual paths: y: S – L – H – x and x: H – L – S – y Observed traces: y: S – * – H – x and x: H – * – S – y [Figure: the single unresponsive router L shows up as two separate anonymous nodes, *1 and *2, between S and H] Current daily raw topology data sets include ~ 20 million path traces with ~ 20 million occurrences of *s along with ~ 500K public IP addresses The raw topology data is far from representing the underlying sampled network topology

37 Unresponsive Router Resolution Internet2 backbone [Figure: routers S, L, U, K, C, H, A, W, N; hosts d, e, f] Traces: d - * - L - S - e; d - * - A - W - * - f; e - S - L - * - d; e - S - U - * - C - * - f; f - * - C - * - * - d; f - * - C - * - U - S - e

38 Unresponsive Router Resolution [Figure: sampled network (routers S, L, U, K, C, H, A, W, N; hosts d, e, f) and resulting network (responsive routers S, U, L, C, A, W; hosts d, e, f)] Traces: d - * - L - S - e; d - * - A - W - * - f; e - S - L - * - d; e - S - U - * - C - * - f; f - * - C - * - * - d; f - * - C - * - U - S - e

39 Previous Approaches Basic heuristics – IP: Combine anonymous nodes between the same known nodes [Bilir 05]: limited resolution – NM: Combine all anonymous neighbors of a known node [Jin 06]: high false positives [Figure: sampled network, network after resolution, and resulting network]
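To see why the first heuristic gives limited resolution, the sketch below applies a simplified version of it to the six Internet2 traces from the earlier slides, writing each unresponsive hop as `*`. It follows the one-line description on this slide and is not the published algorithm; `resolve_between_known` is a hypothetical helper, and it assumes a `*` is never the first or last hop of a trace.

```python
from itertools import count

TRACES = [
    "d - * - L - S - e",
    "d - * - A - W - * - f",
    "e - S - L - * - d",
    "e - S - U - * - C - * - f",
    "f - * - C - * - * - d",
    "f - * - C - * - U - S - e",
]


def resolve_between_known(traces):
    """Give the same label to '*' hops that sit between the same two known nodes."""
    fresh = count(1)
    by_neighbors = {}
    resolved = []
    for trace in traces:
        hops = trace.split(" - ")
        out = []
        for i, hop in enumerate(hops):
            if hop != "*":
                out.append(hop)
                continue
            prev_hop, next_hop = hops[i - 1], hops[i + 1]
            if "*" in (prev_hop, next_hop):
                # A neighbor is anonymous too, so the heuristic cannot merge.
                label = f"*{next(fresh)}"
            else:
                key = frozenset((prev_hop, next_hop))
                if key not in by_neighbors:
                    by_neighbors[key] = f"*{next(fresh)}"
                label = by_neighbors[key]
            out.append(label)
        resolved.append(out)
    return resolved


resolved = resolve_between_known(TRACES)
occurrences = sum(trace.count("*") for trace in TRACES)
labels = {hop for trace in resolved for hop in trace if hop.startswith("*")}
print(occurrences, "anonymous occurrences merged into", len(labels), "nodes")
```

In the underlying network only three routers (H, K and N) are unresponsive, so the heuristic reduces the anonymous nodes but still leaves more than the true number.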

40 Previous Approaches More theoretic approaches – Graph minimization [Yao 03]: combine *s as long as they do not violate two accuracy conditions: (1) trace preservation condition and (2) distance preservation condition. High complexity, $O(n^5)$, where n is the number of *s – ISOMAP based dimensionality reduction [Jin 06]: build an $n \times n$ distance matrix, then use ISOMAP to reduce it to an $n \times 5$ matrix. Distance: (1) hop count or (2) link delay. High complexity, $O(n^3)$, where n is the number of nodes – Semisupervised Spectral Clustering [Shavitt 08]: a node will not be chosen to be an unknown root if it shares two or more neighbors with an unknown root. Nodes that share two or more neighbors are usually very close to each other, and it is difficult to distinguish between them even manually. After splitting them into unknowns, these nodes will have at least one common unknown node. – This makes the task of cleanly separating the unknowns impossible

41 Structural Graph Indexing (SGI) Structural Graph Indexing – A graph data mining technique: index all pre-defined substructures in graph data Use of SGI for anonymous router resolution – Apply SGI to collected path traces – Merge anonymous routers using identified structures Trace Preservation Condition – Don’t merge anonymous routers within the same trace Subnet distance as tie-breaker

42 Common Structures due to ARs [Figure: four structures caused by anonymous routers (ARs), each drawn as observed and as resolved: parallel *-substring, star, complete bipartite, clique]

43 Graph Indexing based Resolution Indexing Phase: parallel, star, bipartite, clique Resolution Phase: parallel, clique, bipartite, star

44 IP Alias Resolution [Figure: routers S, L, U, K, C, H, A, W, N with their interfaces (s.1, s.2, s.3, l.1, l.2, l.3, u.1, u.2, u.3, k.1, k.2, k.3, c.1, c.2, c.3, c.4, h.1, h.2, h.3, h.4, a.1, a.2, a.3, w.1, w.2, w.3, n.1, n.2, n.3) and hosts d, e, f] Traces: d - h.4 - l.3 - s.2 - e; d - h.4 - a.3 - w.3 - n.3 - f; e - s.1 - l.1 - h.1 - d; e - s.1 - u.1 - k.1 - c.1 - n.1 - f; f - n.2 - c.2 - k.2 - h.2 - d; f - n.2 - c.2 - k.2 - u.2 - s.3 - e

45 IP Alias Resolution [Figure: sampled network (routers S, L, U, K, C, H, A, W, N; hosts d, e, f) and the sample map without alias resolution, in which each observed interface (s.1, s.2, s.3, l.1, l.3, u.1, u.2, k.1, k.2, c.1, c.2, n.1, n.2, n.3, w.3, a.3, h.1, h.2, h.4) appears as a separate node] Traces: d - h.4 - l.3 - s.2 - e; d - h.4 - a.3 - w.3 - n.3 - f; e - s.1 - l.1 - h.1 - d; e - s.1 - u.1 - k.1 - c.1 - n.1 - f; f - n.2 - c.2 - k.2 - h.2 - d; f - n.2 - c.2 - k.2 - u.2 - s.3 - e
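The effect of alias resolution on the map can be sketched by building the graph twice from the interface-level traces on this slide: once as observed, and once after mapping every interface to its router. The mapping used here is an illustration only: it reads the router off the interface label (s.1 belongs to S), which works for this figure but is exactly the information a real measurement lacks. `router_of` and `build_graph` are hypothetical helpers.

```python
TRACES = [
    "d - h.4 - l.3 - s.2 - e",
    "d - h.4 - a.3 - w.3 - n.3 - f",
    "e - s.1 - l.1 - h.1 - d",
    "e - s.1 - u.1 - k.1 - c.1 - n.1 - f",
    "f - n.2 - c.2 - k.2 - h.2 - d",
    "f - n.2 - c.2 - k.2 - u.2 - s.3 - e",
]


def router_of(label):
    """Map an interface label such as 's.1' to its router 'S'; hosts map to themselves."""
    return label.split(".")[0].upper() if "." in label else label


def build_graph(traces, rename=lambda label: label):
    """Merge traces into (nodes, undirected edges), renaming hops on the way."""
    nodes, edges = set(), set()
    for trace in traces:
        hops = [rename(hop) for hop in trace.split(" - ")]
        nodes.update(hops)
        for a, b in zip(hops, hops[1:]):
            edges.add(frozenset((a, b)))
    return nodes, edges


raw_nodes, raw_edges = build_graph(TRACES)
nodes, edges = build_graph(TRACES, rename=router_of)
print("without alias resolution:", len(raw_nodes), "nodes,", len(raw_edges), "links")
print("with alias resolution:   ", len(nodes), "nodes,", len(edges), "links")
```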

46 Previous Approaches Source IP Address Based Method [Pansiot 98] – Relies on a particular implementation of ICMP error generation. IP Identification Based Method (ally) [Spring 03] – Relies on a particular implementation of the IP identifier field. – Many routers ignore direct probes. DNS Based Method [Spring 04] – Relies on similarities in the host name structures, e.g. sl-bb21-lon-14-0.sprintlink.net and sl-bb21-lon-8-0.sprintlink.net – Works when systematic naming is used. Record Route Based Method [Sherwood 06] – Depends on router support for IP record route processing [Figure labels: Dest = A, B; Dest = B; A, ID=100; B, ID=99; B, ID=103]
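The idea behind the IP identification based method can be sketched as follows: if two addresses belong to one router that stamps packets from a single counter, replies collected alternately from the two addresses carry identifiers that increase in small steps. This is a simplified illustration, not the test implemented by ally; `ids_look_shared` and the `max_gap` threshold are assumptions made for the example.

```python
def ids_look_shared(ip_ids, max_gap=200):
    """Check whether IP ID values look like they come from one shared counter.

    `ip_ids` holds the identifiers of replies in the order they were generated,
    alternating between the two candidate addresses. The field is 16 bits wide,
    so differences are taken modulo 65536 to tolerate wrap-around.
    """
    for earlier, later in zip(ip_ids, ip_ids[1:]):
        gap = (later - earlier) % 65536
        if gap == 0 or gap > max_gap:
            return False
    return True


# Hypothetical samples: replies from address 1, address 2, address 1.
print(ids_look_shared([100, 101, 103]))    # small increasing steps
print(ids_look_shared([100, 40000, 103]))  # unrelated counters
print(ids_look_shared([65534, 65535, 1]))  # counter wraps around
```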

47 Analytical Alias Resolution MIT UTD 18.7.21.1 18.168.0.27 129.110.95.1 129.110.5.1 206.223.141.73 192.5.89.89 206.223.141.70 192.5.89.10 198.32.8.34 198.32.8.85 198.32.8.66 198.32.8.65 198.32.8.84 198.32.8.33 192.5.89.9 206.223.141.69 192.5.89.90 206.223.141.74 18.168.0.25 no response 18.7.21.84 no response Aliases 129.110.5.1 - 206.223.141.74 206.223.141.73 - 206.223.141.69 206.223.141.70 - 198.32.8.33 …

48 Analytical & Probe-based Alias Resolution There is a possibility of – incorrect subnet assumption: two /30 subnets assumed to be a /29 – incorrect alignment of path traces: IP 4 and IP 8 are taken to be aliases To prevent false positives, some conditions are defined – Trace preservation – Distance preservation (probing component of APAR) – Completeness – Common neighbor [Figure: a sample network with routers a, b, c, d, e, f and interfaces IP 1, IP 2, IP 3, IP 4, IP 7, IP 8, IP 9]

49 Subnet Resolution Alias resolution – IP addresses that belong to the same router Subnet resolution – IP addresses that are connected over the same medium [Figure: example topologies with interfaces IP1 to IP6]

50 Subnet Inference Subnet resolution: identify IP addresses that are connected over the same medium Improve the quality of the resulting topology map [Figure: observed, inferred and underlying topologies over routers A, B, C, D and interfaces IP1, IP2, IP3]

51 Subnet Inference Approach [Figure: addresses observed from a vantage point (V.P.): 129.110.1.1, 129.110.1.2, 129.110.2.0, 129.110.2.1, 129.110.4.1, 129.110.4.83, 129.110.4.217, 129.110.12.1, 129.110.12.2, 129.110.12.6, 129.110.17.1, 129.110.17.135, 129.110.219.1; a row of 13 digits, 2 3 3 4 2 1 2 4 5 5 4 5 3, equal in number to the addresses; prefix-length labels /30, /31, /24, /28, /29; subnet labels 129.110.4.0/24, 129.110.6.0/28, 129.110.17.0/24, 129.110.12.0/29, 129.110.219.0/24, 129.110.1.0/30, 129.110.2.0/31, 129.110.2.0/30, 129.110.1.0/31, all under 129.110.0.0/16]

52 Subnet Inference Approach Inferring Subnets Cluster IP addresses into maximal subnets up to a given size (e.g. /22) Distance analysis on candidate subnets to break them down as necessary Completeness: Ignore candidate subnets that have fewer than one quarter of their IP addresses present after additional probing A /27 subnet can have up to $2^5$ IP addresses. [Figure: addresses IP1 to IP9 grouped into nested candidate subnets of sizes /22, /25, /26, /27, /29, /30, /31]
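The size and completeness rules on this slide can be checked with Python's standard `ipaddress` module. This is a sketch of those two rules only, not of the full inference procedure; `subnet_capacity` and `passes_completeness` are hypothetical helpers, and counting every address in the block (including network and broadcast addresses) is an assumption that matches the slide's figure of 2^5 for a /27.

```python
import ipaddress


def subnet_capacity(prefix_len):
    """Number of IPv4 addresses in a subnet with the given prefix length."""
    return 2 ** (32 - prefix_len)


def passes_completeness(observed, candidate, min_fraction=0.25):
    """Keep a candidate subnet only if enough of its addresses were observed."""
    network = ipaddress.ip_network(candidate)
    present = {ipaddress.ip_address(addr) for addr in observed}
    seen = sum(1 for addr in present if addr in network)
    return seen >= min_fraction * network.num_addresses


print(subnet_capacity(27))  # a /27 holds 2**5 addresses

# Addresses taken from the previous slide's figure.
print(passes_completeness(
    ["129.110.12.1", "129.110.12.2", "129.110.12.6"], "129.110.12.0/29"))
print(passes_completeness(
    ["129.110.4.1", "129.110.4.83", "129.110.4.217"], "129.110.4.0/24"))
```

With only the three addresses shown, the /24 candidate fails the one-quarter rule; the slide applies the check after additional probing, which may reveal more addresses.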

53 Inference with Distance Matrix Obtain distance of each IP from multiple vantage points (VP) Only one IP at a subnet might be at a distance ‘hop-1’ per VP IPs after per-destination and per-packet load-balancers – Get minimum hop (seen at any Paris Traceroute) of an IP per VP – IP hops after a LB have lower trust Two rounds of computations Compensate for diamond asymmetry if per-destination LB
VP:   1  2  3  4  5  …  672
IP1   0  5  4  0  0  …  7
IP2   0  0  3  5  0  …  7
IP3   2  5  0  4  0  …  6
…

54 Geolocation Given the network address of a target host, what is the host’s geographic location ? The answer to this is useful for a wide variety of social, economic and engineering purposes The actual location of network infrastructure sheds light on how it relates to population, social organization and economic activity

55 Geolocation methods Name Based Geolocation – Extracting location details from ISPs domain names Location Databases Delay Based Geolocation – Best Landmark – Constraint-based

56 Landmark based geolocation In best landmark approach, minRTT between each of the identified landmarks is measured and stored. Then the same metric is calculated between the node in question and each of the landmarks. The landmark with the best matching values of minRTT is the closest to the node
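A minimal sketch of the best-landmark idea: each landmark is described by its vector of minRTTs to all landmarks, the target by its vector of minRTTs to the same landmarks, and the landmark with the closest vector wins. The RTT values are invented, and Euclidean distance is an assumed choice of "best matching"; the slide does not say which similarity measure is used.

```python
import math


def best_landmark(landmark_rtts, target_rtts):
    """Return the landmark whose minRTT vector is closest to the target's.

    `landmark_rtts` maps each landmark to its minRTTs (ms) towards every
    landmark, in a fixed order; `target_rtts` lists the target's minRTTs
    towards the same landmarks in the same order.
    """
    return min(
        landmark_rtts,
        key=lambda name: math.dist(landmark_rtts[name], target_rtts),
    )


# Hypothetical minRTTs in milliseconds between landmarks A, B and C.
LANDMARK_RTTS = {
    "A": [0.0, 20.0, 50.0],
    "B": [20.0, 0.0, 40.0],
    "C": [50.0, 40.0, 0.0],
}

# Hypothetical minRTTs from the target to A, B and C.
print(best_landmark(LANDMARK_RTTS, [18.0, 5.0, 38.0]))
```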

57 Constraint based geolocation The distances from the target to a sufficient number of fixed points are estimated, and the target's location is computed from them using multilateration – Used in GPS – However, Internet delay is affected by many factors, so delay is only a rough indicator of distance

58 Network Traffic Measurements

59 Bandwidth Measurement Bandwidth – amount of data the network can transmit per unit time Streaming media applications, server selection, overlay networks etc. require ways to measure bandwidth Three kinds of bandwidth – – capacity: max throughput a link can sustain – available bandwidth: residual capacity – bulk transfer capacity: rate that a new single long-lived TCP connection would obtain over a path

60 Bandwidth Measurement Methods These focus on observing how packet delay (especially queuing and transmission) is affected by link properties Four types: Packet-pair Methods Size-delay Methods Self-induced Congestion Bulk Transfer Capacity Measurement

61 Packet-Pair Methods Methods to measure capacity and available bandwidth Involve sending probe packets with known inter-packet gaps and measuring the same gap downstream The gap measured downstream is $\Delta_{out} = \max(\Delta_{in}, L/C)$, where C is the capacity of the bottleneck link, L is the length of the probe packets and $\Delta_{in}$ is the gap at the sender When the packets are sent closely enough to queue at the bottleneck, capacity is calculated as $C = L / \Delta_{out}$

62 Packet-Pair Methods Assumes there is no cross-traffic Ensure packets queue for service at bottleneck link Similar approach to measure available bandwidth
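Under the no-cross-traffic assumption, the gap leaving the bottleneck is the larger of the sending gap and the bottleneck's transmission time for one probe, so the capacity follows from the probe length and the measured gap. A sketch with invented numbers; `downstream_gap` and `packet_pair_capacity` are hypothetical helpers.

```python
def downstream_gap(sending_gap_s, probe_bits, capacity_bps):
    """Gap between two probes after the bottleneck, with no cross-traffic."""
    return max(sending_gap_s, probe_bits / capacity_bps)


def packet_pair_capacity(probe_bits, measured_gap_s):
    """Capacity estimate C = L / gap, valid if the pair queued at the bottleneck."""
    if measured_gap_s <= 0:
        raise ValueError("the measured gap must be positive")
    return probe_bits / measured_gap_s


# Two 1500-byte probes sent 0.1 ms apart across a 10 Mb/s bottleneck.
gap = downstream_gap(0.0001, 12_000, 10e6)
print(f"gap downstream: {gap * 1e3:.2f} ms")
print(f"estimated capacity: {packet_pair_capacity(12_000, gap) / 1e6:.1f} Mb/s")
```

If the probes are sent too far apart (a sending gap larger than L/C), the gap is unchanged and the formula only yields a lower bound on the capacity, which is why the slide asks that packets queue at the bottleneck.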

63 Size Delay Methods Useful for measuring link capacities on each link along a path Based on the observation that transmission delay is affected by link capacity and packet size The idea is to send many different sized packets and measure the difference in delays affected by packet size. Then the capacity of each link will be a function of these differences Method assumes there is no cross-traffic, no queuing delays, no variation in packet size Measurements become less accurate if the length of the path grows
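One way to turn the size-delay idea into numbers: the delay to a hop grows linearly with packet size, and the extra slope gained at each hop is the reciprocal of that link's capacity. The sketch below fits the slopes by least squares on synthetic, noise-free delays, in line with the slide's assumption of no queuing; `fit_slope` and `link_capacity` are hypothetical helpers and the link speeds are invented.

```python
def fit_slope(sizes, delays):
    """Least-squares slope of delay (s) against packet size (bits)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(delays) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, delays))
    return sxy / sxx


def link_capacity(sizes, delays_to_prev_hop, delays_to_hop):
    """Capacity of the link between two hops: 1 / (increase in slope)."""
    extra = fit_slope(sizes, delays_to_hop) - fit_slope(sizes, delays_to_prev_hop)
    return 1.0 / extra


# Synthetic path: a 100 Mb/s link followed by a 10 Mb/s link.
sizes = [4_000, 8_000, 12_000]
to_hop1 = [0.001 + s / 100e6 for s in sizes]
to_hop2 = [d + 0.002 + s / 10e6 for d, s in zip(to_hop1, sizes)]

print(f"link 1: {link_capacity(sizes, [0.0] * 3, to_hop1) / 1e6:.1f} Mb/s")
print(f"link 2: {link_capacity(sizes, to_hop1, to_hop2) / 1e6:.1f} Mb/s")
```

Because each link's estimate is a difference of two fitted slopes, errors accumulate hop by hop, which is consistent with the slide's remark that accuracy drops on longer paths.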

64 Self-induced Congestion Find minimum probe rate that creates congestion on the path Use packet trains at different rates to observe delay increases Assumes cross traffic has fixed rate – yet Internet traffic is bursty Probing traffic is large enough to affect cross-traffic – Change the measured quantity

65 Bulk Transfer Capacity Measurement Transfer as much data as possible Need to ensure destination does not affect results – buffer capacity Cross traffic affects the results – TCP backoff – UDP ???

66 Caveats in Bandwidth Measurements High rate links like OC-192 make it difficult to measure bandwidth accurately because of small delays Wireless links affect rate dramatically on fine timescales FIFO order is not guaranteed in wireless links Layer 2 devices can cause underestimation of an IP hop’s capacity by introducing additional transmission delays – Not all links are point-to-point

67 Latency Estimation

68 Network Tomography A process of inferring network topology, delays, packet losses etc. using only end-to-end measurements One needs to make many assumptions about the behavior of the underlying network

69 Network Tomography: Multicast based Multicast based method e.g. to figure out the loss rates

70 Internet Measurements are anything but straightforward… Internet Measurement is key to designing the next generation communication network Fundamental design principles of the current internet make it harder for measuring various aspects of it Preliminary research has resulted in a set of basic tools and methods to measure aspects like topology, traffic etc. Accuracy of such methods is still an open question There is still a lot of ground to cover in this direction and this is where researchers like you come into the equation!

