Internet Measurements

Internet Measurements CPE 401/601

Internet
- Web of interconnected networks
- Grows with no central authority
- Autonomous Systems optimize local communication efficiency
- The building blocks are engineered and studied in depth
- Global entity has not been characterized
- Most real-world complex networks have non-trivial properties
- Global properties cannot be inferred from local ones
- Engineered with large technical diversity
- Ranges from local campuses to transcontinental backbone providers

Role of Internet Directories and Databases
- Address registries
- Domain Name System (DNS)
- Internet Address and Routing Registries
- Internet Assigned Numbers Authority (IANA)
- Internet Routing Registry
- Clearinghouse for AS number mapping
- Regional Internet Registries (RIR)

Internet Measurements
- The need for Internet measurements arises from commercial, social, and technical issues
  - Realistic simulation environment for developed products
  - Improve network management
  - Robustness with respect to failures/attacks
  - Comprehend spreading of worms/viruses
  - Know social trends in Internet use
  - Scientific discovery: scale-free (power-law), small-world, rich-club, disassortativity, …
- Topology collection is hard because ISPs do not share their internal topology information by default
- Complex networks are being analyzed for their growth mechanisms and topological characteristics
  - Scale-free (power-law)
  - Small-world (six degrees of separation)
  - Disassortative mixing (degree-degree correlation)
  - Rich-club phenomenon (tightly interconnected core)

Challenges to measurement “Poor Observability” Reasons for this: Core simplicity Layered architecture Hidden pieces Administrative barriers

Internet Measurements are anything but straightforward…
- Internet measurement is key to designing the next-generation communication network
- Fundamental design principles of the current Internet make many of its aspects hard to measure
- Preliminary research has resulted in a set of basic tools and methods to measure aspects such as topology and traffic
- There is still a lot of ground to cover in this direction

Where Can Measurements Be Made? IXP

Measurement Types

Topology Measurements

Properties to Measure Topology Properties Autonomous System (AS) Point of Presence (PoP) Router Interface

Longitudinal comparison Sources: 1971 - "Casting the Net", page 64; 1980 - http://mappa.mundi.net/maps/maps_001/ http://personalpages.manchester.ac.uk/staff/m.dodge/cybergeography/atlas/historical.html

Internet Topology CAIDA 2006

Internet Topology Measurement CAIDA 2006

IPv4 address space (2010)
- Ant Census Data: researchers have been collecting data about the Internet address space
- ~ 3.5 B IPs
- ~ 250 M replies

Active Measurement Tools
- Methods that involve adding traffic to the network for the purposes of measurement
- Ping: sends ICMP ECHO_REQUEST and captures ECHO_REPLY
  - Useful for measuring RTTs
  - Only the sender needs to be under experiment control
- One-Way Active Measurement Protocol (OWAMP): a daemon running on the target listens for and records probe packets sent by the sender
  - Useful for measuring one-way delay
  - Requires both sender and receiver to be under experiment control
  - Requires synchronized clocks or a method to remove clock offset
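The difference between the two tools comes down to whose clock each timestamp is read from. The sketch below is illustrative only: the function names and the `receiver_offset` parameter are assumptions made for this example, not part of ping or OWAMP. Timestamps are integer milliseconds.

```python
def round_trip_time(send_ts, reply_ts):
    """RTT as ping measures it: both timestamps come from the sender's clock."""
    return reply_ts - send_ts


def one_way_delay(send_ts, recv_ts, receiver_offset=0):
    """One-way delay: send_ts is read on the sender's clock, recv_ts on the
    receiver's clock. receiver_offset (assumed known here) is how far the
    receiver's clock runs ahead of the sender's."""
    return (recv_ts - receiver_offset) - send_ts


# Illustrative numbers: the receiver's clock is 5 ms ahead of the sender's.
print(round_trip_time(1000, 1062))   # no clock synchronization needed
print(one_way_delay(1000, 1035))     # uncorrected, includes the clock offset
print(one_way_delay(1000, 1035, 5))  # corrected one-way delay
```

Without the offset correction, the clock error is indistinguishable from network delay, which is why the slide lists synchronized clocks as a requirement.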

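Probing
- Direct probing: the vantage point sends a probe addressed to IPD (TTL=64)
- Indirect probing: the vantage point sends probes addressed to IPD with small TTLs (TTL=1, TTL=2), and responses come from IPB and IPC
[Figure: vantage point and routers A, B, C, D for each probing mode]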

Traceroute Useful for determining path from a source to a destination Uses the TTL (Time To Live) field in the IP header in a clever but distorted way Large scale measurement systems use traceroute to discover network topology

Traceroute
- Probe packets are carefully constructed to elicit the intended response from a probe destination
- Traceroute probes all nodes on a path toward a given destination
  - TTL-scoped probes obtain ICMP error messages from routers on the path
  - Each ICMP message carries the IP address of an intermediate router as its source
- Merging end-to-end path traces yields the network map
[Figure: vantage point S sends probes with TTL=1 to TTL=4 toward the destination; A, B, C, and D answer from IPA, IPB, IPC, and IPD]

IP Header and the TTL field
IPv4 datagram format (each row is 32 bits):
- ver | head. len | type of service | length
- 16-bit identifier | flgs | fragment offset
- time to live | upper layer | Internet checksum
- 32 bit source IP address
- 32 bit destination IP address
- Options (if any)
- data (variable length, typically a TCP or UDP segment)
Field notes:
- ver: IP protocol version number
- head. len: header length (bytes)
- type of service: "type" of data
- length: total datagram length (bytes)
- 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
- time to live: max number of remaining hops (decremented at each router)
- upper layer: upper layer protocol to deliver payload to
- Options: e.g., timestamp, record route taken, specify list of routers to visit
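To make the field layout concrete, the following sketch unpacks the fixed 20-byte part of an IPv4 header with Python's `struct` module. The sample header is hand-built for illustration (documentation addresses, checksum left at zero), and `parse_ipv4_header` is a name chosen for this example.

```python
import socket
import struct


def parse_ipv4_header(raw):
    """Unpack the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,  # field counts 32-bit words
        "type_of_service": tos,
        "total_len": total_len,
        "identifier": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hand-built sample: version 4, 5-word header, TTL 64, protocol 6 (TCP).
sample = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 100, 0, 64, 6, 0,
    socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"),
)
hdr = parse_ipv4_header(sample)
print(hdr["ttl"], hdr["src"], hdr["dst"])
```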

TTL normal usage TTL is initialized by the sender and decremented by one each time the packet passes through a router If it reaches zero before reaching the destination, IP protocol requires that the packet be discarded and an error message be sent back to the sender Error message is an ICMP “time exceeded” packet

Traceroute Problem
- Suppose the path between A and D is to be determined using traceroute
[Figure: nodes A, B, C, D, X, Y]

Traceroute Process
- A sends a probe with Dest = D, TTL = 1
- B replies: “time exceeded”

Traceroute Process
- A sends a probe with Dest = D, TTL = 2
- C replies: “time exceeded”

Traceroute Process
- A sends a probe with Dest = D, TTL = 3
- D replies: “echo reply”
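The three probing rounds above can be reproduced with a toy simulation. This is a sketch under simplifying assumptions: the path is fixed, every router answers, and no real packets are sent. The names `send_probe` and `traceroute` are chosen for this example.

```python
def send_probe(path, ttl):
    """Simulate one probe along `path`, a list of hops ending at the destination.
    Each hop decrements the TTL; the router where it reaches zero answers."""
    for hop in path:
        ttl -= 1
        if hop == path[-1]:
            return hop, "echo reply"      # the destination answers the probe
        if ttl == 0:
            return hop, "time exceeded"   # router discards it and reports back


def traceroute(path, max_ttl=30):
    """Probe with TTL = 1, 2, ... until the destination itself replies."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, message = send_probe(path, ttl)
        hops.append((ttl, responder, message))
        if message == "echo reply":
            break
    return hops


# Path from A to D through routers B and C, as in the slides.
for ttl, responder, message in traceroute(["B", "C", "D"]):
    print(ttl, responder, message)
```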

Internet Topology Measurement
- (k,m)-traceroute on the Internet2 backbone
[Figure: Internet2 backbone routers S, N, W, C, U, K, L, A, H with interfaces labeled by router letter and number (e.g., s.2, h.4); traces from d to NY and to Seattle]

Internet Topology Measurement
- (n,n)-traceroute
[Figure: Internet2 backbone routers S, N, W, C, U, K, L, A, H with labeled interfaces and end points d, e, f]
Traces:
- d - H - L - S - e
- d - H - A - W - N - f
- e - S - L - H - d
- e - S - U - K - C - N - f
- f - N - C - K - H - d
- f - N - C - K - U - S - e
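Merging the six traces listed above into a map amounts to collecting the distinct adjacent pairs. The sketch below treats links as undirected, which is an assumption of this example; `merge_traces` is a name chosen here.

```python
def merge_traces(traces):
    """Return the set of undirected links seen in a collection of path traces."""
    links = set()
    for trace in traces:
        nodes = [n.strip() for n in trace.split("-")]
        for a, b in zip(nodes, nodes[1:]):
            links.add(frozenset((a, b)))
    return links


traces = [
    "d - H - L - S - e",
    "d - H - A - W - N - f",
    "e - S - L - H - d",
    "e - S - U - K - C - N - f",
    "f - N - C - K - H - d",
    "f - N - C - K - U - S - e",
]
links = merge_traces(traces)
hops_traversed = sum(len(t.split("-")) - 1 for t in traces)
print(len(links), "distinct links from", hops_traversed, "traversed hops")
```

Most hops are seen more than once, which is the redundancy that later slides on measurement load try to reduce.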

Challenges
- Infrastructural issues
- Sampling
  - Vantage points and destination list
- Probing overhead
  - Inter- and intra-monitor redundancy
- Responsiveness of routers
  - ICMP, UDP, TCP
- Load balancing routers
  - Per destination, per flow, per packet

Traceroute issues
- Path asymmetry: Destination -> Source need not retrace Source -> Destination
- Unstable paths and false edges
- Aliases
- Measurement load

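Unstable Paths and False Edges
- Probe with Dest = D, TTL = 1: B replies “time exceeded”
- Probe with Dest = D, TTL = 2: Y replies “time exceeded”
- Inferred path: A -> B -> Y
- If the route changed between the two probes, the inferred edge B -> Y may not exist
[Figure: nodes A, B, C, D, X, Y]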

Topology Sampling: Issues
- Sampling to discover networks
- Infer characteristics of the topology
- Different studies considered
  - Effect of sample size
  - Sampling bias
  - Path accuracy
  - Sampling approach
  - Utilized protocol: ICMP echo request, TCP SYN, UDP port unreachable
- ~ 10% of routers are unresponsive

Protocol   Responsiveness
ICMP       81.9 %
TCP        67.3 %
UDP        59.9 %

Measurement Load
- Traceroute places considerable load on network links when used for large-scale topology discovery
- Optimizations reduce this load considerably
  - If a single source is used, a better approach than probing from source to destination is to retrace from destination to source
  - If multiple sources and multiple destinations are used, sharing information among them brings the load down considerably

Intra-monitor redundancy
[Figure: Monitor 1 probing Destination 1, Destination 2, and Destination 3]

Inter-monitor redundancy
[Figure: Monitor 1, Monitor 2, and Monitor 3 probing Destination 1]

Unresponsive Routers
- Unresponsive routers do not respond to traceroute probes and appear as * in traceroute output
- The same router may appear as * in multiple traces
- With L responsive:
  - y: S – L – H – x
  - x: H – L – S – y
- With L unresponsive:
  - y: S – * – H – x
  - x: H – * – S – y
[Figure: nodes x, H, L, S, y]

Unresponsive Router Resolution
[Figure: Internet2 backbone routers S, N, C, W, U, K, L, A, H with end points d, e, f]
Traces:
- d - * - L - S - e
- d - * - A - W - * - f
- e - S - L - * - d
- e - S - U - * - C - * - f
- f - * - C - * - * - d
- f - * - C - * - U - S - e

Common Structures due to ARs
[Figure: four structures that appear around * nodes: parallel *-substring, complete bipartite, clique, and star]

IP Alias Resolution
- Each interface of a router has an IP address
- A router may respond with different IP addresses to different queries
- Alias resolution is the process of grouping the interface IP addresses of each router into a single node
- Inaccuracies in alias resolution may result in a network map that
  - includes artificial links/nodes
  - misses existing links
[Figure: Denver router with interfaces .5, .7, .13, .18, .33]

IP Alias Resolution
[Figure: Internet2 backbone routers S, N, W, C, U, K, L, A, H with labeled interfaces and end points d, e, f]
Traces:
- d - h.4 - l.3 - s.2 - e
- d - h.4 - a.3 - w.3 - n.3 - f
- e - s.1 - l.1 - h.1 - d
- e - s.1 - u.1 - k.1 - c.1 - n.1 - f
- f - n.2 - c.2 - k.2 - h.2 - d
- f - n.2 - c.2 - k.2 - u.2 - s.3 - e
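Once alias pairs are known, collapsing an interface-level map into a router-level one is a grouping problem. The sketch below uses a small union-find for that step; the alias pairs are given as input and stand in for the output of some resolution method, which is an assumption of this example. Only two of the traces above are used to keep it short.

```python
def find(parent, x):
    """Return the representative of x's alias group (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def router_level_links(traces, alias_pairs):
    """Merge aliased interfaces, then collect undirected links between groups."""
    parent = {}
    for a, b in alias_pairs:
        parent[find(parent, a)] = find(parent, b)
    links = set()
    for trace in traces:
        nodes = [find(parent, n) for n in trace]
        links.update(frozenset(p) for p in zip(nodes, nodes[1:]) if p[0] != p[1])
    return links


traces = [
    ["d", "h.4", "l.3", "s.2", "e"],
    ["e", "s.1", "l.1", "h.1", "d"],
]
# Hypothetical alias pairs, as a resolution tool might report them.
aliases = [("h.4", "h.1"), ("l.3", "l.1"), ("s.2", "s.1")]

print(len(router_level_links(traces, [])))       # interface-level view
print(len(router_level_links(traces, aliases)))  # after alias resolution
```

Without alias resolution the forward and reverse traces look like two disjoint paths, which is the kind of artificial link/node inflation the previous slide warns about.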

IP Alias Resolution Approaches
- Source IP Address Based Method
  - Relies on a particular implementation of ICMP error generation
- IP Identification Based Method (ally)
  - Relies on a particular implementation of the IP identifier field
  - Many routers ignore direct probes
- DNS Based Method
  - Relies on similarities in the host name structures, e.g., sl-bb21-lon-14-0.sprintlink.net and sl-bb21-lon-8-0.sprintlink.net
  - Works when systematic naming is used
- Record Route Based Method
  - Depends on router support for IP record route processing
[Figure: source address method, a probe with Dest = A is answered from address B; IP identification method, probes with Dest = A and Dest = B return A, ID=100 and B, ID=99, B, ID=103]
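The IP identification idea can be sketched as a check on whether replies from two addresses, taken in probe order, carry closely spaced increasing IP ID values, as they would if one shared counter produced them. The probe order (B, A, B) is a reading of the figure, and the `max_gap` threshold is an assumption for illustration, not a value from the slides.

```python
def ids_in_sequence(ip_ids, max_gap=200):
    """True if consecutive IP IDs increase by a small amount.
    The IP ID field is 16 bits, so differences are taken modulo 65536."""
    gaps = [(b - a) % 65536 for a, b in zip(ip_ids, ip_ids[1:])]
    return all(0 < gap <= max_gap for gap in gaps)


# Replies in probe order from the figure: B (ID=99), A (ID=100), B (ID=103).
print(ids_in_sequence([99, 100, 103]))     # consistent with one shared counter
print(ids_in_sequence([99, 40000, 103]))   # looks like two separate counters
print(ids_in_sequence([65534, 65535, 2]))  # counter wrap-around still passes
```

The test only works for routers that use a single incrementing counter and that answer direct probes, which matches the limitations listed on the slide.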

Subnet Inference
- Subnet resolution: identify IP addresses that are connected over the same medium
- Improves the quality of the resulting topology map
[Figure: routers A, B, C, D with interfaces IP1, IP2, IP3, shown as underlying topology, observed topology, and inferred topology]

Subnet Inference Approach
- Observed addresses in 129.110.0.0/16: 129.110.1.1, 129.110.1.2, 129.110.2.0, 129.110.2.1, 129.110.4.1, 129.110.4.83, 129.110.4.217, 129.110.12.1, 129.110.12.2, 129.110.12.6, 129.110.17.1, 129.110.17.135, 129.110.219.1
- Subnet prefixes shown: 129.110.1.0/30, 129.110.1.0/31, 129.110.2.0/30, 129.110.2.0/31, 129.110.4.0/24, 129.110.6.0/28, 129.110.12.0/29, 129.110.17.0/24, 129.110.219.0/24
- Subnet-level Internet mapping: subnet inference
[Figure: vantage point (V.P.) and hops 1 to 5 annotated with the prefixes above]
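One building block of subnet inference is finding the smallest prefix that covers a group of observed addresses. The sketch below does only that step with Python's `ipaddress` module. The grouping of addresses is supplied by hand here, and a real inference procedure would apply further checks that are not shown; `smallest_covering_prefix` is a name chosen for this example.

```python
import ipaddress


def smallest_covering_prefix(addresses):
    """Return the longest-prefix network that contains every given address."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    for prefix_len in range(32, -1, -1):
        net = ipaddress.ip_network(f"{addresses[0]}/{prefix_len}", strict=False)
        if all(ip in net for ip in ips):
            return net


# Address groups taken from the slide; the grouping itself is assumed.
groups = [
    ["129.110.1.1", "129.110.1.2"],
    ["129.110.2.0", "129.110.2.1"],
    ["129.110.12.1", "129.110.12.2", "129.110.12.6"],
    ["129.110.17.1", "129.110.17.135"],
]
for group in groups:
    print(group, "->", smallest_covering_prefix(group))
```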

Analytical IP Alias Resolution
[Figure: traced hops between UTD and MIT, including 129.110.95.1, 129.110.5.1, 206.223.141.69, 206.223.141.70, 206.223.141.73, 206.223.141.74, 198.32.8.33, 198.32.8.34, 198.32.8.65, 198.32.8.66, 198.32.8.84, 198.32.8.85, 192.5.89.9, 192.5.89.10, 192.5.89.89, 192.5.89.90, 18.168.0.25, 18.168.0.27, 18.7.21.1, 18.7.21.84, and two hops with no response]
Aliases:
- 129.110.5.1 - 206.223.141.74
- 206.223.141.73 - 206.223.141.69
- 206.223.141.70 - 198.32.8.33
- …

Sample AS backbones

Geolocation Given the network address of a target host, what is the host’s geographic location? The answer to this is useful for a wide variety of social, economic and engineering purposes The actual location of network infrastructure sheds light on how it relates to population, social organization and economic activity

Geolocation methods
- Name Based Geolocation
  - Extracting location details from ISPs' domain names
- Location Databases
- Delay Based Geolocation
  - Best Landmark
  - Constraint-based

Landmark based geolocation
- In the best landmark approach, the minRTT between each pair of identified landmarks is measured and stored
- The same metric is then measured between the node in question and each of the landmarks
- The landmark whose minRTT values best match those of the node is taken to be the closest to the node
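The matching step can be sketched as a nearest-neighbor search over delay vectors. Euclidean distance is used as the similarity measure here, which is an assumption of this example, and the landmark names and RTT values are made up for illustration.

```python
import math


def best_landmark(landmark_rtts, target_rtts):
    """landmark_rtts maps each landmark to its minRTT vector (ms) toward all
    landmarks; target_rtts is the target's vector in the same order.
    Returns the landmark whose vector is closest to the target's."""
    return min(
        landmark_rtts,
        key=lambda name: math.dist(landmark_rtts[name], target_rtts),
    )


# Made-up minRTT values in milliseconds, ordered as (L1, L2, L3).
landmark_rtts = {
    "L1": [0, 20, 50],
    "L2": [20, 0, 35],
    "L3": [50, 35, 0],
}
target_rtts = [18, 4, 33]
print(best_landmark(landmark_rtts, target_rtts))
```

The answer is always one of the landmarks, so accuracy is bounded by how densely landmarks are placed.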

Constraint based geolocation
- The distances from the target to a sufficient number of fixed points are estimated, and the target is located using multilateration
- Used in GPS
- However, Internet delay is affected by many factors, so the relation between delay and distance is not linear
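Because delay does not map cleanly to distance, one common simplification is to treat each measured delay as an upper bound on distance and keep only the locations that satisfy every bound. The sketch below assumes a propagation speed of about 200 km per millisecond (roughly two thirds of the speed of light) and uses made-up landmark coordinates; all names are chosen for this example.

```python
import math

KM_PER_MS = 200.0        # assumed propagation speed in fiber
EARTH_RADIUS_KM = 6371.0


def max_distance_km(min_rtt_ms):
    """Upper bound on distance implied by a round-trip time."""
    return (min_rtt_ms / 2.0) * KM_PER_MS


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def feasible(candidate, measurements):
    """True if the candidate lies within every landmark's distance bound.
    measurements is a list of ((lat, lon), min_rtt_ms) pairs."""
    return all(
        great_circle_km(candidate, landmark) <= max_distance_km(rtt)
        for landmark, rtt in measurements
    )


# Two made-up landmarks on the equator, each measuring a 10 ms minRTT.
measurements = [((0.0, 0.0), 10.0), ((0.0, 10.0), 10.0)]
print(feasible((0.0, 5.0), measurements))   # inside both circles
print(feasible((0.0, 20.0), measurements))  # too far from the first landmark
```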

Passive Measurements
- Methods that capture traffic generated by other users and applications
- The Route Views repository collects BGP views (routing tables) from a large set of ASes
- Similarly, OSPF LSAs can be captured and processed to generate router graphs within an AS

Passive Measurement: Advantages and Disadvantages Large set of AS-AS, router-router connections can be learned by simply processing captured tables However, especially using BGP views, there could be potential loss of cross-connections between ASes which are along the path Secondly, route aggregation and filtering tends to hide some connections Also, multiple connections between ASes will be shown as a single connection in the graph
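Processing captured tables into an AS graph can be sketched as extracting adjacent AS pairs from AS paths. The paths below are made up and use AS numbers reserved for documentation; `as_links` is a name chosen for this example, and real BGP table dumps need parsing that is not shown.

```python
def as_links(as_paths):
    """Collect undirected AS-AS links from space-separated AS paths."""
    links = set()
    for path in as_paths:
        hops = path.split()
        for a, b in zip(hops, hops[1:]):
            if a != b:  # a repeated AS is path prepending, not a link
                links.add(tuple(sorted((a, b), key=int)))
    return links


# Made-up AS paths using documentation AS numbers.
paths = [
    "64496 64497 64499",
    "64496 64498 64499",
    "64497 64497 64499",
]
for link in sorted(as_links(paths)):
    print(link)
```

Each AS pair is stored once, so parallel connections between two ASes collapse into a single edge, and a link that appears on no observed path (for example between 64497 and 64498) is simply absent from the graph.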

AS level Internet topology
- Connection between ASes: an AS needs to know how to reach the rest of the Internet
- BGP (Border Gateway Protocol) provides reachability across the whole Internet
  - Exchanges routing information between ASes
  - iBGP, eBGP
  - eBGP: a border router has a direct link to a border router in another AS
[Figure: AS 1 and AS 2]

AS rank Data: http://as-rank.caida.org/

Historical

Traffic Measurements

Traffic Measurements
- Monitoring and measuring network traffic
  - to produce better models of network behavior
  - to diagnose failures and detect anomalies
  - to defend against unwanted traffic
  - to simulate and plan new networks/protocols
- Live maps
  - https://livemap.pingdom.com/
  - http://atlas.grnoc.iu.edu/atlas.cgi?map_name=Internet2%20IP%20Layer
  - http://map.norsecorp.com/
  - https://threatmap.checkpoint.com/ThreatPortal/livemap.html
  - RIPE atlas
  - Internet Mapper

Measurement Tools (1 of 3) Can be classified into hardware and software measurement tools Hardware: specialized equipment Examples: LAN Analyzer, DataGeneral Network Sniffer, others... Software: special software tools Examples: tcpdump, xtr, SNMP, others...

Measurement Tools (2 of 3) Measurement tools can also be classified as active or passive Active: the monitoring tool generates traffic of its own during data collection (e.g., ping, pchar) Passive: the monitoring tool is passive, observing and recording traffic info, while generating none of its own (e.g., tcpdump)

Measurement Tools (3 of 3) Measurement tools can also be classified as real-time or non-real-time Real-time: collects traffic data as it happens, and may even be able to display traffic info as it happens, for real-time traffic management Non-real-time: collected traffic data may only be a subset (sample) of the total traffic, and is analyzed off-line (later), for detailed analysis

Potential Uses of Tools (1 of 4) Protocol debugging Network debugging and troubleshooting Changing network configuration Designing, testing new protocols Designing, testing new applications Detecting network weirdness: broadcast storms, routing loops, etc.

Potential Uses of Tools (2 of 4) Performance evaluation of protocols and applications How protocol/application is being used How well it works How to design it better

Potential Uses of Tools (3 of 4) Workload characterization What traffic is generated Packet size distribution Packet arrival process Burstiness Important in the design of networks, applications, interconnection devices, congestion control algorithms, etc.

Potential Uses of Tools (4 of 4) Workload modeling Construct synthetic workload models that concisely capture the salient characteristics of actual network traffic Use as representative, reproducible, flexible, controllable workload models for simulations, capacity planning studies, etc.

Traffic Measurement Time Scales
- Performance analysis (microseconds to minutes)
  - representative models
  - throughput, packet loss, packet delay
- Network engineering (minutes to years)
  - network configuration
  - capacity planning
  - demand forecasting
  - traffic engineering
- Different measurement methods

Properties
- The most basic view of traffic is as a collection of packets passing through routers and links
- Packets and bytes: one can capture/observe packets at some location
- Packet arrivals
  - Interarrivals
  - Count of traffic at timescale T
  - Captures the workload generated by traffic on a per-packet basis
- Packet size
  - Time series of byte count: captures the amount of consumed bandwidth
  - Packet size distribution: router design, etc.

Higher-level Structure
- Transport protocols and applications: ON/OFF process, bursty workload
- Packet level: packet train, defined by an interarrival threshold
- Session: single execution of an application, human generated
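Splitting a packet sequence into trains with an interarrival threshold can be sketched in a few lines. Timestamps are integer milliseconds, and the threshold value is chosen arbitrarily for this example.

```python
def packet_trains(arrival_times, threshold):
    """Group sorted arrival times into trains: a new train starts whenever
    the gap since the previous packet exceeds the threshold."""
    trains = []
    for t in arrival_times:
        if trains and t - trains[-1][-1] <= threshold:
            trains[-1].append(t)
        else:
            trains.append([t])
    return trains


arrivals_ms = [0, 100, 200, 1500, 1600, 4000]
for train in packet_trains(arrivals_ms, threshold=500):
    print(train)
```

The result depends directly on the threshold: a larger value merges trains, a smaller one splits them.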

Flows
- Set of packets passing an observation point during a time interval, with all packets having a set of common properties (header field contents, packet characteristics, etc.)
- IP flows: source/destination addresses, IP or transport header fields, prefix
- Network-defined flow: the network's workload, ingress and egress
- Traffic matrix and path matrix
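A common choice of "common properties" is the five-tuple of addresses, ports, and protocol. The sketch below aggregates packet records by that key; the record layout and sample values are assumptions of this example, and the time-interval aspect of the definition is left out for brevity.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets by (src, dst, sport, dport, proto) and count packets/bytes.
    Each packet is a tuple (src, dst, sport, dport, proto, size_bytes)."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)


# Made-up packet records using documentation addresses.
packets = [
    ("192.0.2.1", "198.51.100.7", 40000, 80, "TCP", 60),
    ("192.0.2.1", "198.51.100.7", 40000, 80, "TCP", 1500),
    ("192.0.2.9", "198.51.100.7", 53000, 53, "UDP", 80),
]
for key, stats in aggregate_flows(packets).items():
    print(key, stats)
```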

Semantically Distinct Traffic Types
- Control traffic (control plane)
  - Routing protocols: BGP, OSPF, IS-IS
  - Measurement and management: SNMP
  - General control packets: ICMP
- Data plane
- Malicious traffic

Challenges
- Practical issues
  - Observability
    - Core simplicity: flows, packets
    - Distributed internetworking
    - IP hourglass
  - Data volume
  - Data sharing

Challenges
- Statistical difficulties
  - Long tails and high variability
    - Instability of metrics
    - Modeling difficulty
    - Confounding intuition
  - Stationarity and stability
    - Stationarity: joint probability distribution does not change when shifted in time
    - Stability: consistency of properties over time
  - Autocorrelation and memory in system behavior
  - High dimensionality

Tools
- Packet capture
  - General purpose systems: libpcap, tcpdump, ethereal, scriptroute, …
  - Special purpose system
- Control plane traffic: GNU Zebra, Route Views

Data Management Full packet capture and storage is challenging Limitations of commodity PC Data stream management Big Data platforms Hadoop, etc.

Data Reduction
- Lossy compression
- Counters: SNMP Management Information Base
- Flow capture: packet trains, packet flows

Data Reduction
- Sampling
  - Basic packet sampling
    - Random: with fixed probability
    - Deterministic: periodic samples
    - Stratified: multi-step sampling
  - Trajectory sampling: choose a randomly sampled packet at all locations
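The sampling variants can be sketched side by side. The stratified version below picks one random packet per fixed-size block, and the trajectory version selects packets by a hash of their content so that every observation point picks the same ones; both are one interpretation of the slide, and all function names are chosen for this example.

```python
import random
import zlib


def random_sample(packets, probability, seed=0):
    """Keep each packet independently with a fixed probability."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < probability]


def deterministic_sample(packets, period):
    """Keep every period-th packet, starting with the first."""
    return packets[::period]


def stratified_sample(packets, stratum_size, seed=0):
    """Split packets into consecutive strata and pick one at random from each."""
    rng = random.Random(seed)
    return [
        rng.choice(packets[i:i + stratum_size])
        for i in range(0, len(packets), stratum_size)
    ]


def hash_selected(packet_bytes, modulus):
    """Content-based selection: the same packet gets the same decision at
    every location, which is what lets its trajectory be reconstructed."""
    return zlib.crc32(packet_bytes) % modulus == 0


packets = list(range(10))
print(deterministic_sample(packets, 3))
print(stratified_sample(packets, 5))
print(random_sample(packets, 0.5))
```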

Data Reduction
- Summarization
  - Bloom filters
  - Sketches: dimension-reducing random projections
  - Probabilistic counting
  - Landmark/sliding window models
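As one example of summarization, a Bloom filter answers "has this item been seen?" in fixed memory, at the cost of occasional false positives. The sketch below is a minimal version with arbitrary sizes; the class and its parameters are assumptions of this example, not a reference implementation.

```python
import hashlib


class BloomFilter:
    """Set membership summary: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


seen = BloomFilter()
seen.add("192.0.2.1")
seen.add("198.51.100.7")
print("192.0.2.1" in seen)
```

An address that was added is always reported as present; an address that was never added is usually, but not always, reported as absent.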

Overview of Traffic Analysis

Cisco US traffic estimate

Cisco Monthly Traffic Volume

Traffic flow monitoring

Traffic Samples from Internet2

DNS Workload for the Chilean TLD
- The number of queries and clients seen by one of the .CL nameservers in Chile
- Its hourly variation across two days of traces (July 3rd, 2007)

Code-Red Worm On July 19, 2001, more than 359,000 computers connected to the Internet were infected with the Code-Red (CRv2) worm in less than 14 hours.

Sapphire Worm The fastest computer worm in history: it doubled in size every 8.5 seconds and infected more than 90 percent of vulnerable hosts within 10 minutes.

Witty Worm Reached its peak activity after approximately 45 minutes, at which point the majority of vulnerable hosts had been infected.

Nyxem Email Virus The estimated total number of infected computers is between 470K and 945K. At least 45K of the infected computers were also compromised by other forms of spyware or botware.

Scam Hosting Study dynamics of scam hosting infrastructure

Sipscan
- A botnet-orchestrated stealth scan of the entire IPv4 address space during 31 Jan - 12 Feb 2011
- Originated from ~3 million IP addresses
- Heavily coordinated
- Unusually covert scanning strategy
- Aimed to discover and compromise VoIP-related (SIP server) infrastructure
- http://www.youtube.com/watch?v=n6MRlEJeD8M

Anonymizer Usage
- Anonymity network usage analysis: 205 million packets, about 1.44 TB of data
- Analyzed anonymity networks (columns: Network, Servers, Service): Tor 61,798 General I2P 2,267 P2P JAP 11 Remailers 15 Email Proxies 7,246 Commercial Anonymizer, Gotrusted
- The analysis can be done without this system, but processing hundreds of GB to several TB of data takes a long time; in our system, the processing takes several minutes
- The data covers one month from one of many switches
- Getting the result could otherwise require complicated programming, but Pig handles it natively in just a few hundred lines of script

Tor usage

WWW Web is the single most popular Internet application. Measurement can be very useful.

Bow-tie of the WWW

Stanford versus MIT

                                                           Stanford   MIT
Web users with non-empty WWW directories                   7473       2302
Percent who link to at least one other person              14%        33%
Percent who are linked to by at least one other person     22%        58%
Percent with links in either direction                     29%        69%
Percent with links in both directions                      7%

Challenges to Web measurement
- Hidden data
  - Much of the traffic is intranet traffic and inaccessible
  - Access to remote server data, even old logs, is often unavailable
  - From the server end, information about the clients (e.g., connection bandwidth) is obscured
- Hidden layers
  - Measuring in-flight packets is much harder than measuring server response time
  - The protocol and network layers are harder to measure
- Hidden entities
  - The web involves proxies, HTTP and TCP redirectors

Web Properties: High level Netcraft survey. (news.netcraft.com)

Web Properties: High Level Netcraft survey. (news.netcraft.com)

Measurement Studies
- Autonomous System Mapper: map the Internet backbone
- iPlane: construct an atlas of the Internet, measuring link attributes
- Glasnost: tests whether BitTorrent is being blocked or throttled
- BW-meter: measurement tools for the capacity and load of Internet paths
- NPAD Diagnostics Servers: automatic diagnostic server for troubleshooting end-systems and last-mile network problems
- Hubble: find persistent Internet black holes as they occur

Measurement Studies
- Japanese ISP traffic
  - http://www.caida.org/tools/visualization/cuttlefish/pics/japan-traces.gif
- DNS workload
  - http://www.caida.org/research/dns/cl/animated_maps/images/cl-worldmap.animated.queries.gif
- Egypt Internet Blackout
  - http://www.caida.org/publications/papers/2013/coordinated_view_internet_events/supplemental/egypt.composite.mp4
  - http://youtu.be/4Khc0XgvdbM
  - http://youtu.be/YWXgWfNxR9Q

Observation #1 The traffic model that you use is extremely important in the performance evaluation of routing, flow control, and congestion control strategies Have to consider application-dependent, protocol-dependent, and network-dependent characteristics The more realistic, the better

Observation #2 Characterizing aggregate network traffic is hard Lots of (diverse) applications Just a snapshot: traffic mix, protocols, applications, network configuration, technology, and users change with time

Observation #3 Packet arrival process is not Poisson Packets travel in trains Packets travel in tandems Packets get clumped together (ack compression) Interarrival times are not exponential Interarrival times are not independent

Observation #4 Packet traffic is bursty Average utilization may be very low Peak utilization can be very high Depends on what interval you use!! Traffic may be self-similar bursts exist across a wide range of time scales Defining burstiness (precisely) is difficult
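The dependence on the measurement interval can be shown with a peak-to-mean ratio computed over different bin sizes. The trace below is made up (one short burst followed by silence), times are integer milliseconds, and `peak_to_mean` is a name chosen for this example.

```python
def peak_to_mean(arrival_times, interval, duration):
    """Bin arrivals into fixed intervals and return peak count / mean count."""
    num_bins = -(-duration // interval)  # ceiling division
    counts = [0] * num_bins
    for t in arrival_times:
        counts[t // interval] += 1
    # max / (sum / len), written to avoid an intermediate float mean
    return max(counts) * len(counts) / sum(counts)


# Made-up trace: 10 packets in the first 10 ms of a 1000 ms window.
arrivals_ms = list(range(10))
for interval in (10, 100, 1000):
    print(interval, peak_to_mean(arrivals_ms, interval, duration=1000))
```

The same trace looks extremely bursty at a 10 ms interval and perfectly smooth at a 1000 ms interval, which is why a single utilization number says little without its timescale.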

Observation #5 Traffic is non-uniformly distributed amongst the hosts on the network Example: 10% of the hosts account for 90% of the traffic (or 20-80) Why? Clients versus servers, geographic reasons, popular ftp sites, web sites, etc.
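A quick way to check this kind of concentration on a trace is to compute the share of traffic carried by the top fraction of hosts. The per-host byte counts below are made up to match the 10%/90% example, and `top_share` is a name chosen for this sketch.

```python
def top_share(bytes_per_host, fraction):
    """Share of total bytes contributed by the top `fraction` of hosts."""
    volumes = sorted(bytes_per_host.values(), reverse=True)
    top_count = max(1, round(len(volumes) * fraction))
    return sum(volumes[:top_count]) / sum(volumes)


# Made-up volumes: one server-like host and nine light clients.
bytes_per_host = {
    "h0": 900, "h1": 20, "h2": 15, "h3": 15, "h4": 10,
    "h5": 10, "h6": 10, "h7": 10, "h8": 5, "h9": 5,
}
print(top_share(bytes_per_host, 0.10))
```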

Observation #6 Network traffic exhibits ‘‘locality’’ effects Pattern is far from random Temporal locality Spatial locality Persistence and concentration True at host level, at gateway level, at application level

Observation #7 Well over 90% of the byte and packet traffic on most networks is TCP/IP By far the most prevalent Often as high as 95-99% Most studies focus only on TCP/IP for this reason

Observation #8 Most conversations are short Example: 90% of bulk data transfers send less than 10 kilobytes of data Example: 50% of interactive connections last less than 90 seconds Distributions may be ‘‘heavy tailed’’ i.e., extreme values may skew the mean and/or the distribution

Observation #9 Traffic is bidirectional Data usually flows both ways Not just acks in the reverse direction Usually asymmetric bandwidth though Pretty much what you would expect from the TCP/IP traffic for most applications

Observation #10 Packet size distribution is bimodal Lots of small packets for interactive traffic and acknowledgements Lots of large packets for bulk data file transfer type applications Very few in between sizes

Internet Measurements The Internet is man-made, so why do we need to measure it? Because we still don’t really understand it Sometimes things go wrong Malicious users Measurement for network operations Detecting and diagnosing problems What-if analysis of future changes Measurement for scientific discovery Creating accurate models that represent reality Identifying new features and phenomena

How CDN Works
- Define the term for replica
- There are large and popular Web servers, such as CNN.com and msnbc.com, which need to improve performance and scalability
- Imagine that there is a Web server in Cambridge, London, and a client at W&M tries to access some content
- Later on, a nearby client at U of Virginia tries to access similar content, which is then served directly from the CDN server (similarly for UGA, GIT, U Tennessee)
- So the CDN reduces latency for the client, reduces bandwidth for Web servers, and improves scalability and availability for the Web content server
- It also helps the Internet as a whole by reducing long-haul traffic