Measuring the Internet: A case study by Bob Mandeville and Andrew Corlett
Agenda
– PART 1 IP Performance Measurement Case Study (What we did)
– PART 2 Measurement System and Methodology (How we did it)
– PART 3 Drawing Conclusions (and going on to next steps)
PART 1 IP Performance Measurement Case Study (What we did)
The setting
– Large-scale test of seven of the world’s biggest ISPs
– 28 measurement nodes (cNodes) on backbone core of:
  – Cable & Wireless (C&W)
  – Level 3 Communications
  – Qwest Communications
  – Savvis Communications
  – Sprint Corp.
  – Verio
  – Williams Communications
Measurement packet generation
– Test ran 30 days; total project took more than a year to complete
– cNodes generated 4,558,388,076 packets during the month of August 2002
– All told, we collected 156,050,656 discrete measurements
– cNodes record more than 70 IP metrics, but in this test we focused on just three: uptime, jitter, and packet loss
Packet types
– The cNodes generated vectors of both 1,518-byte TCP and 256-byte UDP packets
– With each cNode sending packets to three other cities, there were a total of six vectors per cNode
– cNodes were configured so that the aggregate transmit rate across all vectors did not exceed 512 kbit/s
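The vector count and rate cap on this slide can be checked with a little arithmetic. The sketch below is illustrative only: it assumes every vector sends at the same packet rate, that 1 kbit is 1000 bits, and that the stated packet sizes are the full on-wire lengths. None of these is confirmed by the slides, and the function names are hypothetical.

```python
from itertools import product

# Figures taken from the slide.
DESTINATION_CITIES = 3
PACKET_BYTES = {"TCP": 1518, "UDP": 256}
AGGREGATE_LIMIT_BPS = 512_000  # assumption: 1 kbit = 1000 bits


def vectors_per_cnode(destinations: int = DESTINATION_CITIES):
    """One vector per (destination, packet type) pair."""
    return list(product(range(destinations), PACKET_BYTES))


def max_rate_per_vector(vectors, limit_bps: int = AGGREGATE_LIMIT_BPS) -> float:
    """Largest common packet rate (packets/s) that keeps the aggregate within the limit.

    Assumption: all vectors send at the same packet rate.
    """
    bits_per_round = sum(PACKET_BYTES[kind] * 8 for _, kind in vectors)
    return limit_bps / bits_per_round


vectors = vectors_per_cnode()
rate = max_rate_per_vector(vectors)
print(len(vectors), "vectors per cNode")
print(f"about {rate:.2f} packets/s per vector")
```

Under these assumptions, three destinations times two packet types gives the six vectors quoted on the slide, and the 512 kbit/s cap works out to roughly 12 packets per second on each vector.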
A picture tells…
Measured Uptime
Maintenance Windows
Outages by Numbers
Measured Jitter
Measured Packet Loss
PART 2 Measurement System and Methodology (How we did it)
System Architecture
[Diagram. Labels: Traffic Engineering, Application #2, Application #3, Service-Daemon, Database, cNode, OSS, Browser]
Service-Daemon
– Central hub of the measurement system
– Configures cNodes for measurements
– Retrieves results and stores them in the database
– Sophisticated state machines maintain the measurement system automatically. For example:
  – Downloads results stored in cNodes but not stored in the database
  – Configures cNodes that may have been power-cycled
– cNodes continue to measure and store results internally if connectivity to the Service-Daemon is interrupted
– CLI/scripting engine allows for external and bulk configuration
– Runs on Windows, Solaris, and Linux
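The "download results stored in cNodes but not stored in the database" behaviour amounts to a reconciliation step. The sketch below shows the idea with plain dictionaries standing in for the cNode's internal store and the database. The function name, the dictionary layout, and the use of a period identifier as the key are all assumptions for illustration, not the Service-Daemon's actual design.

```python
from typing import Dict, List


def reconcile(cnode_store: Dict[int, bytes], database: Dict[int, bytes]) -> List[int]:
    """Copy results the cNode still holds but the database lacks.

    Both arguments map a (hypothetical) measurement period id to a results
    packet. Returns the period ids that were copied, in ascending order.
    """
    missing = sorted(cnode_store.keys() - database.keys())
    for period_id in missing:
        database[period_id] = cnode_store[period_id]
    return missing


# Example: connectivity was lost during periods 3 and 4.
cnode = {1: b"r1", 2: b"r2", 3: b"r3", 4: b"r4"}
db = {1: b"r1", 2: b"r2"}
copied = reconcile(cnode, db)
print("backfilled periods:", copied)
```

Running the step again after a successful backfill copies nothing, which is the property that makes it safe to repeat whenever connectivity returns.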
Terminology
– Vector
  – Basis of all measurements
  – Defines measurements from one cNode to another cNode
  – All packets are formatted the same (Service-Type)
  – Many different vectors can be executed simultaneously: HTTP, VoIP, FTP, etc.
– Service-Type/Packet-Type
  – Defines the format of measurement packets
  – Example: TELNET, TCP port 23, 1500-byte packets
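One way to picture these two terms is as a pair of small record types. This is a minimal sketch: the class and field names are hypothetical, the node names are made up, and only the TELNET example values (TCP, port 23, 1500 bytes) come from the slide.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServiceType:
    """Format shared by every packet of a vector (hypothetical fields)."""
    name: str
    transport: str       # "TCP" or "UDP"
    port: int
    packet_length: int   # bytes


@dataclass(frozen=True)
class Vector:
    """Measurements from one cNode to another, all with one service type."""
    source: str
    destination: str
    service_type: ServiceType

    def __post_init__(self):
        if self.source == self.destination:
            raise ValueError("a vector runs from one cNode to another cNode")


def build_vectors(source: str, destinations: Iterable[str],
                  service_types: Iterable[ServiceType]) -> List[Vector]:
    """Several vectors can run at once: one per destination and service type."""
    service_types = list(service_types)
    return [Vector(source, d, s) for d in destinations for s in service_types]


TELNET = ServiceType("TELNET", "TCP", 23, 1500)  # example from the slide
vectors = build_vectors("cnode-a", ["cnode-b", "cnode-c"], [TELNET])
print(vectors[0])
```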
Terminology
– Vector Handler
  – Computes and stores measurement results
  – Located on the destination cNode
– Measurement Period
  – Interval of time representing results data
  – 5-minute intervals
  – Can be combined to report or alarm on larger intervals: 10, 15, or 30 minutes, 1 hour, 1 day
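The relationship between a timestamp, its 5-minute measurement period, and the larger reporting intervals can be sketched as below. The sketch assumes periods are aligned to wall-clock boundaries in UTC; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def period_start(ts: datetime, period: timedelta = PERIOD) -> datetime:
    """Start of the period containing ts (ts must be timezone-aware).

    Assumption: periods are aligned to multiples of `period` since the epoch.
    """
    return EPOCH + ((ts - EPOCH) // period) * period


def periods_in(interval: timedelta, period: timedelta = PERIOD) -> int:
    """How many measurement periods make up a larger reporting interval."""
    if interval % period != timedelta(0):
        raise ValueError("interval must be a whole number of periods")
    return interval // period


ts = datetime(2002, 8, 1, 12, 7, 30, tzinfo=timezone.utc)
print(period_start(ts))                       # 5-minute period
print(period_start(ts, timedelta(hours=1)))   # enclosing 1-hour interval
print(periods_in(timedelta(days=1)), "periods per day")
```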
Service-Type/Packet-Type
– Optional UDP or TCP headers
  – Port numbers
  – TCP fields: Flags, Window Size, MSS option, Urgent Pointer
– DSCP settings
– Packet Length
– Payload Type (all 0’s, all 1’s or Random)
– TTL
– Loose, Strict, and/or Record Route options
– VLAN tags
Continuous Measurements
[Figure: timeline of 5-minute measurement periods (12:00, 12:05, 12:10, 12:15, 12:20, 12:25). Labels: cNode, Measurement Period (5 minutes), Computed Results, Database]
Results
– Every 5 minutes, all of the packets received for a vector are processed through sophisticated algorithms, and a ~1 Kbyte results packet is created representing all of the metrics
– The results packet is automatically sent to the Service-Daemon and stored in the internal memory of the cNode
– Results packets can be combined so that reports and alarms can be generated over time periods other than 5 minutes, e.g. 1 hour, 1 day, 1 week, or even 1 year
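Combining results packets works cleanly when each packet carries counts, sums, and extremes rather than pre-computed averages. The sketch below illustrates that idea with a tiny, hypothetical record of five fields; the real results packet holds many more metrics and its layout is not given in the slides.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Results:
    """Hypothetical subset of one period's results for one vector."""
    sent: int
    received: int
    delay_sum_ns: int
    delay_min_ns: int
    delay_max_ns: int

    def combine(self, other: "Results") -> "Results":
        # Counts and sums add; extremes take min/max. No averaging of averages.
        return Results(
            self.sent + other.sent,
            self.received + other.received,
            self.delay_sum_ns + other.delay_sum_ns,
            min(self.delay_min_ns, other.delay_min_ns),
            max(self.delay_max_ns, other.delay_max_ns),
        )

    @property
    def loss_ratio(self) -> float:
        return (self.sent - self.received) / self.sent if self.sent else 0.0

    @property
    def mean_delay_ns(self) -> float:
        return self.delay_sum_ns / self.received if self.received else 0.0


def combine_all(periods: Iterable[Results]) -> Results:
    periods = list(periods)
    if not periods:
        raise ValueError("need at least one results record")
    return reduce(Results.combine, periods)


a = Results(1000, 990, 990 * 5_000_000, 4_000_000, 9_000_000)
b = Results(1000, 1000, 1000 * 6_000_000, 5_000_000, 7_000_000)
both = combine_all([a, b])
print(both.loss_ratio, both.delay_min_ns, both.delay_max_ns)
```

Because the combine step is associative, twelve 5-minute records can be folded into an hour, and hours into days, in any grouping with the same outcome.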
Measurement Packet
– Optional UDP or TCP headers
  – Source/Destination Port numbers
  – TCP fields: Flags, Window Size, MSS option, Urgent Pointer
– DSCP settings
– Packet Length
– Payload Type (all 0’s, all 1’s or Random)
– TTL
– Loose, Strict, and/or Record Route options
– VLAN tags
[Packet layout diagram. Labels: Ethernet Header, IP Header, Optional IP Header, Optional Header (UDP/TCP), Payload (zeros/ones/random), Metric Header, Ethernet CRC, Timestamp]
Metric Header
– Allows measurement packets to be formed as any protocol without interfering with manageability of cNodes
  – E.g. cNodes can measure Telnet traffic while Telnet sessions are in progress on the cNode
– Header Identifier and Version
– Hardware Timestamp: UTC, 64-bit, 1 ns units
– Packet ID: 64-bit
– Initial TTL, TOS, and IP Protocol fields
– Payload Checksum
– Metric Header Checksum
– Vector and Measurement Period Identification
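To make the field list concrete, here is a sketch of packing and parsing such a header. Only the 64-bit widths of the timestamp and packet ID come from the slide. The field order, every other width, the separate vector and period identifiers, and the byte-sum checksum are assumptions chosen for illustration and should not be read as the cNode wire format.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout (network byte order, no padding):
# identifier u16, version u8, timestamp u64, packet id u64,
# initial TTL u8, TOS u8, IP protocol u8, payload checksum u16,
# vector id u32, measurement period id u32, then header checksum u16.
_BODY = struct.Struct("!HBQQBBBHII")
_CSUM = struct.Struct("!H")


def checksum16(data: bytes) -> int:
    """Placeholder checksum (16-bit byte sum); the real algorithm is not stated."""
    return sum(data) & 0xFFFF


@dataclass(frozen=True)
class MetricHeader:
    identifier: int
    version: int
    timestamp_ns: int      # UTC, 1 ns units
    packet_id: int
    initial_ttl: int
    tos: int
    ip_protocol: int
    payload_checksum: int
    vector_id: int
    period_id: int

    def pack(self) -> bytes:
        body = _BODY.pack(self.identifier, self.version, self.timestamp_ns,
                          self.packet_id, self.initial_ttl, self.tos,
                          self.ip_protocol, self.payload_checksum,
                          self.vector_id, self.period_id)
        return body + _CSUM.pack(checksum16(body))

    @classmethod
    def unpack(cls, raw: bytes) -> "MetricHeader":
        body = raw[:_BODY.size]
        (expected,) = _CSUM.unpack(raw[_BODY.size:_BODY.size + _CSUM.size])
        if checksum16(body) != expected:
            raise ValueError("metric header checksum mismatch")
        return cls(*_BODY.unpack(body))


header = MetricHeader(0xC0DE, 1, 1_028_203_200_000_000_000, 42, 64, 0, 6, 0xBEEF, 7, 3)
wire = header.pack()
print(len(wire), "bytes;", MetricHeader.unpack(wire) == header)
```

Carrying the initial TTL inside the header is what lets the receiver derive hop count from the TTL it actually sees, and the header checksum lets a corrupted header be rejected before its timestamp is trusted.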
One-Way Measurements
– Accurate
  – 64-bit hardware timestamps
  – 12.5 ns clock synchronized by GPS (internal), 1 PPS and IRIG-B, and/or NTP
  – All counters are 64-/128-/256-bit
– Continuous
  – Send active measurements continuously
  – Calculate results every 5 minutes
– Comprehensive
  – Over 65 IP Metrics
  – Delay (latency), jitter, loss, outages
  – Out-of-order, loss patterns, fragmentation, hop count and hop changes, DSCP changes, duplicates, corruptions
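A few of the metrics named above can be derived from nothing more than the packet ID and the send and receive timestamps. The sketch below is a simplified illustration, not the cNode algorithms: it assumes synchronized clocks at both ends, defines jitter as the mean absolute difference between consecutive one-way delays, and counts a packet as out of order when its ID is lower than one already seen. The slides do not give the definitions actually used.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Arrival:
    packet_id: int
    tx_ns: int   # sender hardware timestamp
    rx_ns: int   # receiver hardware timestamp


def summarize(sent: int, arrivals: Sequence[Arrival]) -> Dict[str, Optional[float]]:
    """Summarize one measurement period; arrivals are in order of reception."""
    seen = set()
    delays = []
    duplicates = out_of_order = 0
    highest = -1
    for a in arrivals:
        if a.packet_id in seen:
            duplicates += 1          # same ID delivered again
            continue
        seen.add(a.packet_id)
        if a.packet_id < highest:
            out_of_order += 1        # overtaken by a later packet
        highest = max(highest, a.packet_id)
        delays.append(a.rx_ns - a.tx_ns)
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return {
        "lost": sent - len(seen),
        "duplicates": duplicates,
        "out_of_order": out_of_order,
        "min_delay_ns": min(delays) if delays else None,
        "max_delay_ns": max(delays) if delays else None,
        "mean_jitter_ns": sum(diffs) / len(diffs) if diffs else None,
    }


arrivals = [
    Arrival(0, 0, 100),
    Arrival(1, 10, 130),
    Arrival(3, 30, 140),
    Arrival(2, 20, 150),   # arrives after packet 3
    Arrival(2, 20, 155),   # duplicate of packet 2
]
summary = summarize(sent=5, arrivals=arrivals)
print(summary)
```

One-way delay computed this way is only as good as the clock agreement between the two nodes, which is why the timestamp source matters so much for this class of measurement.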
One-Way Measurements
– Scalable
  – Highly distributed system
  – Results computed at cNodes
  – cProt allows minimal communication with cNodes for configuration and data gathering
  – Operationally: system designed to be self-maintaining
– Scientific
  – Methodology designed from years of test and measurement experience
  – Statistical accuracy: Pullin papers (CalTech)
– Accountable
  – Event-lists account for power-failures, link failures, time synchronization changes, etc.
– Comparable
  – Over time and topology
IP Packet Metrics
PART 3 Drawing Conclusions (and going on to next steps)
Some conclusions drawn from the experience…
– We disagree with the NetworkWorld article conclusion: outages were too significant to qualify providers as ‘telco grade’
– One-way measurements hampered by lack of GPS clock sources on 85% of sites under test
– Full set of 70 IP metrics used successfully to analyze anomalous behavior
– Currently, the majority of ISPs do not have advanced IP measurement capabilities deployed on their networks
Is there any time left for questions?