PingER: Methodology, Uses & Results
Les Cottrell (SLAC), Warren Matthews (GATech)
Extending the Reach of Advanced Networking: Special International Workshop, Arlington, VA, April 22, 2004
www.slac.stanford.edu/grp/scs/net/talk03/i2-method-apr04.ppt
This presentation will introduce the methodology used by the PingER project to measure end-to-end Internet performance. We will then illustrate the use of PingER to show overall Internet performance trends and differences to most regions of the world for the last 9 years. This will be followed up with some specific illustrations of how PingER has been used to help policy making decisions and indicate the results of those decisions. We will conclude with some of the challenges and the overall state of Internet end-to-end performance across the Digital Divide.
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP
Outline What is PingER World Internet performance trends Regions and Digital Divide Examples of use Challenges Summary of Uses
Methodology
  Use ubiquitous ping
  Every 30 minutes, from monitoring site to target:
    1 ping to prime caches
    by default send 11 x 100-byte packets followed by 10 x 1000-byte packets
  Low network impact + no software to install / configure / maintain at remote sites + no passwords / accounts needed = good for developing sites / regions
  Record loss & RTT (+ reorders, duplicates)
  Derive throughput, jitter, unreachability …
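As a rough illustration of the "record, then derive" step, the sketch below summarizes one batch of ping results into loss, RTT statistics, a jitter figure and an unreachability flag. It is not PingER's own code: the function name `summarize_pings`, the use of `None` to mark a lost packet, and the jitter definition (mean absolute difference between successive received RTTs) are all assumptions made for this example.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one batch of pings; None marks a lost packet (assumed encoding)."""
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("need at least one ping")
    received = [r for r in rtts_ms if r is not None]
    summary = {
        "sent": sent,
        "received": len(received),
        "loss_pct": 100.0 * (sent - len(received)) / sent,
        # Treat the target as unreachable when no packet came back at all.
        "unreachable": not received,
        "min_rtt_ms": None,
        "avg_rtt_ms": None,
        "max_rtt_ms": None,
        "jitter_ms": None,
    }
    if received:
        summary["min_rtt_ms"] = min(received)
        summary["avg_rtt_ms"] = mean(received)
        summary["max_rtt_ms"] = max(received)
    if len(received) >= 2:
        # Assumed jitter definition: mean absolute difference between
        # successive received RTTs (gaps left by lost packets are ignored).
        diffs = [abs(b - a) for a, b in zip(received, received[1:])]
        summary["jitter_ms"] = mean(diffs)
    return summary


# Hypothetical batch of ten pings over a satellite path, two of them lost.
batch = [907.0, None, 865.0, 880.0, None, 910.0, 870.0, 895.0, 900.0, 885.0]
result = summarize_pings(batch)
print(result["loss_pct"], result["avg_rtt_ms"], result["jitter_ms"])
```

In practice the 100-byte and 1000-byte batches would probably be summarized separately, since packet size can affect both loss and RTT.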
Architecture
  Hierarchical vs. full mesh
  [Figure: architecture diagram. Recoverable labels: WWW; Archive (SLAC, FNAL); Reports & Data; HTTP; Cache; Monitoring (~35); Ping; Remote (~550); 1 monitor host / remote host pair]
Regions Monitored
  Monitoring sites in ~35 countries
  Recently added NIIT PK as a monitoring site
  White = no host monitored in country
  Colors indicate regions
  Also have affinity groups (VOs), e.g. AMPATH, Silk Road, CMS, XIWT, and can select multiple groups
  Worksheet: \\Zwinsan2\c\cottrell\regions-mapland.xls
World Trends
  Increase in sites with Good (<1%) loss
  25% increase in sites monitored
  Big focus on Africa: 4 => 19 countries
  [Figure label: Silk Road]
  Spreadsheet\cottrell\iepm\world-loss-quality.xls
Trends
  S.E. Europe, Russia: catching up
  Latin Am., Mid East, China: keeping up
  India, Africa: falling behind
  Derived throughput ~ MSS/(RTT*sqrt(loss))
  [Figure labels: Silk Road, NaukaNet/Gloriad, AMPath]
  Spreadsheet \cottrell\iepm\esnet-to-all-longterm.xls

Notes: CERN data only goes back to Aug-01. It confirms that S.E. Europe & Russia are catching up, and that India & Africa are falling behind. Note that for Africa there is only one host, in Uganda. We have actually been adding hosts (5 countries), but there is considerable disparity in performance, so as we add hosts from less developed countries the aggregate performance measured to Africa is dropping! Ghana, Nigeria and Uganda are all on satellite links with 800-1100 ms RTTs. The losses to Ghana & Nigeria are 8-12%, while to Uganda they are 1-3%. The routes are different: the route from SLAC to Ghana uses ESnet-Worldcom-UUNET, Nigeria goes CalREN-Qwest-Telianet-New Skies satellite, and Uganda goes ESnet-Level3-Intelsat. For both Ghana and Nigeria there are no losses (for 100 pings) until the last hop, where over 40 of 100 packets were lost. For Uganda the losses (3 in 100 packets) also occur at the last hop.
Worksheet for trends: \\Zwinsan2\c\cottrell\iepm\esnet-to-all-longterm.xls
Worksheet for Africa: \\Zwinsan2\c\cottrell\iepm\africa.xls

10cottrell@flora02:~>ntrace www.ug.edu.gh
traceroute to www.ug.edu.gh (213.237.174.3): 1-30 hops, 38 byte packets
   1 rtr-core1-nethub.slac.stanford.edu (134.79.19.2) [AS3671 - SU-SLAC] 0.563 ms 0.344 ms 0.312 ms
   2 rtr-dmz1-ger.slac.stanford.edu (134.79.135.15) [AS3671 - SU-SLAC] 0.365 ms 0.364 ms 0.344 ms
   3 192.68.191.146 (192.68.191.146) 0.405 ms (ttl=252!) 0.406 ms (ttl=252!) 0.386 ms (ttl=252!)
   4 snv-pos-slac.es.net (134.55.209.1) [AS293 - Energy Sciences Network (ESnet)] 0.774 ms (ttl=251!) 0.863 ms (ttl=251!) 0.751 ms (ttl=251!)
   5 snvrt1-ge0-snvcr1.es.net (134.55.209.90) [AS293 - Energy Sciences Network (ESnet)] 0.823 ms (ttl=250!) 0.798 ms (ttl=250!) 0.820 ms (ttl=250!)
   6 188.ATM1-0.BR2.SJC1.ALTER.NET (204.255.174.49) [AS701 - UUNET, An MCI Worldcom Company] 2.37 ms 1.47 ms 1.39 ms
   7 154.ATM3-0.XR1.SJC1.ALTER.NET (152.63.51.174) [AS701 - UUNET, An MCI Worldcom Company] 1.72 ms 2.23 ms 1.59 ms
   8 0.so-0-0-0.XL1.SJC1.ALTER.NET (152.63.55.114) [AS701 - UUNET, An MCI Worldcom Company] 1.81 ms (ttl=247!) 1.77 ms (ttl=247!) 1.75 ms (ttl=247!)
   9 0.so-3-0-0.TL1.SAC1.ALTER.NET (152.63.53.250) [AS701 - UUNET, An MCI Worldcom Company] 5.12 ms (ttl=246!) 5.98 ms (ttl=246!) 5.15 ms (ttl=246!)
  10 0.so-7-0-0.IL1.NYC9.ALTER.NET (152.63.9.245) [AS701 - UUNET, An MCI Worldcom Company] 74.2 ms (ttl=242!) 74.8 ms (ttl=242!) 74.4 ms (ttl=242!)
  11 0.so-1-0-0.IR1.NYC12.ALTER.NET (152.63.23.62) [AS701 - UUNET, An MCI Worldcom Company] 75.3 ms (ttl=241!) 75.2 ms (ttl=241!) 73.7 ms (ttl=241!)
  12 so-6-0-0.TR1.CPH3.ALTER.NET (146.188.7.193) [AS702 - UUNET, An MCI Worldcom Company] 171 ms (ttl=241!) 171 ms (ttl=241!) 171 ms (ttl=241!)
  13 POS5-0.XR1.CPH3.ALTER.NET (146.188.2.214) [AS702 - UUNET, An MCI Worldcom Company] 172 ms (ttl=241!) 173 ms (ttl=241!) 171 ms (ttl=241!)
  14 POS4-0-0.CR1.CPH2.ALTER.NET (146.188.3.141) [AS702 - UUNET, An MCI Worldcom Company] 171 ms (ttl=240!) 172 ms (ttl=240!) 172 ms (ttl=240!)
  15 FastEthernet10-0-0.GW1.CPH2.ALTER.NET (146.188.30.115) [AS702 - UUNET, An MCI Worldcom Company] 212 ms (ttl=239!) 347 ms (ttl=239!) 408 ms (ttl=239!)
  16 satworks.gw.dk.uu.net (213.237.172.50) [AS702 - UUNET DK Block 4] 226 ms (ttl=238!) 163 ms (ttl=238!) 163 ms (ttl=238!)
  17 213.237.174.3 (213.237.174.3) [AS702 - UUNET DK Block 4] * 907 ms 865 ms
# 43% loss on 100 pings (0 losses until this hop)

9cottrell@flora02:~>ntrace asoju.oauife.edu.ng
traceroute to asoju.oauife.edu.ng (63.100.199.60): 1-30 hops, 38 byte packets
   1 rtr-core1-nethub.slac.stanford.edu (134.79.19.2) [AS3671 - SU-SLAC] 0.541 ms 0.347 ms 0.319 ms
   2 rtr-dmz1-ger.slac.stanford.edu (134.79.135.15) [AS3671 - SU-SLAC] 0.363 ms 0.357 ms 0.343 ms
   3 i2-gateway.stanford.edu (192.68.191.83) 0.337 ms 0.337 ms 0.313 ms
   4 STAN.POS.calren2.NET (171.64.1.213) [AS32 - BN-CIDR-171.64] 0.431 ms 0.403 ms 0.363 ms
   5 SUNV--STAN.POS.calren2.net (198.32.249.73) [AS11423 - NET-C2-NORTH] 0.804 ms 0.820 ms 0.826 ms
   6 QSV-M10-2-C2.GE.calren2.net (137.164.12.167) [AS2150 - CENIC-DCP] 0.907 ms (ttl=249!) 1.07 ms (ttl=249!) 0.923 ms (ttl=249!)
   7 65.113.32.209 (65.113.32.209) [AS209 - Qwest Communications] 0.893 ms (ttl=247!) 0.907 ms (ttl=247!) 0.845 ms (ttl=247!)
   8 205.171.14.97 (205.171.14.97) [AS209 - Qwest Communications] 0.870 ms 0.892 ms 0.932 ms
   9 205.171.205.30 (205.171.205.30) [AS209 - Qwest Communications] 1.11 ms (ttl=248!) 1.26 ms (ttl=248!) 1.19 ms (ttl=248!)
  10 205.171.1.66 (205.171.1.66) [AS209 - Qwest Communications] 2.31 ms (ttl=245!) 2.31 ms (ttl=245!) 1.94 ms (ttl=245!)
  11 sca-bb1-pos0-0-0.telia.net (213.248.86.57) [AS1299 - TELIANET-BLK] 2.93 ms (ttl=243!) 2.65 ms (ttl=243!) 3.02 ms (ttl=243!)
  12 chi-bb1-pos1-0-0.telia.net (213.248.80.33) [AS1299 - TELIANET-BLK] 47.8 ms (ttl=242!) 47.8 ms (ttl=242!) 48.1 ms (ttl=242!)
  13 nyk-bb1-pos0-1-0.telia.net (213.248.80.5) [AS1299 - TELIANET-BLK] 74.6 ms (ttl=238!) 76.1 ms (ttl=238!) 75.0 ms (ttl=238!)
  14 nyk-bb2-pos1-0-0.telia.net (213.248.80.14) [AS1299 - TELIANET-BLK] 93.3 ms (ttl=239!) 75.1 ms (ttl=239!) 75.3 ms (ttl=239!)
  15 ldn-bb2-pos1-3-0.telia.net (213.248.65.37) [AS1299 - TELIANET-BLK] 148 ms (ttl=237!) 148 ms (ttl=237!) 147 ms (ttl=237!)
  16 ldn-b1-pos11-0.telia.net (213.248.74.14) [AS1299 - TELIANET-BLK] 147 ms (ttl=236!) 148 ms (ttl=237!) 147 ms (ttl=236!)
  17 ldn-th-i1-srp1-0.telia.net (193.45.0.132) [AS3301 - TELIANET-BLK] 147 ms (ttl=234!) 147 ms (ttl=234!) 148 ms (ttl=234!)
  18 new-skies-01427-ldn-th-i1.c.telia.net (213.248.75.150) [AS1299 - TELIANET-BLK] 141 ms (ttl=242!) 141 ms (ttl=242!) 141 ms (ttl=242!)
  19 rtr-cor01-pos6-0-0.cha.newskies.net (80.247.128.58) [AS17175 - New Skies Satellites] 142 ms (ttl=241!) 142 ms (ttl=241!) 142 ms (ttl=241!)
  20 rtr-dvb01-gi0-0-60.cha.newskies.net (80.247.128.162) [AS17175 - New Skies Satellites] 142 ms (ttl=240!) 142 ms (ttl=240!) 143 ms (ttl=240!)
  21 * * *
  22 63-100-199-60.reverse.newskies.net (63.100.199.60) [AS701 - UUNET - AS 701] 1391 ms (ttl=238!) 970 ms (ttl=238!) 988 ms (ttl=238!)
# 44% loss on 100 pings for this hop, 0 for others

8cottrell@flora02:~>ntrace mail2.starcom.co.ug
traceroute to mail2.starcom.co.ug (217.113.72.21): 1-30 hops, 38 byte packets
   1 rtr-core1-nethub.slac.stanford.edu (134.79.19.2) [AS3671 - SU-SLAC] 0.510 ms 0.353 ms 0.313 ms
   2 rtr-dmz1-ger.slac.stanford.edu (134.79.135.15) [AS3671 - SU-SLAC] 0.369 ms 0.391 ms 0.350 ms
   3 192.68.191.146 (192.68.191.146) 0.476 ms (ttl=252!) 0.415 ms (ttl=252!) 0.359 ms (ttl=252!)
   4 snv-pos-slac.es.net (134.55.209.1) [AS293 - Energy Sciences Network (ESnet)] 0.762 ms (ttl=251!) 0.798 ms (ttl=251!) 0.728 ms (ttl=251!)
   5 snvrt1-ge0-snvcr1.es.net (134.55.209.90) [AS293 - Energy Sciences Network (ESnet)] 0.771 ms (ttl=250!) 0.821 ms (ttl=250!) 0.788 ms (ttl=250!)
   6 paix-pa-snv.es.net (134.55.208.205) [AS293 - Energy Sciences Network (ESnet)] 1.89 ms 1.93 ms 1.94 ms
   7 gigabitethernet1-0-112.edge1.paix-sjo1.Level3.net (209.245.146.145) [AS3356 - no more prtraceroute whiners ! Just kidding - we love you Nik.] 2.16 ms 1.60 ms 1.90 ms
   8 GigabitEthernet3-1.core1.SanJose1.Level3.net (209.244.3.249) [AS3356 - no more prtraceroute whiners ! Just kidding - we love you Nik.] 3.01 ms 1.99 ms 1.95 ms
   9 ae0-55.mp1.SanJose1.Level3.net (64.159.2.129) [AS3356 - no more prtraceroute whiners ! Just kidding - we love you Nik.] 2.67 ms (ttl=246!) 2.78 ms (ttl=246!) 2.70 ms (ttl=246!)
  10 64.159.3.254 (64.159.3.254) [AS3356 - no more prtraceroute whiners ! Just kidding - we love you Nik.] 79.2 ms (ttl=245!) 79.3 ms (ttl=245!) 79.2 ms (ttl=245!)
  11 so-2-0-0.mp1.London2.Level3.net (212.187.128.137) [AS9057 - Level 3 RIPE block] 152 ms (ttl=244!) 152 ms (ttl=244!) 153 ms (ttl=244!)
  12 so-2-0-0.mp1.London1.Level3.net (212.187.128.50) [AS9057 - Level 3 RIPE block] 152 ms (ttl=243!) 152 ms (ttl=243!) 152 ms (ttl=243!)
  13 so-7-0-0.gar1.London1.Level3.net (212.113.3.2) [AS9057 - Level 3 RIPE block] 158 ms (ttl=242!) 158 ms (ttl=242!) 158 ms (ttl=242!)
  14 pos2-0.metro1-londencyh00.London1.Level3.net (212.113.0.113) [AS9057 - Level 3 RIPE block] 158 ms 158 ms 160 ms
  15 195.50.116.30 (195.50.116.30) [AS9057 - Level 3 (ex Businessnet)] 154 ms (ttl=240!) 153 ms (ttl=240!) 153 ms (ttl=240!)
  16 fus-rt001-stm1-0-1-0.core.globalconnex.net (80.255.34.17) [AS22351 - Intelsat Specific route within RIPE LIR allocation] 178 ms (ttl=239!) 177 ms (ttl=239!) 176 ms (ttl=239!)
  17 fus-rt004-fe-0-0-v2.its-dvb.globalconnex.net (80.255.39.68) [AS22351 - Intelsat Specific route within RIPE LIR allocation] 172 ms 171 ms 171 ms
  18 * * *
  19 * * *
  20 * * *
  21 mail2.starcom.co.ug (217.113.72.21) 705.604 ms 699.883 ms 731.260 ms
# Loss of 3% for both 100 and 1400 byte packets
Current State – Aug ‘03
  thruput ~ MSS / (RTT * sqrt(loss))
  Worksheet: \\zwinsan2\c\cottrell\iepm\table-thru-aug03.xls
  Within-region performance is better, e.g. Ca|EDU|GOV-NA, Hu-SE Eu, Eu-Eu, Jp-E Asia, Au-Au, Ru-Ru|Baltics
  Africa, Caucasus, Central & S. Asia all bad
  Legend (derived throughput):
    Bad: < 200 kbits/s (< DSL)
    Poor: > 200, < 500 kbits/s
    Acceptable: > 500, < 1000 kbits/s
    Good: > 1000 kbits/s
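The derived-throughput formula and the quality bands on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MSS of 1460 bytes, the function names, and the handling of values falling exactly on a band boundary (assigned to the higher band) are choices made here, not taken from the slides. The formula is undefined for zero loss, so the sketch rejects that case rather than guessing.

```python
import math


def derived_throughput_kbps(rtt_ms: float, loss_fraction: float,
                            mss_bytes: int = 1460) -> float:
    """Throughput ~ MSS / (RTT * sqrt(loss)), returned in kbits/s.

    mss_bytes=1460 is an assumed typical Ethernet MSS, not a value from the talk.
    """
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss must be in (0, 1]; the formula is undefined at 0")
    rtt_s = rtt_ms / 1000.0
    bits_per_second = (mss_bytes * 8) / (rtt_s * math.sqrt(loss_fraction))
    return bits_per_second / 1000.0


def classify_throughput(kbps: float) -> str:
    """Map derived throughput onto the slide's Bad/Poor/Acceptable/Good bands."""
    if kbps < 200:
        return "Bad"
    if kbps < 500:
        return "Poor"
    if kbps < 1000:
        return "Acceptable"
    return "Good"


# Hypothetical inputs: a satellite-like path and a well-connected path.
for label, rtt_ms, loss in [("satellite-like", 900.0, 0.10),
                            ("well-connected", 150.0, 0.001)]:
    kbps = derived_throughput_kbps(rtt_ms, loss)
    print(f"{label}: {kbps:.0f} kbits/s -> {classify_throughput(kbps)}")
```

With RTTs near a second and losses around 10%, as reported for the satellite-connected African hosts, the derived throughput lands well inside the Bad band.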
Examples of Use Need for constant upgrades Upgrades Filtering Pakistan
Usage Examples
  Identify need to upgrade and effects
    BW increase by factor 300
    Multiple sites track
    Xmas & summer holiday
  Selecting ISPs for DSL/Cable services for home users
  Monitor accessibility of routers etc. from site
    Long term and changes
  Troubleshooting
    Identify whether a reported problem is probably network related
    Identify when it started and whether it is still happening or fixed
    Look for patterns:
      Step functions
      Periodic behavior, e.g. due to congestion
      Multiple sites with simultaneous problems, e.g. common problem link/router
      …
  Provide quantitative information to ISPs
  [Figure label: Beacons]
Notes: Increases in bandwidth from 2 Mbps to 622 Mbps in 6 years. Multiple sites track one another, which gives the rationale for Beacons. Improvements around holidays (summer and end of year) = students on holidays (most sites are universities).
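One of the patterns listed above, a step function in a time series (for example after a route change or an upgrade), can be looked for with a very simple sliding-window comparison. The sketch below is a hypothetical helper, not part of PingER: `find_step`, its `window` and `min_shift` parameters, and the use of window means are assumptions chosen to keep the example short. A median-based or statistically tested detector would be more robust to outliers.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def find_step(values: Sequence[float], window: int = 5,
              min_shift: float = 20.0) -> Optional[Tuple[int, float]]:
    """Return (index, shift) of the largest level change, or None.

    For each candidate index i, compare the mean of the `window` samples
    before i with the mean of the `window` samples starting at i. A step is
    reported only if the absolute shift reaches `min_shift`.
    """
    best: Optional[Tuple[int, float]] = None
    for i in range(window, len(values) - window + 1):
        before = mean(values[i - window:i])
        after = mean(values[i:i + window])
        shift = after - before
        if abs(shift) >= min_shift and (best is None or abs(shift) > abs(best[1])):
            best = (i, shift)
    return best


# Hypothetical daily median RTTs (ms): level shifts upward at index 10.
rtts = [150.0, 152.0, 149.0, 151.0, 150.0, 148.0, 151.0, 150.0, 149.0, 152.0,
        231.0, 229.0, 230.0, 232.0, 228.0, 230.0, 231.0, 229.0, 230.0, 231.0]
step = find_step(rtts)
if step is not None:
    print(f"step at sample {step[0]}, shift {step[1]:+.1f} ms")
```

The same function can be applied to loss percentages by choosing a `min_shift` suited to that scale.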
Russia Examples
  Russian losses improved by a factor of 5 in the last 2 years, due to multiple upgrades
  Upgrade funded by KEK, BINP and US DoE
  Little change in RTT, big improvement in loss
  Shows importance of monitoring
  E.g. upgrade of the KEK-BINP link from 128 kbps to 512 kbps, May ’02: improved from a few % loss to ~0.1% loss
  Spreadsheets: \\Zwinsan2\c\cottrell\iepm\russia-sep03.xls
                S:\www\grp\scs\net\papers\ictp\binp-may02.xls
Usage Examples
  Peering problems, took a long time to identify/fix
  Upgrades & ping filtering
    Ten-155 became operational on December 11.
    Smurf filters installed on NORDUnet’s US connection.
  Identifies time of occurrence so can report to ISP NOCs.
  [Figure labels: North America; To Western Europe]
Pakistan Example
  Big performance differences between sites, depending on ISP (at least 3 ISPs seen for Pakistan A&R sites)
  To NIIT (Rawalpindi):
    Get about 300 Kbps, possibly 380 Kbps at best
    Verified that the bottleneck appeared to be in Pakistan
    There is often congestion (packet loss & extended RTTs) during busy periods each weekday
    Video will probably be sensitive to packet loss, so it may depend on the time of day
    H.323 (typically needs 384 Kbps + 64 Kbps) would appear to be marginal at best at any time
    Requested upgrade to 1 Mbps, and verified got it (Feb ’04)
  No peering within Pakistan between NIIT and NSC
Example S. Asia Factors of six difference, large variations Between countries, even between sites in city Nepal, Sri Lanka & Bangladesh worse off
Challenges 1 of 2
  Ping blocking
    Complete block is easy to identify, then contact site to try to bypass; can be frustrating for 3rd world
    Partial blocks are trickier, compare with synack
  Effort:
    Negligible for remote hosts
    Monitoring host: < 1 day to install and configure, occasional updates to remote host tables and problem response
    Archive host: 20% FTE, code stable, could do with upgrade, contact monitoring sites whose data is inaccessible
    Analysis: your decision, usually for long term details download & use Excel
    Troubleshooting: usually reactive, user reports, then look at PingER data
    Working on automating alerts, data is available for download
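The "compare with synack" idea for spotting partial ping blocks can be approximated as follows, assuming synack refers to timing a TCP SYN / SYN-ACK exchange. The sketch times a full TCP connect with the standard library (which includes the handshake, and DNS lookup if a host name is given) and flags a path where ICMP loss is much higher than TCP probe loss. The function names, the 20-point `loss_gap_pct` threshold, and the sample numbers are assumptions for illustration; this is not the synack tool itself.

```python
import socket
import time
from typing import Optional, Sequence


def tcp_connect_rtt_ms(host: str, port: int = 80,
                       timeout_s: float = 3.0) -> Optional[float]:
    """Time a TCP connect as a rough stand-in for a SYN / SYN-ACK probe.

    Returns elapsed milliseconds, or None if the connect failed or timed out.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def looks_ping_blocked(ping_loss_pct: float,
                       tcp_rtts_ms: Sequence[Optional[float]],
                       loss_gap_pct: float = 20.0) -> bool:
    """Flag a possible ICMP block or rate limit.

    True when ping loss exceeds TCP probe loss by at least `loss_gap_pct`
    percentage points (an assumed, tunable threshold).
    """
    if not tcp_rtts_ms:
        raise ValueError("need at least one TCP probe result")
    failed = sum(1 for r in tcp_rtts_ms if r is None)
    tcp_loss_pct = 100.0 * failed / len(tcp_rtts_ms)
    return ping_loss_pct - tcp_loss_pct >= loss_gap_pct


# Hypothetical results: 60% ping loss, but only 1 of 5 TCP probes failed.
probes = [710.0, 698.0, None, 705.0, 720.0]
print(looks_ping_blocked(60.0, probes))
```

In use, `tcp_connect_rtt_ms` would be called repeatedly against a port the remote host is known to serve, and the results passed to `looks_ping_blocked` together with the ping loss measured over the same period.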
Challenges 2 of 2 Funding DoE development/research funding ended 2003 Looking for alternate funding sources Sustain, maintain & extend databases & measurements to more countries Get measurements FROM & within developing regions New analyses, preparing & presenting reports Making contacts, coordinating efforts
Uses
  Near real time results:
    Troubleshooting: detect problems, see when they occur
  Long term trends:
    Set expectations, planning
    Give sites/regions a better idea of how good/bad things are
    Input to policy and funding agencies, assist in deciding where help is needed and how to provide it
  Measure before & after upgrades
    Is it working right, did we get our money’s worth?
More Information
  PingER: www-iepm.slac.stanford.edu/pinger/
  MonaLisa: monalisa.cacr.caltech.edu/
  GGF/NMWG: www-didc.lbl.gov/NMWG/
  ICFA/SCIC Network Monitoring report, Jan03: www.slac.stanford.edu/xorg/icfa/icfa-net-paper-dec02
  Monitoring the Digital Divide, CHEP03 paper: arxiv.org/ftp/physics/papers/0305/0305016.pdf
  Human Development Index: www.undp.org/hdr2003/pdf/hdr03_backmatter_2.pdf
  Network Readiness Index: www.weforum.org/site/homepublic.nsf/Content/Initiatives+subhome