Published by Elwin Foster. Modified over 9 years ago.
1
Internet Performance Monitoring Update
Les Cottrell & Warren Matthews – SLAC
www.slac.stanford.edu/grp/scs/net/talk/mon-escc-apr00/
Presented at the ESCC meeting, Pleasanton, April 26, 2000
Partially funded by the DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM); also supported by IUPAP
2
Overview
–PingER
–Validations
–Results
–Quality of Service
–Coming soon
–Summary
3
PingER
Measurements from:
–30 monitors in 15 countries
–Over 500 remote hosts
–Over 70 countries
–Over 2100 monitor-remote site pairs
Recent monitor additions: ANL, UWisc, NSK, ITEP, RIKEN, KAIST, ILAN, Brazil, Melbourne; working on: Caltech, SDSC
Over 50% of HENP collaborator sites are explicitly monitored as remote sites by the PingER project
–Atlas (37%), BaBar (68%), Belle (23%), CDF (73%), CMS (31%), D0 (60%), LEP (44%), Zeus (35%), PPDG (100%), RHIC (64%)
Remainder covered by Beacons
–Currently 56, extending to 76
4
Beacons & UK seen from ESnet
–Sites in UK track one another, indicating a common source of congestion, so they can be represented by a single site
–2 Beacons in UK
–Capacity increased 155 times in 5 years
[Plot annotations: effect of ACLs; direct peering between JANet and ESnet]
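A 155-fold capacity increase over 5 years corresponds to roughly 2.7× per year if the growth had been steady. A minimal sketch of that arithmetic, assuming constant compound growth (the slide does not say the growth was uniform; the function name is hypothetical):

```python
def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant per-year multiplier that compounds to total_factor over `years`.
    Assumes steady compound growth, which real upgrades rarely follow."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)

# 155x over 5 years, as quoted on the slide
print(f"~{annual_growth_factor(155, 5):.2f}x per year")  # ~2.74x per year
```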
5
PingER Deployment Jan-00
6
Validations: Ping vs. Surveyor
–Scatter plot of Ping RTT vs. Surveyor RTT gives R² ~ 0.92
–www.slac.stanford.edu/comp/net/wan-mon/surveyor-vs-pinger.html
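The R² on this slide summarises how well one tool's RTTs predict the other's. A minimal sketch of the computation, assuming the Ping and Surveyor values have already been paired sample-for-sample (how the pairing was done for the plot is not stated here):

```python
from statistics import fmean

def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination of a least-squares line y ~ a + b*x.
    For simple linear regression this equals the squared Pearson correlation."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with >= 2 points")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

For example, `r_squared(ping_rtts, surveyor_rtts)` (hypothetical paired lists) returns a value between 0 and 1.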
7
RIPE vs Surveyor 1/2
–Little short-term correlation, even for time differences of < 2 s
–Little structure; outliers don't match
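Comparing RIPE and Surveyor point by point requires matching samples taken at nearly the same time. A minimal sketch of nearest-neighbour matching with a 2 s tolerance, assuming both streams are lists of (timestamp, delay) tuples sorted by time; the function name and data layout are hypothetical:

```python
from bisect import bisect_left

def pair_by_time(a, b, max_gap_s=2.0):
    """Pair each (timestamp_s, delay_ms) sample in `a` with the nearest-in-time
    sample in `b`. Both lists must be sorted by timestamp.
    Candidate pairs further apart than max_gap_s are discarded."""
    times_b = [t for t, _ in b]
    pairs = []
    for t, v in a:
        i = bisect_left(times_b, t)
        best = None  # (gap, value) of the closest acceptable neighbour
        # only the neighbours just before and just after t can be nearest
        for j in (i - 1, i):
            if 0 <= j < len(b):
                gap = abs(times_b[j] - t)
                if gap <= max_gap_s and (best is None or gap < best[0]):
                    best = (gap, b[j][1])
        if best is not None:
            pairs.append((v, best[1]))
    return pairs
```

A sample in `b` may be matched more than once; a stricter comparison would retire each match once it has been used.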
8
RIPE vs Surveyor 2/2
–Best agreement when RIPE values are displaced by ~0.2 ms (attributed to the packet size difference)
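One way to find the displacement that gives the best agreement is to scan candidate offsets and keep the one that brings the two delay distributions closest. A minimal sketch, assuming equal-sized samples and a mean-absolute-difference criterion over sorted values; the slide does not say which criterion was used, and the names are hypothetical:

```python
def best_offset(ref, other, offsets):
    """Return the offset (ms) that, added to every value in `other`, minimises
    the mean absolute difference between the sorted samples
    (a quantile-by-quantile comparison of the two distributions)."""
    if len(ref) != len(other) or not ref:
        raise ValueError("compare equal-sized, non-empty samples")
    ref_sorted, other_sorted = sorted(ref), sorted(other)

    def cost(d):
        return sum(abs(r - (o + d)) for r, o in zip(ref_sorted, other_sorted)) / len(ref_sorted)

    return min(offsets, key=cost)

# candidate shifts from -1.0 ms to +1.0 ms in 0.05 ms steps
candidates = [round(-1.0 + 0.05 * k, 2) for k in range(41)]
```

Usage would be `best_offset(surveyor_delays, ripe_delays, candidates)`, where both delay lists are hypothetical inputs.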
9
PingER vs AMP
–Little obvious short-term agreement (R² < 0.1)
–Same if we compare ping vs. ping
–Average ping distribution agrees with AMP: both show ≥ 95% of samples at 58-59 ms
–Time series: R² > 0.95 for min & avg
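Individual samples disagree while the min and avg time series agree closely, which suggests comparing aggregates rather than individual probes. A minimal sketch of the aggregation step, assuming (timestamp, RTT) samples and hourly bins; the bin width behind the slide's time series is not stated:

```python
from collections import defaultdict
from statistics import fmean

def aggregate(samples, bin_s=3600):
    """Group (timestamp_seconds, rtt_ms) samples into fixed-width time bins
    and return {bin_start: (min_rtt, mean_rtt)} in time order.
    The default one-hour bin is an assumption."""
    bins = defaultdict(list)
    for t, rtt in samples:
        bins[int(t // bin_s) * bin_s].append(rtt)
    return {start: (min(v), fmean(v)) for start, v in sorted(bins.items())}
```

The per-bin minima (or means) from two tools can then be correlated with an ordinary R² calculation.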
10
Rate Limiting 1/3 (Mit Shah)
–"Tail-drop" behavior: rate limiting kicks in after the first few packets, so later packets are more likely to be dropped
–Calculate the slope and histogram the slope frequency for all nodes; look at the outliers (8)
–Added as a PingER metric; still validating: some sites are consistent, others vary from month to month
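The slide does not spell out which slope is computed; a natural reading of the tail-drop description is the slope of loss probability against a packet's position within a round of pings. A minimal sketch under that assumption (the function name and data layout are hypothetical):

```python
def loss_slope(rounds):
    """Least-squares slope of loss fraction vs. packet position in a round.

    `rounds` is a list of rounds; each round is a list of booleans, one per
    packet in send order, True if the reply was lost. A clearly positive
    slope means later packets are lost more often (tail-drop signature)."""
    if not rounds:
        raise ValueError("no rounds")
    n = len(rounds[0])
    if n < 2 or any(len(r) != n for r in rounds):
        raise ValueError("rounds must all have the same length >= 2")
    # loss fraction at each position 0..n-1 across all rounds
    loss = [sum(r[i] for r in rounds) / len(rounds) for i in range(n)]
    mx = (n - 1) / 2
    my = sum(loss) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(loss))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den
```

Histogramming this value over all hosts and inspecting the high tail mirrors the outlier search described on the slide.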
11
Rate Limiting 2/3
–Hosts mainly in the former Eastern bloc, S. Asia, Latin America & S. Africa
–Large asymmetry (ping loss >> sting loss) suggests possible rate limiting
12
Rate Limiting 3/3
–Have identified about 2% of sites as possibly limiting
–Using the Sting (Stefan Savage) & SynAck (SLAC) tools to identify hosts where loss(sting or synack probes) << loss(ping)
–www.vincy.bg.ac.yu blocked 884 rounds of 10 ICMP packets each, out of 903
–islamabad-server2.comsats.net.pk blocked 554 out of 903
–leonis.nus.edu.sg blocked all non-56-byte packets
–All show low loss with sting or synack
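The detection rule above, loss(sting or synack) << loss(ping), can be written as a simple test on two loss fractions. A minimal sketch; the 10% floor and 10× ratio are hypothetical thresholds chosen for illustration, not PingER's values:

```python
def possibly_rate_limited(ping_lost, ping_sent, tcp_lost, tcp_sent,
                          min_ping_loss=0.10, ratio=10.0):
    """Flag a host whose ICMP ping loss greatly exceeds the loss seen by
    TCP-based probes (e.g. sting or synack) to the same host.
    min_ping_loss and ratio are illustrative assumptions."""
    ping_loss = ping_lost / ping_sent
    tcp_loss = tcp_lost / tcp_sent
    if ping_loss < min_ping_loss:
        return False
    # TCP probes that saw no loss at all count as "much lower" than ping loss
    return tcp_loss == 0 or ping_loss / tcp_loss >= ratio

# 884 of 903 ping rounds blocked (from the slide) vs. a made-up TCP result
print(possibly_rate_limited(884, 903, 0, 100))  # True
```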
13
Results: How are the U.S. Nets doing?
–In general performance is good (i.e. loss <= 1%)
–ESnet holding steady
–Edu (vBNS/Abilene) improving, but got bad recently
–XIWT (70% .com sites) 5-10 times worse than ESnet
14
How are DoE-funded Edu sites doing?
–V. poor (loss > 5% & < 12%): PVAMU, VTech vBNS
–Acceptable (loss > 1% & < 2.5%): Brandeis, Rice vBNS, UCR vBNS, UIUC vBNS (2 bad days in March), TAMU I2
–Pairs = 137
–Fraction NOT good: reduced by 2 in 1.5 yrs
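The loss bands used on these two slides can be expressed as a small classifier. A sketch only: the good (≤ 1%), acceptable (1% to 2.5%) and very poor (5% to 12%) bands come from the slides, while the "poor" and "bad" labels for the remaining ranges, and the handling of exact boundary values, are assumptions:

```python
def loss_quality(loss_pct: float) -> str:
    """Map a packet-loss percentage to a quality label.
    'good', 'acceptable' and 'very poor' follow the slides; the 'poor'
    (2.5-5%) and 'bad' (>= 12%) labels and boundary handling are assumed."""
    if loss_pct <= 1.0:
        return "good"
    if loss_pct < 2.5:
        return "acceptable"
    if loss_pct <= 5.0:
        return "poor"
    if loss_pct < 12.0:
        return "very poor"
    return "bad"
```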
15
Europe seen from U.S.
[Map annotations: 650 ms, 200 ms; 7% loss, 10% loss, 1% loss. Legend: monitor site; Beacon site (~10% of sites); HENP country; not HENP; not HENP & not monitored]
16
Asia seen from U.S.
[Map annotations: 3.6% loss, 10% loss, 0.1% loss; 640 ms, 450 ms, 250 ms]
17
Latin America, Africa & Australasia
[Map annotations: 4% loss, 2% loss; 350 ms, 700 ms, 170 ms, 220 ms]
18
Quality of Service: How to improve
More bandwidth
–Keep network load low (< 30%)
–Costs (at least in the West) are coming down dramatically, but it is non-trivial to keep up
Reserved/managed bandwidth, generally on ATM via PVCs today
Differentiated services
19
Effect of more & managed bandwidth
–German universities as good as DESY after Oct-99 upgrade
[RTT and loss plots annotated: DFN closes Perryman POP; loses direct ESnet peering; peering re-established via Dante @ 60 Hudson]
20
RTT from ESnet to Groups of Sites
–ITU G.114 300 ms RTT limit for voice
21
Loss seen from ESnet to Groups of Sites
–ITU limit for loss
22
Bulk transfer: Performance Trends
–TCP bandwidth < 1460/(RTT * sqrt(loss)), where 1460 bytes is the TCP maximum segment size
–Note: E. Europe not catching up
–ESnet flattening out
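The bound on the slide can be evaluated directly for a given path. A minimal sketch, assuming RTT in seconds, loss as a fraction, and a result in bits per second; the formula omits the constant factor of order 1 that appears in some derivations of this relation:

```python
from math import sqrt

def tcp_bandwidth_bound(rtt_s: float, loss: float, mss_bytes: int = 1460) -> float:
    """Upper bound on steady-state TCP throughput in bits/s:
    8 * MSS / (RTT * sqrt(loss)), with MSS in bytes, RTT in seconds and
    loss as a fraction (0.01 means 1%)."""
    if rtt_s <= 0 or not 0 < loss <= 1:
        raise ValueError("need rtt_s > 0 and 0 < loss <= 1")
    return 8 * mss_bytes / (rtt_s * sqrt(loss))

# e.g. a 100 ms RTT path with 1% loss
print(f"{tcp_bandwidth_bound(0.1, 0.01) / 1e6:.2f} Mbit/s")  # 1.17 Mbit/s
```

Because loss enters only as a square root while RTT enters linearly, halving the RTT doubles the bound, whereas halving the loss raises it by only about 41%.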
23
Interactive apps: Jitter
–IPDD(i) = RTT(i) - RTT(i-1)
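The jitter metric defined on this slide is the difference between consecutive round-trip times. A minimal sketch, assuming the RTTs are in send order and that lost pings have already been dropped from the list (how gaps from lost packets are handled is not stated on the slide):

```python
def ipdd(rtts: list[float]) -> list[float]:
    """IPDD(i) = RTT(i) - RTT(i-1) for each consecutive pair of RTTs.
    Assumes lost packets were removed beforehand."""
    return [cur - prev for prev, cur in zip(rtts, rtts[1:])]

print(ipdd([100.0, 102.0, 99.0, 130.0]))  # [2.0, -3.0, 31.0]
```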
24
SLAC-CERN Jitter
–ITU/TIPHON delay jitter threshold (75 ms)
25
Voice over IP: Reachability
–Within N. America & W. Europe, loss, RTT and jitter are acceptable for VoIP
–But what about reachability?
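"Acceptable for VoIP" can be made concrete by checking a path against the thresholds quoted on the preceding slides: the ITU G.114 300 ms RTT limit and the ITU/TIPHON 75 ms delay-jitter threshold. A minimal sketch; the ITU loss limit is not given numerically in the text, so the caller supplies it, and the function name is hypothetical:

```python
def voip_acceptable(rtt_ms: float, jitter_ms: float, loss_pct: float,
                    max_loss_pct: float) -> bool:
    """True if a path meets the 300 ms RTT (ITU G.114) and 75 ms jitter
    (ITU/TIPHON) thresholds and a caller-supplied loss limit.
    Treating values exactly at a threshold as acceptable is an assumption."""
    return rtt_ms <= 300 and jitter_ms <= 75 and loss_pct <= max_loss_pct
```

Reachability, the open question on this slide, is not captured by these three numbers; it needs outage or availability measurements instead.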
26
Availability – Outage Probability
–Surveyor probes at random intervals, 2/second
–Measure the time (outage length) during which consecutive probes don't get through
–http://www-iepm.slac.stanford.edu/monitoring/surveyor/outage.html
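Outage lengths can be read off a probe stream as runs of consecutive lost probes. A minimal sketch, assuming evenly spaced probes at 2 per second; since Surveyor's probes are randomly spaced, the durations computed this way are only approximate:

```python
def outage_lengths(received: list[bool], probe_interval_s: float = 0.5) -> list[float]:
    """Durations (s) of runs of consecutive lost probes.
    `received` holds one flag per probe in send order (True = got through).
    Assumes evenly spaced probes, which only approximates random spacing."""
    lengths, run = [], 0
    for ok in received:
        if ok:
            if run:
                lengths.append(run * probe_interval_s)
            run = 0
        else:
            run += 1
    if run:  # stream ended during an outage
        lengths.append(run * probe_interval_s)
    return lengths
```

Counting how often each length occurs gives an empirical distribution from which outage probabilities can be estimated.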
27
Error-free seconds
–Typical US phone company objectives are 99.999%
–http://www-iepm.slac.stanford.edu/monitoring/surveyor/err-sec.html
–What do we see for the Internet using Surveyor measurements?
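One way to turn probe results into an error-free-seconds figure is to bucket probes by whole second and count the seconds with no loss. A minimal sketch, assuming (timestamp, lost) pairs and ignoring seconds in which no probe was sent; it also shows how little errored time a 99.999% objective allows over a year:

```python
def error_free_seconds(probes) -> float:
    """probes: iterable of (timestamp_seconds, lost) pairs.
    Returns the fraction of seconds (among those containing at least one
    probe) in which no probe was lost."""
    seconds = {}
    for t, lost in probes:
        s = int(t)
        seconds[s] = seconds.get(s, False) or lost
    if not seconds:
        raise ValueError("no probes")
    return sum(1 for bad in seconds.values() if not bad) / len(seconds)

def allowed_downtime_minutes(objective: float) -> float:
    """Minutes per (365.25-day) year not covered by an availability objective."""
    return (1 - objective) * 365.25 * 24 * 3600 / 60

print(f"{allowed_downtime_minutes(0.99999):.1f} minutes/year")  # 5.3 minutes/year
```

At 2 probes per second, a second containing one or two probes is a coarse sample, so the measured fraction is an estimate.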
28
Differentiated services & VoIP
–SLAC & LBNL have a DS testbed with a 3.5 Mbps ATM PVC carved out of 43 Mbps
–Made measurements with Becca Nitzan @ ESnet
[Diagram labels: PBX; VoIP; ESnet ATM; 3.5 Mbps bottleneck; Prod; Edge; WFQ; CAR marking; 4 Mbps; 24 kbps]
Make phone call
–< 50% load: call OK
Inject 4 Mbps UDP load
–No WFQ: can't make call
–If a call is made, the quality is terrible
Apply WFQ & policing (via CAR)
–With WFQ the call sounds fine
Next: use ping to characterize
–Mark ping TOS bits with CAR, use WFQ in routers, and see how it affects loss, RTT, jitter etc.
29
Plans 1/2
HEPNRC has now rejoined at 50% of a person
Monitoring, next 2 weeks:
–Select packet sizes and number in stream: need better statistics for high-performance links (e.g. PPDG) and lower impact on low-capacity links
–Select scheduling, what is logged, mechanism (synack, ping, sting)
Beacons: extend from 50 => 70 (requires new monitor)
30
Plans 2/2
With XIWT/DARPA:
–Anomaly detection and alerting
–NIMI integration
More graphical reports:
–Maps, Java servlet graphs of more metrics and more selectability
–Health watch: upper-level displays
–Near real-time for SC2000: possible interest from ESnet NOC
–Maps with colored links with playback
–3D bar charts
Extended PPDG support:
–Higher statistics, better coverage
31
Summary
Long-term agreement between AMP, PingER, Surveyor & RIPE
–Need persistent structure (e.g. congestion or route changes) for short-term point-by-point agreement
Rate limiting is still a minor effect, but could become a problem; trying to get a good signature; have alternates
International performance from the US to sites outside W. Europe, JP, KR, SG, TW is generally poor to bad
Managed bandwidth can be a big help
ESnet & Internet 2 doing well, even for VoIP, except that reachability has a way to go
32
More Information
This talk:
–www.slac.stanford.edu/grp/scs/net/talk/mon-escc-apr00/
IEPM/PingER home site:
–www-iepm.slac.stanford.edu/
Comparison of Surveyor & RIPE & PingER:
–www.slac.stanford.edu/comp/net/wan-mon/surveyor-vs-ripe.html
–www.slac.stanford.edu/comp/net/wan-mon/surveyor-vs-pinger.html
Detecting ICMP Rate Limiting:
–www.slac.stanford.edu/grp/scs/net/talk/limiting-feb00/