1
3/4/98 z:\cottrell\escc\may98\essc-may98.ppt
ESnet End-to-end Internet Monitoring
Les Cottrell and Warren Matthews, SLAC, and David Martin, HEPNRC
Presented at the ESSC Review Meeting, Berkeley, May 1998
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM)
2
Outline of Talk
– Why are we (the ESnet/HENP community) measuring?
– What are we measuring, and how?
– What do we see?
– What does it mean?
– Summary
  – Deployment/development, Internet performance, next steps
  – Collaborations
3
Why go to the effort?
– The Internet is woefully under-measured and under-instrumented
– The Internet is very diverse: no single path is typical
– Users need end-to-end measurements for:
  – realistic expectations and planning information
  – guidelines for setting and validating SLAs
  – information to help identify problems
  – help in deciding where to apply resources
– Complements ESnet utilization measurements
– Provides information for reporting problems to the NOC
4
Our Main Tool (PingER) is Ping Based
– “Universally available” and easy to understand: no software for clients to install
– Low network impact
– Provides useful real-world measures of response time, loss, reachability and unpredictability
– Now monitoring from 14 sites in 8 countries, covering > 500 links in 22 countries (> 300 sites)
– Resources: 6 bps/link, ~600 kBytes/month/link
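To make the metrics above concrete, here is a minimal Python sketch of how loss, response time and reachability could be derived from one set of pings. It assumes each ping is recorded as a round-trip time in milliseconds, with `None` for a lost packet; the function `summarize_pings` and this data layout are hypothetical and are not PingER's actual code or format.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one set of pings sent to a single remote host.

    Each entry is a round-trip time in milliseconds, or None when no
    reply came back (a lost packet). This input layout is an assumption
    made for illustration, not PingER's actual data format.
    """
    if not rtts_ms:
        raise ValueError("need at least one ping sample")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    return {
        "loss_pct": loss_pct,
        # The median is less sensitive to a few very slow replies than the mean.
        "median_rtt_ms": median(replies) if replies else None,
        # Reachable if at least one reply arrived.
        "reachable": bool(replies),
    }


# Hypothetical sample: ten pings to one host, two of them lost.
sample = [182.0, 179.5, None, 181.2, 190.4, 178.9, None, 180.1, 185.0, 183.3]
print(summarize_pings(sample))
```

For this sample, two of ten pings are lost, giving 20% loss; the median response time is taken over the eight replies only.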
5
Measurement Architecture
[Diagram components: remote sites; monitoring sites (e.g. SLAC); HEPNRC archive; analysis; reports & data cache on the WWW; connections labeled Ping and HTTP.]
6
Ping Loss Quality
– We want a quick-to-grasp indicator of link quality
– Loss is the most sensitive indicator:
  – IBM studies on the economic value of response time showed a threshold around 4-5 secs, above which complaints increase
  – a lost packet costs a TCP retry timeout of ~4 secs
– For packet loss we use the following thresholds:
  – 0-1% = Good
  – 1-2.5% = Acceptable
  – 2.5-5% = Poor
  – 5-12% = Very Poor
  – > 12% = Bad (unusable for interactive work)
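The loss thresholds can be expressed as a small lookup. This Python sketch is illustrative only: the function name `loss_quality` is hypothetical, and because the slide's ranges share their end points (e.g. 1% ends Good and starts Acceptable), the choice that each band includes its upper edge is an assumption.

```python
# Inclusive upper edge of each band, in percent packet loss.
# Which band an exact boundary value falls into is an assumption here.
LOSS_BANDS = [
    (1.0, "Good"),
    (2.5, "Acceptable"),
    (5.0, "Poor"),
    (12.0, "Very Poor"),
]


def loss_quality(loss_pct: float) -> str:
    """Map a packet-loss percentage to the quality band from the slide."""
    if loss_pct < 0:
        raise ValueError("packet loss cannot be negative")
    for upper, label in LOSS_BANDS:
        if loss_pct <= upper:
            return label
    # Above 12%: unusable for interactive work.
    return "Bad"


for loss in (0.1, 1.8, 3.2, 7.0, 15.0):
    print(f"{loss:5.1f}% -> {loss_quality(loss)}")
```

With these bands the loop prints Good, Acceptable, Poor, Very Poor and Bad for the five sample loss values.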
7
Quality Distributions from SLAC
– ESnet median is good quality
– Other groups are poor or very poor
8
Aggregation/Grouping
– Critical with 14 monitoring sites and > 500 links
– Group measurements by:
  – area (e.g. N. America W, N. America E, W. Europe, Japan, Asia, others), or by country, or by TLD
  – trans-oceanic links, intercontinental links, links crossing an IXP
  – ISP (ESnet, vBNS/I2, TEN-34...)
  – monitoring site
  – one site seen from multiple sites
  – common interest/affiliation (XIWT, HENP, experiment...)
– Beware: grouping reduces the statistics per group, and the choice of sites is critical
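A sketch of the basic aggregation step in Python, assuming per-link loss values have already been labeled with a group (area, TLD, ISP, monitoring site, ...). The function name and input layout are hypothetical; the sketch uses the median, and it returns the link count per group so that the "reduces statistics" caveat stays visible.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_loss_by_group(
    links: Iterable[Tuple[str, float]]
) -> Dict[str, Tuple[float, int]]:
    """Aggregate per-link loss into per-group medians.

    `links` yields (group, loss_pct) pairs, one per monitored link.
    Returns {group: (median_loss_pct, number_of_links)} so that groups
    built from only a few links (weak statistics) can be spotted.
    """
    by_group: Dict[str, List[float]] = defaultdict(list)
    for group, loss_pct in links:
        by_group[group].append(loss_pct)
    return {g: (median(losses), len(losses)) for g, losses in by_group.items()}


# Hypothetical per-link monthly loss values (percent).
links = [
    ("ESnet", 0.1), ("ESnet", 0.2), ("ESnet", 0.05),
    ("W. Europe", 3.0), ("W. Europe", 6.0),
]
print(median_loss_by_group(links))
```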
9
Tabular Navigation Tool
– Select grouping, e.g. intercontinental, TLDs, site to site...
– Select metric: response, loss, quiescence, reachability...
– Select month: goes back to Jul-97
– Colored by quality (response time):
  – < 62.5 ms excellent (white)
  – < 125 ms good (green)
  – < 250 ms poor (yellow)
  – < 500 ms very poor (pink)
  – > 500 ms bad (red)
– Drill down:
  – on a site, to show all sites monitoring it
  – on a value, to see all links contributing to it
– MouseOver to see the number of links, the country, the monitoring site
[Screenshot annotations: remote site, monitoring site]
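The response-time coloring can be sketched the same way. In this hypothetical Python helper each band is an exclusive upper bound, matching the "<" on the slide; treating exactly 500 ms as bad is an assumption, since the slide only says "> 500ms".

```python
from typing import Tuple

# (exclusive upper bound in ms, quality, table color), from the slide.
RESPONSE_BANDS = [
    (62.5, "excellent", "white"),
    (125.0, "good", "green"),
    (250.0, "poor", "yellow"),
    (500.0, "very poor", "pink"),
]


def response_quality(rtt_ms: float) -> Tuple[str, str]:
    """Return (quality, color) for a response time in milliseconds."""
    for upper, quality, color in RESPONSE_BANDS:
        if rtt_ms < upper:
            return quality, color
    # Assumption: exactly 500 ms is grouped with "> 500ms" as bad.
    return "bad", "red"


print(response_quality(180.0))
```

Each band edge is double the previous one (62.5, 125, 250, 500 ms), so the bands are evenly spaced on a logarithmic scale.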
10
Drill down (all sites monitoring CERN)
– Sites shown: CMU, CNAF, RL, FNAL, SLAC, DESY, Carleton, RMKI, CERN, KEK
– Select one of these groups
– Also provides Excel for DIY sorting
11
Overall Improvements, Jan-95 to Nov-97
– For about 80 remote sites seen from SLAC
– Response time improved by between 1% and 2.5% per month
– Loss improved similarly (closer to 2.5% per month)
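One way to turn a monthly time series into a single "percent per month" figure like those quoted here is a least-squares fit of the logarithm of the metric against time. The Python sketch below assumes a roughly constant fractional change per month; it is not necessarily how the numbers on this slide were derived, and the function name is hypothetical.

```python
import math
from typing import Sequence


def monthly_improvement_pct(months: Sequence[float], values: Sequence[float]) -> float:
    """Estimate the average % decrease per month of a positive metric.

    Fits log(value) = a + b * month by ordinary least squares (i.e. assumes
    a roughly constant fractional change per month) and converts the
    slope b into a percentage decrease per month.
    """
    n = len(months)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (month, value) pairs")
    logs = [math.log(v) for v in values]
    mean_x = sum(months) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    if sxx == 0:
        raise ValueError("months must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, logs))
    slope = sxy / sxx
    return 100.0 * (1.0 - math.exp(slope))


# Synthetic series: a median response time shrinking 2% per month over the
# 35 monthly points from Jan-95 (month 0) to Nov-97 (month 34).
months = list(range(35))
values = [300.0 * 0.98 ** m for m in months]
print(round(monthly_improvement_pct(months, values), 3))
```

Small monthly rates compound: over the 34 months from Jan-95 to Nov-97, 1% per month leaves about 0.99^34 ≈ 0.71 of the starting value, and 2.5% per month about 0.975^34 ≈ 0.42.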
12
How does it look for ESnet researchers getting to US sites (280 links, 28 states)?
– Within ESnet: excellent (median loss 0.1%)
– To vBNS sites: very good (~2 times the loss for ESnet)
– DOE-funded universities not on vBNS/ESnet:
  – acceptable to poor, but getting better (factor of 2 in 6 months)
  – a lot of variability, e.g.:
    – Brown T, UMass T = unacceptable (>= 12%)
    – Pitt*, SC*, ColoState*, UNM T, UOregon T, Rochester*, UC*, OleMiss*, Harvard 1q98, UWashington T, UNM T = very poor (> 5%)
    – Syracuse T, Purdue T, Hawaii* = poor (>= 2.5%)
  – * = no vBNS plans, T = vBNS date TBD, V = on vBNS
13
University access changes in last year
– A year ago we looked at universities with large DOE programs
– We identified those with poor (> 2.5%) or worse (> 5%) performance; their status now:
  – UOregon T, Harvard 1q98, UWashington T = very poor (>= 5%)
  – JHU V, Duke V, UCSD V, UMD V, UMich T, UColo V, UPenn T, UMN V, UCI T, UWisc V = acceptable (> 1%)/good
– * = no vBNS plans, T = vBNS date TBD, V = on vBNS
14
Canada
– 20 links, 9 remote sites, 7 monitoring sites
– Performance seems to depend most on the remote site:
  – UToronto bad to everyone
  – Carleton, Laurentian, McGill poor
  – Montreal, UVic acceptable/good
  – TRIUMF good with ESnet, poor to CERN
15
Europe
– Divides into two groups:
  – TEN-34 backbone sites (de, uk, nl, ch, fr, it, at):
    – good performance within Europe
    – from ESnet, good to acceptable, except nl, fr (Renater) & .uk, which are bad
  – Others:
    – performance within Europe is poor
    – from ESnet, bad to es, il, hu, pl; acceptable for cz
16
Asia
– Israel bad
– KEK & Osaka good from US, very poor from Canada
– Tokyo poor from US
– Japan-CERN/Italy acceptable, Japan-DESY bad
– FSU: bad to Moscow, acceptable to Novosibirsk
– China is bad everywhere
17
Intercontinental Grouping (Loss)
– Looks pretty bad for intercontinental use
– Improving (about a factor of 2 in the last 6 months)
18
Summary 1/5: Deployment/Development
– ESnet/HENP/ICFA has 14 collection sites in 8 countries collecting data on > 500 links involving 22 countries
– HEPNRC archiving/analyzing, SLAC analyzing
– 600 KB/month/link, 6 bps/link, 0.25 FTE at the archive site, 1.5-2.5 FTE on analysis
– Reports available worldwide for end-users to access, navigate, review & customize (via Excel) and to see quality
– 4 GBytes of data available to experts for analysis
– Tools available for others to monitor, archive, analyze
– XIWT/IPWT chose & deployed PingER: ~10 collection sites are now monitoring 41 beacon sites
19
Summary 2/5: Next Steps
– Improve tools:
  – improve statistical robustness: Poisson sampling, medians
  – more groupings, beacon sites, matched pairs for comparison
  – more navigation features to drill down
  – better/easier identification of common bottlenecks
  – prediction (extrapolations; develop models, configure and validate them with data)
– Pursuing deployment of dedicated PC-based monitor platforms:
  – IETF Surveyor
  – NIMI/LBNL: NIMIs up & running at PSC, LBNL, FNAL, SLAC, CERN (CH); working with RAL (UK), KEK (JP), DESY (DE)
  – will provide throughput, traceroute & one-way ping measurements
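Poisson sampling, listed above as a way to improve statistical robustness, means spacing probes with exponentially distributed gaps, as described in the IETF IPPM framework (RFC 2330). A minimal Python sketch of generating such a schedule follows; the function name and parameters are hypothetical, and actually sending the pings is left out.

```python
import random
from typing import List, Optional


def poisson_send_times(
    mean_interval_s: float, duration_s: float, seed: Optional[int] = None
) -> List[float]:
    """Return probe send times (seconds from start) forming a Poisson process.

    Gaps between probes are drawn from an exponential distribution with
    the given mean, so probe times are not periodic and should not
    synchronize with periodic patterns in network load.
    """
    if mean_interval_s <= 0 or duration_s <= 0:
        raise ValueError("interval and duration must be positive")
    rng = random.Random(seed)
    rate = 1.0 / mean_interval_s  # expovariate takes the rate (lambda)
    times: List[float] = []
    t = rng.expovariate(rate)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate)
    return times


# Hypothetical schedule: on average one probe per 30 minutes over a day.
schedule = poisson_send_times(mean_interval_s=1800, duration_s=86400, seed=42)
print(len(schedule), [round(t) for t in schedule[:5]])
```

With a mean gap of 30 minutes a day yields about 48 probes on average, but the exact count and times vary from run to run unless a seed is fixed.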
20
Summary 3/5: Internet Performance (summary for our 500 links)
– Performance within ESnet is good
– Performance to vBNS is good (median loss ~2 times ESnet)
– Performance to non-ESnet/vBNS sites is acceptable to poor
– Intercontinental performance is very poor to bad
– Response time improving by 1-2% / month
– Packet loss between SLAC & other sites improving by 3% / month since Jan-95
– Very dynamic
21
Summary 4/5: Internet Performance (continued)
– Links to sites outside N. America vary from good (KEK) to bad
– Canada is a mixed bag: depending on the remote site, acceptable to bad
– TEN-34 backbone countries (except UK) good to acceptable
– Otherwise Europe poor to bad
– Asia (apart from some Japanese sites) is bad
– Rest of world generally poor to bad
22
Summary 5/5: Collaborations
– Lots of collaboration & sharing:
  – SLAC & HEPNRC leading the effort on PingER
  – 14 monitoring sites, ~400 remote sites
  – monitoring-site tools: CERN & CNAF/INFN, Oxford/TracePing
  – MapPing/MAPNet, working with NLANR
  – TRIUMF traceroute topology map
  – NIMI/LBNL & Surveyor/IETF/IPPM
  – industry: XIWT/IPWT, also SBIR from NetPredict on prediction
– Talks at IETF, XIWT, ICFA, ESSC, ESCC, Interface’98, CHEP...
– Lots of support: DOE/MICS/ESSC/ESnet, ICFA, XIWT
23
More Information (extra slides follow)
– ICFA Monitoring WG home page (links to status report, meeting notes, how to access data, and code):
  http://www.slac.stanford.edu/xorg/icfa/ntf/home.html
– WAN Monitoring at SLAC (lots of links):
  http://www.slac.stanford.edu/comp/net/wan-mon.html
– Tutorial on WAN Monitoring:
  http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
– PingER history tables:
  http://www.slac.stanford.edu/xorg/iepm/pinger/table.html
– NIMI:
  http://www.psc.edu/~mahdavi/nimi_paper/NIMI.html
24
Perception of Packet Loss
– Above 4-6% packet loss, video conferencing becomes irritating, and non-native language speakers become unable to communicate.
– Long delays of 4 seconds or more, occurring at a frequency of 4-5% or more, are also irritating for interactive activities such as telnet and X Windows.
– Above 10-12% packet loss there is an unacceptable level of back-to-back packet loss and extremely long timeouts, connections start to break, and video conferencing is unusable.
25
180-Day Ping Performance, SLAC-CERN
26
Running 10-Week Averages
– Sorted on biggest change
– Standard deviation gives an idea of loading
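A hedged sketch of the kind of computation behind such a table: trailing 10-week averages per link, links ranked by how much the average changed, plus the standard deviation of recent weeks. The input layout (a list of weekly values per link), the definition of "change" as last minus first running average, and the function names are all assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def running_average(weekly: Sequence[float], window: int = 10) -> List[float]:
    """Trailing mean over each run of `window` consecutive weekly values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [mean(weekly[i - window + 1 : i + 1]) for i in range(window - 1, len(weekly))]


def rank_by_change(
    series: Dict[str, Sequence[float]], window: int = 10
) -> List[Tuple[str, float, float]]:
    """Return (link, change, stdev) rows sorted by largest absolute change.

    change = last running average minus first running average (assumption);
    stdev  = population standard deviation of the most recent `window` weeks.
    Links with fewer than `window` weeks of data are skipped.
    """
    rows = []
    for link, weekly in series.items():
        averages = running_average(weekly, window)
        if averages:
            recent = weekly[-window:]
            rows.append((link, averages[-1] - averages[0], pstdev(recent)))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Hypothetical weekly loss (%) for two links over 12 weeks.
series = {"link-A": [float(x) for x in range(12)], "link-B": [1.0] * 12}
print(rank_by_change(series))
```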
27
Quiescence
– Frequency of zero packet loss (over all times of day, not restricted to prime time)
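Quiescence as defined here reduces to a simple ratio. This Python sketch assumes the data for a link is the number of packets lost in each ping set; the function name is hypothetical.

```python
from typing import Sequence


def quiescence_pct(lost_per_set: Sequence[int]) -> float:
    """Percentage of ping sets in which no packet was lost.

    Each entry is the number of packets lost in one set of pings to a
    host; all sets are counted, whatever the time of day.
    """
    if not lost_per_set:
        raise ValueError("need at least one ping set")
    zero_loss = sum(1 for lost in lost_per_set if lost == 0)
    return 100.0 * zero_loss / len(lost_per_set)


# Hypothetical: 8 ping sets, 6 of them with no loss.
print(quiescence_pct([0, 0, 1, 0, 0, 3, 0, 0]))
```

In the example, 6 of 8 sets have no loss, so quiescence is 75%.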
28
Response & Loss Improvements
– Improved by between 1% and 2.5% / month
– Response and loss show similar improvements
29
Top Level Domain Grouping (Loss)
– Diagonals are within a TLD
– US to it, de, ch & cz good/acceptable
– Hungary is poor
– China unusable
– Canada poor to bad
– UK-US bad
30
US ESnet & vBNS
Group                    Median loss       Links   Unique remote sites   Monitoring sites
ESnet                    0.1%              36      17                    6
vBNS                     0.3%              30      18                    4
.EDU, non-ESnet/vBNS     1.5% (avg 3.2%)   54      36                    3
33
Loss and Delay (plots): Advanced to U Chicago; U Chicago to Advanced
34
MapPing
– Java applet, based on MapNet from NLANR
– Colors links by performance
– Selection: collection site, performance metric, month, zoom level
– Mouse-over gives coordinates
35
Traceroute Topology Tool
– Reverse traceroute servers
– TracePing
– TopologyMap:
  – ellipses show nodes on the route
  – open ellipse is the measurement node
  – blue ellipse is not reachable
  – keeps history
[Screenshot labels: From TRIUMF; KEK, FNAL, DESY, CERN]