1 Internet End-to-end Monitoring Project - Overview
Les Cottrell – SLAC/Stanford University
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP

2 Why
Driven by users: HENP physicists in worldwide collaborations of hundreds to thousands of scientists at many tens to hundreds of institutes
  – Set expectations for planning: where to locate clusters and data, how to replicate data
  – Trouble-shooting
  – SLAs
First project, starting in 1995, was PingER

3 Measurement Architecture
Uses existing ubiquitous Internet ping infrastructure, no tools to install
Hierarchical vs. full mesh, each monitoring site chooses remote sites
Lightweight
  – low network impact (100 bits/s per path)
  – no special machines
  – trivial to add monitored sites
Running continuously since 1995
[Figure: architecture diagram. Labels: WWW, Archive, Monitoring, Remote, HEPNRC, SLAC, Reports & Data, Cache, Ping, HTTP; 1 monitor host - remote host pair]

4 PingER Measurement Methodology
Measurement host admin chooses remote hosts of interest
  – Sends 21 pings every 30 minutes to each chosen remote host
  – Records RTT, loss, jitter, unreachable, out of order …
  – Records data in local cache
Archive host gathers data from measurement hosts regularly (at least daily)
  – Archives, analyzes and generates reports from data
  – Makes reports and data publicly available via the web
Requirements:
  – Remote host: needs a host accessible to pings, and a contact in case the host does not respond (almost no effort)
  – Monitoring host: a low-end host to make measurements, file space for the cache, and an admin to install the toolkit, choose remote hosts, build the configuration file, respond to archivers in case they are unable to get data, and keep it running (<<10% FTE)
  – Archive site: probably about 20% of an FTE
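The per-cycle bookkeeping described on this slide can be sketched roughly as follows. This is an illustrative sketch only, not the PingER toolkit: the function name `summarize_cycle`, the field names, and the jitter definition (mean absolute difference between consecutive replies) are assumptions made for the example.

```python
from statistics import mean, median
from typing import Optional, Sequence


def summarize_cycle(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one measurement cycle (e.g. 21 pings to one remote host).

    Each entry is a round trip time in milliseconds, or None when the
    ping got no reply. All names here are hypothetical.
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    received = len(replies)
    summary = {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent if sent else None,
        # Assumption: "unreachable" means no ping in the cycle was answered.
        "unreachable": sent > 0 and received == 0,
        "rtt_min_ms": None,
        "rtt_avg_ms": None,
        "rtt_median_ms": None,
        "jitter_ms": None,
    }
    if replies:
        summary["rtt_min_ms"] = min(replies)
        summary["rtt_avg_ms"] = mean(replies)
        summary["rtt_median_ms"] = median(replies)
    if len(replies) >= 2:
        # Assumed jitter measure: mean absolute difference between
        # consecutive answered pings (lost pings are skipped over).
        diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
        summary["jitter_ms"] = mean(diffs)
    return summary


if __name__ == "__main__":
    # 21 pings: 18 at 100 ms, one lost, then 110 ms and 120 ms.
    cycle = [100.0] * 18 + [None, 110.0, 120.0]
    print(summarize_cycle(cycle))
```

A monitoring host would append one such summary per remote host to its local cache every 30 minutes, and the archive host would collect the cache later.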

5 PingER deployment
Measurements from
  – 34 monitors in 14 countries
  – Over 600 remote hosts
  – Over 77 countries
  – Over 3300 monitor-remote site pairs
  – Measurements go back to Jan-95
  – Reports on RTT, loss, reachability, jitter, reorders, duplicates …
Countries monitored
  – Contain 78% of world population
  – 99% of online users of Internet
  – Mainly A&R sites
[Figure: world map. Legend: Monitoring Sites, Remote Sites]
Recently added:
  – BD, CO, GH, GU, JO, MO, NG, PK
Mature, low impact, excellent view of world Internet, e.g. for quantifying Digital Divide
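To put the deployment figures in context, the following back-of-envelope sketch compares the roughly 3300 monitored pairs with what a full mesh would require, and estimates the aggregate probe traffic. It uses the slide figures (34 monitors, 600 remote hosts, 3300 pairs) and the 100 bits/s per path quoted for the architecture. The slides say "over" 600 and "over" 3300, so these are lower bounds, and treating monitors and remote hosts as disjoint sets is an assumption.

```python
MONITORS = 34              # from the deployment slide
REMOTE_HOSTS = 600         # "over 600" on the slide, so a lower bound
ACTUAL_PAIRS = 3300        # "over 3300" monitor-remote pairs
BITS_PER_S_PER_PATH = 100  # network impact quoted per path


def directed_full_mesh_pairs(n_hosts: int) -> int:
    """Number of ordered (source, destination) pairs among n_hosts."""
    return n_hosts * (n_hosts - 1)


# Assumption: monitors and remote hosts do not overlap.
all_hosts = MONITORS + REMOTE_HOSTS
full_mesh = directed_full_mesh_pairs(all_hosts)

# Every monitor probing every remote host, without any selection.
every_monitor_to_every_remote = MONITORS * REMOTE_HOSTS

# Aggregate probe traffic summed over all monitored pairs.
aggregate_bits_per_s = ACTUAL_PAIRS * BITS_PER_S_PER_PATH

if __name__ == "__main__":
    print(f"full mesh pairs:               {full_mesh}")
    print(f"all monitors x all remotes:    {every_monitor_to_every_remote}")
    print(f"pairs actually monitored:      {ACTUAL_PAIRS}")
    print(f"aggregate probe traffic (b/s): {aggregate_bits_per_s}")
```

Under these assumptions, letting each monitoring site choose its own remote sites keeps the number of pairs at a small fraction of either alternative.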

6 Results

7 History - Round Trip Time (RTT)
Improving by 10-20% per year
More direct paths
Replacing satellites with land lines
  – Satellite >~550 ms
Faster lines & network equipment
Lower limit: speed of light in fiber
Typical lower limit today ~ distance/(0.3 * (0.6 * c)), where 0.6 * c is the speed of light in fiber
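The rule of thumb on this slide can be evaluated numerically, as in the sketch below. The slide does not say whether "distance" is the one-way or the round-trip path length, nor does it explain the factor 0.3, so the function simply evaluates the expression as written. The function and parameter names are hypothetical, and the 10,000 km example distance is an arbitrary illustration.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond


def typical_rtt_lower_limit_ms(distance_km: float,
                               path_factor: float = 0.3,
                               fiber_fraction: float = 0.6) -> float:
    """Evaluate distance / (0.3 * (0.6 * c)) from the slide, in ms.

    fiber_fraction * c is the slide's speed of light in fiber; the slide
    does not explain path_factor, so it is kept as a plain parameter.
    """
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return distance_km / (path_factor * fiber_fraction * C_KM_PER_MS)


if __name__ == "__main__":
    # Arbitrary example distance, not taken from the slides.
    print(f"{typical_rtt_lower_limit_ms(10_000):.1f} ms")
```

For a 10,000 km distance the expression gives roughly 185 ms, well below the >~550 ms quoted for satellite links.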

8 History - Loss
Loss is more critical than RTT
Losses cause timeouts, typically of seconds
Improving 40-50% per year
Best networks below 0.1%
Russia, SE Europe, China several years behind

9 Loss to world from US
Using year 2000 fraction of world’s population per country, from www.nua.ie/surveys/how_many_online/

10 Losses: World by region, Jan ‘02
5% = bad
Russia, S America bad
Balkans, M East, Africa, S Asia, Caucasus poor

11 History - Throughput quality improvements from US
TCP BW < MSS/(RTT*sqrt(loss)) (1)
80% annual improvement ~ factor 10 per 4 years
~Factor 100 improvement in 8 years
(1) Macroscopic Behavior of the TCP Congestion Avoidance Algorithm, Mathis, Semke, Mahdavi, Ott, Computer Communication Review 27(3), July 1997
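As a rough illustration of the two calculations on this slide, the sketch below evaluates the throughput bound exactly as written there (the cited paper's formula also carries a constant of order one, which the slide omits) and checks the compounding arithmetic. The function names and the example values (MSS 1460 bytes, RTT 100 ms, loss 1%) are assumptions chosen for illustration, not measurements from the project.

```python
from math import sqrt


def tcp_bw_bound_bps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Upper bound on TCP throughput in bits/s: MSS / (RTT * sqrt(loss)).

    loss is a fraction (0.01 means 1% packet loss).
    """
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss <= 1:
        raise ValueError("loss must be in (0, 1]")
    return 8 * mss_bytes / (rtt_s * sqrt(loss))


def compound_factor(annual_improvement: float, years: int) -> float:
    """Total improvement factor after compounding for a number of years."""
    return (1 + annual_improvement) ** years


if __name__ == "__main__":
    # Illustrative inputs only.
    bound = tcp_bw_bound_bps(mss_bytes=1460, rtt_s=0.100, loss=0.01)
    print(f"bound: {bound / 1e6:.3f} Mbit/s")
    print(f"80%/yr over 4 years: x{compound_factor(0.80, 4):.1f}")
    print(f"80%/yr over 8 years: x{compound_factor(0.80, 8):.1f}")
```

Because the bound scales as 1/(RTT*sqrt(loss)), halving the RTT doubles it, while halving the loss raises it by only about 41%. Compounding 80% per year gives about 10.5 after 4 years and about 110 after 8 years, consistent with the slide's "factor 10" and "factor 100".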

12 Summary - results
Internet A&R connectivity performance is improving
  – RTT 10-20%/yr, loss 50%/yr, throughput 80%/yr
  – Reduced use of satellites, now mainly used for new, hard-to-reach areas (e.g. S. Russian Republics)
China, S.E. Europe, Russia: rate of change keeps up, but several years behind
India, S. America: performance is where N. America & W. Europe were 4-5 years ago
Improvements need constant investment to understand & improve

13 More Information
IEPM/PingER home site:
  – www-iepm.slac.stanford.edu/
African connectivity
  – http://www3.sn.apc.org/africa/afrmain.htm

14 IEPM-BW = PingER NG
Driven by data replication needs of HENP, PPDG, DataGrid
  – No longer ship plane/truck loads of data
      Latency is poor
      Now ship all data by network (TB/day today, doubling each year)
  – Complements PingER, but for high-performance nets
Build an infrastructure to make E2E network (e.g. iperf, packet pair dispersion) & application (FTP) measurements for high-performance A&R networking
Started at SC2001

15 Tasks
Develop/deploy a simple, robust ssh-based E2E app & net measurement and management infrastructure for making regular measurements
  – Major step is setting up collaborations, getting trust, accounts/passwords
  – Can use dedicated or shared hosts, located at borders or with real applications
  – COTS hardware & OS (Linux or Solaris) simplifies application integration
Integrate base set of measurement tools (ping, iperf, bbcp …), provide simple (cron) scheduling
Develop data extraction, reduction, analysis, reporting, simple forecasting & archiving

16 Purposes
Compare & validate tools
  – With one another (pipechar vs pathload vs iperf, or bbcp vs bbftp vs GridFTP vs Tsunami)
  – With passive measurements
  – With web100
Evaluate TCP stacks (FAST, Sylvain, HS TCP, Frank Kelley …)
  – Trouble shooting
  – Set expectations, planning
  – Understand requirements for high performance and performance issues in network, OS, CPU, disk/file system, etc.
  – Provide public access to results for people & applications

17 Deployment
SLAC monitoring about 40 remote hosts
10 other monitoring sites running code
  – APAN, FNAL, NIKHEF, INFN SLAC running production
  – U Mich, I2, Manchester, UCL, GA Tech evaluating
If everything goes right, it takes about 30-60 minutes to install a new monitoring site
  – Usually longer due to need to get web server, ssh keys, ports unblocked, disk space

18 Results
Time series data, scatter plots, histograms
CPU utilization required (MHz/Mbits/s): jumbo and standard, new stacks
Forecasting
Diurnal behavior characterization
Disk throughput as function of OS, file system, caching
Correlations with passive, web100

19 Next steps
Rewrite (again) based on experiences
  – Improved ability to add new tools to measurement engine and integrate into extraction, analysis
      GridFTP, tsunami, UDPMon, pathload …
  – Improved robustness, error diagnosis
Need improved scheduling
Want to look at other security mechanisms

