Network Performance Measurement
Atlas Tier 2 Meeting at BNL
December 14, 2005
Joe Metzger
metzger@es.net
Network Performance Measurement
  - Motivation
  - Requirements
  - Measurements
  - Useful tools
  - Suggestions
Motivation
  - LHC science is going to rely on a very complex web of many different networks.
  - Networks are not static. They are in a constant state of change.
  - Components fail.
  - Wide area links are affected by plane crashes, train derailments, fires, floods, construction, etc.
10 Gigabit networking will present new challenges.
  - The high cost of 10GE routers is pushing people to deploy different architectures using lower-cost switches.
  - Many of these lower-cost devices provide very limited diagnostic and debugging support.
  - Oversubscription of circuits and bottlenecks within equipment may cause unexpected packet loss.
  - The differences between OC192, 10GE LANPHY, 10GE WANPHY and "almost 10GE line-rate" equipment may cause buffering problems.
  - These issues will lead to problems that will be challenging to identify and resolve.
Requirements
You must have the ability to easily determine the status of the networks you rely on for your missions.
  - Up and working correctly?
      - Can you prove it?
  - Down
      - Is there a known problem that is being worked on?
      - Are you seeing a symptom of the problem or something else?
      - Is the network down or the applications down?
      - Can you prove the problem is not on your campus?
      - Who do you call?
      - What info can you provide to the NOC who will fix it?
  - Up but not performing as expected
      - Is there a known problem?
      - What info can you provide to help identify the problem?
Measurement
  - Paths
      - What set of links do your packets traverse?
  - Capacity
      - What is the capacity of those links?
      - What is the utilization of those links?
  - Bandwidth
      - What is the achievable bandwidth across the path?
      - Is it stable or does it change?
  - Latency between points of interest
      - What is the round trip time across the path?
Tools
  - Paths
      - Traceroute
  - Latency
      - Ping and OWAMP
  - Utilization
      - Collect with MRTG/Cricket/SNAPP etc.
      - Publish with PerfSONAR.
  - Bandwidth
      - Iperf, netperf, nuttcp and BWCTL
Traceroute
OWAMP Measurements
Traceroute
ESnet PerfSONAR Traceroute Visualizer

Trace Submitted

  Tracing route to cache3.bnl.gov 130.199.3.21
  over a maximum of 30 hops

    1     1 ms     1 ms     1 ms  joem-fe-stub.es.net 198.124.224.5
    2    21 ms    21 ms    21 ms  chi-ameslab.es.net 134.55.208.38
    3    21 ms    21 ms    21 ms  chicr1-ge0-chirt1.es.net 134.55.209.189
    4    41 ms    41 ms    41 ms  aoacr1-oc192-chicr1.es.net 134.55.209.58
    5    43 ms    43 ms    43 ms  bnl-oc48-aoacr1.es.net 134.55.209.130
    6    43 ms    43 ms    43 ms  bnl-esbnl.es.net 198.124.216.114

Analyzing Trace
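A trace like the one above can also be read programmatically to see which hops add the latency. The following is a minimal sketch, not part of the ESnet visualizer: the names `HOP_RE`, `parse_hops` and `rtt_increments` are hypothetical, and the parser assumes every hop reports three whole-millisecond samples (it skips lines containing `*` or `<1 ms`).

```python
import re

# Hop lines copied from the trace submitted to the visualizer.
TRACE = """\
1 1 ms 1 ms 1 ms joem-fe-stub.es.net 198.124.224.5
2 21 ms 21 ms 21 ms chi-ameslab.es.net 134.55.208.38
3 21 ms 21 ms 21 ms chicr1-ge0-chirt1.es.net 134.55.209.189
4 41 ms 41 ms 41 ms aoacr1-oc192-chicr1.es.net 134.55.209.58
5 43 ms 43 ms 43 ms bnl-oc48-aoacr1.es.net 134.55.209.130
6 43 ms 43 ms 43 ms bnl-esbnl.es.net 198.124.216.114
"""

# hop number, three RTT samples, host name, address (brackets optional)
HOP_RE = re.compile(
    r"^\s*(\d+)\s+(\d+) ms\s+(\d+) ms\s+(\d+) ms\s+(\S+)\s+\[?([\d.]+)\]?\s*$"
)


def parse_hops(text):
    """Return one dict per hop line; lines that do not match are skipped."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m is None:
            continue
        samples = [int(m.group(i)) for i in (2, 3, 4)]
        hops.append({
            "hop": int(m.group(1)),
            "rtt_ms": min(samples),  # the minimum is least affected by queueing
            "host": m.group(5),
            "addr": m.group(6),
        })
    return hops


def rtt_increments(hops):
    """Return (host, added RTT in ms) pairs relative to the previous hop."""
    result = []
    previous = 0
    for hop in hops:
        result.append((hop["host"], hop["rtt_ms"] - previous))
        previous = hop["rtt_ms"]
    return result


if __name__ == "__main__":
    for host, added in rtt_increments(parse_hops(TRACE)):
        print(f"{added:4d} ms  {host}")
```

For this trace the two 20 ms steps fall on the hops into chi-ameslab.es.net and aoacr1-oc192-chicr1.es.net, which is where the wide area distance is being covered; the remaining hops add 2 ms or less.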
Recommendations
  - Deploy test servers at the edge of your network.
      - OWAMP for measuring latency.
      - Iperf/BWCTL for measuring bandwidth.
  - Monitor interface utilization.
      - Capture with MRTG, Cricket, SNAPP or other tools.
      - Export results to the community using a PerfSONAR Measurement Archive.
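Tools such as MRTG derive interface utilization from periodic samples of the interface octet counters. The arithmetic can be sketched as below. This is an illustration only, not code from any of the tools named: the function `utilization_percent` and its parameters are hypothetical, and it assumes the counter wrapped at most once between the two samples.

```python
def utilization_percent(octets_t0, octets_t1, interval_s, link_bps, counter_bits=64):
    """Average link utilization (percent) between two octet-counter samples.

    octets_t0, octets_t1: counter values at the start and end of the interval
    interval_s: seconds between the two samples
    link_bps: link capacity in bits per second
    counter_bits: counter width (32 or 64), used to correct a single wrap
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    delta = octets_t1 - octets_t0
    if delta < 0:
        # Counter wrapped past its maximum once during the interval.
        delta += 2 ** counter_bits
    return 100.0 * delta * 8 / (interval_s * link_bps)


if __name__ == "__main__":
    # 37.5 GB transferred in 5 minutes on a 10 Gb/s link
    print(utilization_percent(0, 37_500_000_000, 300, 10e9))
```

A 32-bit octet counter can wrap in a few seconds at 10 Gb/s, so on fast links the 64-bit counters are the safer input for this calculation.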
Summary
  - Deploy tools that will allow you to:
      - Determine if your applications are working correctly.
      - Determine if your network is working correctly.
      - Generate useful information for diagnosing problems.
  - Use these tools to:
      - Continuously document your performance so you know when it changes.
      - Share your network measurement results.
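Knowing when performance changes requires a recorded baseline to compare against. One simple way to do that is sketched here; the function `has_changed`, the use of the median as the baseline, and the 20% tolerance are all assumptions made for illustration, not a method from the talk.

```python
from statistics import median


def has_changed(history, latest, tolerance=0.2):
    """Return True if `latest` deviates from the median of `history`
    by more than `tolerance` (a fraction of the median, 0.2 = 20%)."""
    if not history:
        raise ValueError("history must contain at least one measurement")
    baseline = median(history)
    if baseline == 0:
        return latest != 0
    return abs(latest - baseline) / baseline > tolerance


if __name__ == "__main__":
    rtt_history_ms = [43, 43, 44, 43, 42]  # past round trip times for one path
    print(has_changed(rtt_history_ms, 44))  # small variation
    print(has_changed(rtt_history_ms, 65))  # large jump, e.g. after a reroute
```

The same comparison works for achievable bandwidth or utilization, as long as the history holds measurements of the same quantity over the same path.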
PerfSONAR Plug
  - PerfSONAR is a network measurement architecture that is being jointly developed by ESnet, GEANT, Internet2, and a half dozen European NRENs to collect, store and exchange network measurements.
  - ESnet has developed a proof-of-concept tool that analyzes the output from traceroute and displays link capacity and utilization graphs for every link crossed for which data is available from PerfSONAR Measurement Archive servers.
More Info
  - PerfSONAR
      http://monstera.man.poznan.pl/jra1-wiki/index.php/PerfSONAR_About
  - PerfSONAR Traceroute Visualization
      https://performance.es.net/cgi-bin/level0/perfsonar-trace.cgi
  - OWAMP
      http://e2epi.internet2.edu/owamp
  - IEPM
      http://iepmbw.bnl.gov/iepm-bw.bnl.gov/slac_wan_bw_tests.html
      http://www-iepm.slac.stanford.edu/bw/
  - BWCTL
      http://e2epi.internet2.edu/bwctl/