IEPM-BW Deployment Experiences. Connie Logg, SLAC. Joint Techs Workshop, February 4-9, 2006.
Background: Originally conceived on September 13, 2001, and developed on Solaris as an exhibit for SC2001 in November 2001. It looked useful, so development continued. After SC2001 it was installed on a Solaris host shared with other applications, and those applications interfered with it. The original configuration file was a set of Perl commands that defined the nodes, their configuration information, and the probes and parameters for each node; it was very hard to understand, maintain, modify, and manage. A quick port to Linux moved it to its own host, but it still used Perl commands as the configuration database.
Background (continued): As development proceeded, it became obvious that the configuration information for nodes and probes was no longer manageable. Enter the MySQL database: the whole package was redesigned around it. Node specifications, monitoring host specifications, probe specifications, plot specifications, path specifications, and data are all maintained in the MySQL database. The result is much more readable (web pages display the contents of the monitoring configuration), manageable, and adaptable to changing needs and specifications.
Conceptual Changes/Challenges: The conception of what IEPM-BW should do and which probes it should use has changed over time. It began by monitoring with ping, traceroute, and abwe. Iperf was added. File transfers (bbcp, bbftp, gridftp) were added to the tests, then discontinued because: their performance tracked iperf; disk speed is the overriding factor in throughput; monitoring and target hosts are not likely to be equipped with high-speed disks; and disk latency studies are important but should not be part of IEPM-BW. Pathload was added for available bandwidth measurements, then removed (on the suggestion that it was too intense). Pathchirp was added. Thrulay was added to compare with iperf. Pathload was added back in at the suggestion of collaborators at the Ultralight meeting.
Currently: abwe has been removed, as it was too noisy and did not work well on gigabit networks. We are evaluating Pathload against Pathchirp and may remove Pathchirp; more likely we will run it only to the nodes for which it works well, not to all nodes. We may have to use different types of probes for different types of networks and distances between nodes: ONE SIZE DOES NOT FIT ALL. All probes, presentation, and analysis are evolving as we understand more about the networking environments, which are themselves evolving.
Analysis and Presentation: Ideas about analysis and data presentation change. Time series plots were the first plots we had. Of interest in the time series plot: Pathchirp is not very good in some cases (it reports throughput above 1 Gb/s); Pathload is more stable and probably accurate; the RTT change in ping is very clear and seems to have no effect in this case, though it does in others. Note that it correlated with a traceroute change.
Analysis and Presentation: Added diurnal analysis to see how it might be useful in event detection (bandwidth changes) and possibly in prediction.
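A diurnal analysis can be sketched as grouping measurements by hour of day and averaging each group. This is only an illustration: the function name `diurnal_profile`, the (timestamp, value) input format, the use of UTC, and the sample values are all assumptions, not taken from the IEPM-BW code.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def diurnal_profile(samples):
    """Return {hour_of_day: mean value} for (unix_time, value) samples, in UTC."""
    by_hour = defaultdict(list)
    for unix_time, value in samples:
        hour = datetime.fromtimestamp(unix_time, tz=timezone.utc).hour
        by_hour[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Made-up throughput samples (Mbit/s) at 02:00 and 14:00 UTC on two days.
DAY = 86400
samples = [
    (2 * 3600, 900.0), (14 * 3600, 400.0),
    (DAY + 2 * 3600, 880.0), (DAY + 14 * 3600, 420.0),
]
profile = diurnal_profile(samples)
print(profile)
```

A new measurement can then be compared with the mean for its own hour of day, so that a regular afternoon dip is not mistaken for an event.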
Analysis and Presentation: Scatterplots are useful for looking at correlations. Cross-plots: pathchirp and iperf on the Y axis versus thrulay on the X axis.
Analysis and Presentation: Added histograms to provide the frequency distribution and CDF. The example shows a possible multimodal distribution of achievable throughput measurements via thrulay, while the available bandwidth for the same node (by pathchirp) is stable.
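As a rough sketch of the frequency distribution and CDF, the following bins measurements into fixed-width bins and accumulates the cumulative fraction. The function name, the bin width, and the sample values are hypothetical; the actual plots in the talk were produced with gnuplot.

```python
def histogram_and_cdf(values, bin_width):
    """Return [(bin_start, count, cumulative_fraction), ...] for fixed-width bins."""
    if not values:
        return []
    counts = {}
    for v in values:
        bin_start = int(v // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    rows, running = [], 0
    for bin_start in sorted(counts):
        running += counts[bin_start]
        rows.append((bin_start, counts[bin_start], running / len(values)))
    return rows


# Made-up throughput values (Mbit/s) clustered around two levels.
throughput = [110, 120, 130, 480, 490, 500, 510, 520]
rows = histogram_and_cdf(throughput, bin_width=100)
for bin_start, count, cdf in rows:
    print(bin_start, count, cdf)
```

Occupied bins separated by a run of empty ones, as between 100 and 400 here, are the kind of pattern that suggests a multimodal distribution.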
Analysis and Presentation: Packet loss.
Traceroute Visualization: One compact page per day. One row per host, one column per hour. One character per traceroute to indicate pathology or change (usually a period (.), meaning no change). Identify unique routes with a number. Be able to inspect the route associated with a route number. Provide for analysis of long-term route evolution. In the example page: the route number at the start of the day gives an idea of route stability; multiple route changes (due to GEANT) were later restored to the original route.
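The row layout described above can be sketched as follows: each unique hop list gets a route number, the row shows that number at the start of the day and at every change, and a period otherwise. This is a simplified illustration. The function name and the addresses are made up, route numbers may need more than one character, and the real display also marks pathologies, which are omitted here.

```python
def render_day_row(hop_lists, route_numbers):
    """Render one host's traceroutes for a day as a compact string.

    hop_lists: tuples of hop addresses, in time order.
    route_numbers: dict mapping a hop tuple to its route number; routes not
    seen before are numbered in order of first appearance.
    """
    cells = []
    previous = None
    for hops in hop_lists:
        if hops not in route_numbers:
            route_numbers[hops] = len(route_numbers) + 1
        if hops != previous:
            cells.append(str(route_numbers[hops]))  # start of day or route change
        else:
            cells.append(".")  # same route as the previous traceroute
        previous = hops
    return " ".join(cells)


# Two hypothetical routes that differ in their second hop.
route_a = ("192.0.2.1", "192.0.2.10")
route_b = ("192.0.2.1", "192.0.2.20")
known_routes = {}
row = render_day_row([route_a, route_a, route_b, route_b, route_a], known_routes)
print(row)
```

Because the numbering dictionary is passed in and updated, the same route keeps its number across days, which is what makes a return to the original route visible.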
Event Detection (throughput drops): You must clearly define what you are looking for: how much change, and over what time period. Decide how to determine when it is time to alert again (you do not want repeated alerts for the same drop). Use the above to figure out how often you want to probe. Do not overprobe: try to establish the necessary frequency, and if that does the job, that is enough.
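The three questions on this slide (how much change, over what period, when to alert again) map onto three parameters in the sketch below. It is not the IEPM-BW detection algorithm. The function name, the rolling-mean baseline, and the default thresholds are assumptions chosen only to make the idea concrete.

```python
def detect_drops(samples, baseline_n=5, drop_fraction=0.4, realert_after=3600):
    """Return (time, value, baseline) alerts for throughput drops.

    samples: (unix_time, throughput) pairs in time order.
    A sample counts as a drop when it is below (1 - drop_fraction) times the
    mean of the last baseline_n normal samples. After an alert, further drops
    are not reported until realert_after seconds have passed.
    """
    alerts = []
    history = []
    last_alert = None
    for t, value in samples:
        if len(history) < baseline_n:
            history.append(value)  # still building the baseline
            continue
        baseline = sum(history) / len(history)
        if value < (1 - drop_fraction) * baseline:
            if last_alert is None or t - last_alert >= realert_after:
                alerts.append((t, value, baseline))
                last_alert = t
        else:
            history = history[1:] + [value]  # only normal samples update the baseline
    return alerts


# Made-up series: five normal probes, three low ones, then recovery.
series = [(i * 600, 100) for i in range(5)] + [
    (3000, 50), (3600, 50), (4200, 50), (4800, 100),
]
alerts = detect_drops(series)
print(alerts)
```

With these defaults the three low samples produce a single alert, because the second and third fall inside the one-hour re-alert window.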
Implementation Challenges: Functions such as ping require different options and parsing on different OSs. When upgrading versions of the probe software, the processing code may need to be modified because of output format changes. Not only the probe software on the monitoring host must be upgraded, but also the server versions on the target hosts. It must be possible to track what is working and what is not, and to troubleshoot when code performance changes for the worse.
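As an illustration of the parsing problem, the sketch below extracts the RTT summary from two differently worded ping summary lines with one tolerant pattern. The two sample lines are modelled on typical Linux and Solaris output and may not match every version; the function name and the returned field names are assumptions.

```python
import re

# Matches "rtt min/avg/max/mdev = a/b/c/d ms" as well as
# "round-trip (ms)  min/avg/max/stddev = a/b/c/d".
RTT_SUMMARY = re.compile(
    r"(?:rtt|round-trip)[^=]*=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)"
)


def parse_ping_rtt(output):
    """Return min/avg/max/dev RTT in ms from ping output, or None if absent."""
    match = RTT_SUMMARY.search(output)
    if match is None:
        return None
    rtt_min, rtt_avg, rtt_max, rtt_dev = (float(x) for x in match.groups())
    return {"min": rtt_min, "avg": rtt_avg, "max": rtt_max, "dev": rtt_dev}


# Illustrative summary lines, not captured from a real run.
linux_style = "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms"
solaris_style = "round-trip (ms)  min/avg/max/stddev = 0.345/0.412/0.500/0.060"
print(parse_ping_rtt(linux_style))
print(parse_ping_rtt(solaris_style))
```

Returning None when no summary is found lets the caller log the raw text and count the probe as a parse failure, which helps when an upgrade changes the output format.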
Implementation Challenges: Which versions of gnuplot and its drivers, MySQL, and Perl are available, and do they meet our needs? Keeping the servers alive (target kit). Monitoring and target hosts losing disks or having their OSs upgraded. Maintaining proper TCP buffer sizes.
Implementation Challenges: Many probes have to be run in a synchronous fashion: do not run iperf, thrulay, and pathload at the same time. We do not want to overload the network with probing activity, which constrains the number and frequency of probes that can be made. Currently the high-impact probes are short (20 seconds or less), and the code allows at most one probe to run within a minute. If a process (probe, script, gnuplot, etc.) cannot hang, it will hang: time everything out and watch for hangs so they can be cleaned up automatically.
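The advice to time everything out can be sketched with the standard library's `subprocess.run`, which kills the child process and raises `TimeoutExpired` when the limit is exceeded. The wrapper name `run_probe` and the status strings are assumptions, and the commands are stand-ins that use the Python interpreter in place of a real probe.

```python
import subprocess
import sys


def run_probe(command, timeout_seconds):
    """Run a probe command; return (status, stdout).

    status is 'ok', 'failed' (non-zero exit) or 'timeout'.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    status = "ok" if result.returncode == 0 else "failed"
    return (status, result.stdout)


# Stand-in commands: one finishes immediately, one would hang for 30 seconds.
quick = run_probe([sys.executable, "-c", "print('done')"], timeout_seconds=10)
hung = run_probe(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout_seconds=1
)
print(quick[0], hung[0])
```

Running probes one after another through a wrapper like this also keeps them from overlapping, since each call returns only after the probe has finished or been killed.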
Current Implementation: MySQL tables hold all configuration information. NODES: node definitions and path information for each node; all nodes, both target and monitoring hosts, are defined in this table. MONHOST: monitoring-host-specific information and the plotting spec for all the data. TOOLSPECS: the specification for each probe, as well as a plotting spec for its data and a 'last run' field. PLOTSPECS: miscellaneous plotting specifications (scatterplots, time series plots, other plot types).
Current Implementation: MySQL tables for data storage. ABWEDATA: being discontinued (it was the first data table). BWDATA: all bandwidth data is stored here. It contains fields for: RTT min, max, average, and standard deviation; throughput min, max, average, standard deviation, and final throughput; number of streams and window size; text results from the probe; time of the probe. Not all fields are used for all data types.
Current Implementation: Tables for traceroute data. ROUTENO: each route seen is given a unique identifier (routeno), and the row contains srcnode, destnode, firstseen, lastseen, and the IP hop list. ROUTEDATA: routeno (from the ROUTENO table), the text of the traceroute output, number of hops, IP hop list, and time of the probe. Historical route data may be interesting to analyze for route changes over time, but no one has had the time or interest to do it. New, coming soon: ASN tables to store ASN information for hops, which is useful because it speeds up interactive drawing and display, and analysis, of the traceroutes.
Current Implementation: The SCHEDULE table holds the scheduling information for each probe and tracks what state it is in. Every probe made (including ping) has a unique schedule ID, which identifies the probe and all of its parameters. The scheduler checks the TOOLSPECS table to ascertain which probes are due to be run and inserts them into the SCHEDULE table. Scheduled probes are run only if they are within the “current” time period. This prevents a large number of probes from being stacked up and flooding the network for a long time.
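The two scheduling rules on this slide (find probes that are due, and run only those still inside the current time period) can be sketched with plain dictionaries standing in for the database tables. The field names `interval`, `last_run`, and `scheduled_at`, the 60-second window, and both function names are assumptions made for illustration.

```python
def due_probes(toolspecs, now):
    """Return names of probes whose interval has elapsed since their last run."""
    return [
        name
        for name, spec in toolspecs.items()
        if now - spec["last_run"] >= spec["interval"]
    ]


def split_runnable(schedule, now, window=60):
    """Split scheduled probe ids into (run, skipped).

    Entries scheduled more than `window` seconds ago are skipped instead of
    being run late, so a backlog cannot flood the network.
    """
    run, skipped = [], []
    for entry in schedule:
        if now - entry["scheduled_at"] <= window:
            run.append(entry["id"])
        else:
            skipped.append(entry["id"])
    return run, skipped


# Hypothetical probe specs and schedule entries, times in seconds.
toolspecs = {
    "ping": {"interval": 300, "last_run": 1000},
    "iperf": {"interval": 3600, "last_run": 1000},
}
schedule = [{"id": 1, "scheduled_at": 1000}, {"id": 2, "scheduled_at": 1390}]
due = due_probes(toolspecs, now=1400)
run, skipped = split_runnable(schedule, now=1400)
print(due, run, skipped)
```

The skipped list is worth recording, since a growing number of entries that were never run suggests either too many scheduled probes or a performance problem.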
Troubleshooting: Every script has a log file where it records errors and performance information, such as how long it took to make a pass. These log files are rotated nightly and kept for 7 days (easily changed). Hanging probes are a fact of life: time out all probes, and create a cleanup script that looks for processes that have been active longer than they should be and kills them.
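The selection step of such a cleanup script can be sketched as a pure function over process records. How the records are collected (for example from `ps` output) and how the processes are killed are left out. The record layout, the per-program age limits, and the function name are all hypothetical.

```python
def find_stale(processes, now, max_age):
    """Return pids of processes that have run longer than their allowed age.

    processes: (pid, name, start_time) records, times in seconds.
    max_age: dict mapping a program name to its allowed run time in seconds.
    Programs not listed in max_age are never selected.
    """
    stale = []
    for pid, name, started in processes:
        limit = max_age.get(name)
        if limit is not None and now - started > limit:
            stale.append(pid)
    return stale


# Made-up process table and limits.
processes = [
    (101, "iperf", 1000),
    (102, "iperf", 1290),
    (103, "gnuplot", 400),
    (104, "sshd", 0),
]
limits = {"iperf": 60, "gnuplot": 300}
stale = find_stale(processes, now=1300, max_age=limits)
print(stale)
# A real cleanup pass would then signal each pid, for example with
# os.kill(pid, signal.SIGKILL), and record the kill in the script's log file.
```

Restricting the check to an explicit list of program names keeps the script from killing long-lived processes that are supposed to stay up.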
Troubleshooting: Lingering tasks report: a report showing scheduled probes that were not run is generated every day. This is important because, if many probes are not being run on time, it may mean that too many are being scheduled or that there is a performance problem. Logging report: a report showing the number of successful probes made, database write failures, and other failure modes is generated. The information for this report is taken from the data logging log files.
Troubleshooting: NetFlow records are a valuable tool. The code had been running fine for years when TCP orphan socket messages crashed the machine. NetFlow records for some 20-second iperf probes showed them lasting more than 1 minute (some 4 minutes). This was a change in behavior from the past, when they lasted seconds. We disabled the iperf probes and the system stabilized. We now need to figure out what is going on with the iperf probes; not all are troublesome, just those to a few nodes.
Performance Issues: When probes show degradation in network performance, is it the network? Is it the monitoring node (we had a very bad experience with a Java program)? Is it the target node? Recommendations: have a local target host as a sanity check, which is also good to use as a target host from other monitoring hosts. The monitoring hosts should be dedicated systems. Monitor monitoring host load with LISA, Ganglia, Nagios, ApMon to MonALISA, etc.
Performance Issue Example: Bad Java Program. [Figure panels: Caltech monitoring host as seen from iepm-; CALTECH target host as seen from iepm-; SLAC target host as seen from iepm-]
Problems: Node names disappear from DNS. Ports suddenly get blocked. Disks crash (we lost the entire CALTECH database because the backup was on the same physical disk); a separate physical disk is needed for the local backup. Monitoring and target hosts get OS upgrades without warning, and installed code disappears. Databases get zapped. We are now working on backing up databases, source code, and configuration information to SLAC once a day. Utility packages (gnuplot, for example) get silently upgraded. Discussion about distributing our own
Future Directions: Automate the installation and configuration process. Manage the code with CVS and distribute it via a pacman cache. Deploy IEPM-BW for LHC monitoring to see if it is useful and/or relevant; if so, it can be expanded and developed to meet changing needs. Upload monitoring data and alerts to MonALISA. Implement OWAMP and BWCTL. Look at Pathneck. Implement min and max (maybe also average) RTT analysis and integrate it with the other change analysis.
Summary: Are you monitoring to detect problems or monitoring for forecasting? The two are very different, but both can be done with the same monitoring. With respect to real disk-to-disk transfers, disk latency is the overwhelming factor. The monitoring can tell you how the network is performing, but this is not necessarily related to application performance. Bearing this in mind, I do not think we need to perform disk-to-disk transfers or intensive network testing with the monitoring systems. Be prepared to be flexible in your architecture: networks themselves are constantly evolving, so the probes, analysis, and presentation must also evolve.
Finally: What would I have done differently along the way? In hindsight, not a lot. It has been a constant process of learning. The code adapted fairly well to the research we needed to do; remember that it started as an exhibit for SC2001 and has been a research and learning tool since then. More manpower would have been very useful: had it been available, the code, package structure, and documentation would be more professional, and the change analysis and prediction/forecasting would be more complete.
References: bw.slac.stanford.edu/slac_wan_bw_tests.html. Papers/web pages on web100, netflow, and active measurement correlation. Recommended monitoring and target host configurations. IEPM-BW Installation and PLM (being updated and reorganized). Contributors: Les Cottrell, Jerrod Williams, Mahesh Chhaparia, I-Heng Mei, Manish Bhargava, Jiri Navratil, and Yee-Ting Li, all at SLAC now or in the past; Maxim Grigoriev (FNAL); and the developers of the probes we use. QUESTIONS?