ESnet Abilene 3+3 Measurements
Presented at the Joint Techs Meeting in Columbus, July 19th, 2004
Joe Metzger, ESnet Network Engineer
Collaborators
Chin Guok, Bill Johnston & Kevin (ESnet)
Chintan Desai & John (NCSU)
Darryl Wohlt & Phil (FERMI)
Jeff Boote, Eric Boyd & Guy (Internet2)
Jin (LBL)
Kevin (SDSC)
Prasad (OSU / OARnet)
3+3 Measurements
3 ESnet Sites: LBL, FERMI, BNL
3 Abilene Participants: SDSC, NCSU, OSU
Abilene and ESnet have systems in place to measure our portions of the net.
We were not measuring performance across our interconnections.
Why?
We want to ensure that the ESnet/Abilene cross connects are serving the needs of users in the science community who are accessing DOE facilities and resources from universities or accessing university facilities from DOE labs.
Existing Measurement Collections Were Not Meeting Our Needs
ESnet and Abilene monitor traffic, errors and discards on all of our respective links, including interconnection points.
This monitoring shows interconnections are lightly loaded and error free.
Other measurement systems we looked at do not contain the mix of ESnet and Abilene sites we are looking for.
Why Start with Latency Testing?
Low impact
Sensitive to network events
What is OWAMP?
One-Way Active Measurement Protocol
A suite of tools
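As a rough illustration of what a one-way latency measurement involves, the sketch below subtracts a sender timestamp from a receiver timestamp and carries the two clock error estimates along as an uncertainty. The `Probe` type and its field names are hypothetical and are not part of the OWAMP tools. Summing the two error estimates is a simple worst-case assumption.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One test packet (hypothetical record, not an OWAMP data structure)."""
    send_ts: float   # timestamp taken on the sender's clock, seconds
    recv_ts: float   # timestamp taken on the receiver's clock, seconds
    send_err: float  # sender's clock error estimate, seconds
    recv_err: float  # receiver's clock error estimate, seconds


def one_way_delay(p: Probe):
    """Return (delay, uncertainty) in seconds for a single probe."""
    delay = p.recv_ts - p.send_ts
    # Worst case: both clocks are off by their full estimate in opposite directions.
    uncertainty = p.send_err + p.recv_err
    return delay, uncertainty


if __name__ == "__main__":
    probe = Probe(send_ts=100.0, recv_ts=100.0254, send_err=0.00015, recv_err=0.0002)
    delay, err = one_way_delay(probe)
    print(f"{delay * 1e3:.3f} ms +/- {err * 1e3:.3f} ms")
```

Because the two timestamps come from different clocks, the result is only as good as the clock synchronization, which is why the graphs in this talk also plot NTP error estimates.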
Data Visualization
What is the best way to display latency data?
It is difficult to identify trends in numeric tables.
What is interesting or meaningful?
Mean? No.
Median and 95th percentiles? Maybe…
Distribution? Yes!
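A small sketch of why the distribution is more informative than a single number, using made-up latency samples (not data from these measurements). The mean hides a couple of delayed packets, the median ignores them, the 95th percentile catches them, and a histogram shows the whole picture. The function names and the nearest-rank percentile method are choices made for this illustration.

```python
import math
import statistics


def summarize(latencies_ms):
    """Return mean, median and nearest-rank 95th percentile of the samples."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(95 * len(ordered) / 100))  # nearest-rank method
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


def histogram(latencies_ms, bin_width_ms):
    """Count samples per bin; keys are the lower edge of each bin."""
    counts = {}
    for value in latencies_ms:
        edge = math.floor(value / bin_width_ms) * bin_width_ms
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Synthetic samples: 18 packets at the base latency, 2 that were queued.
    samples = [25.0] * 18 + [27.0] * 2
    print(summarize(samples))
    print(histogram(samples, bin_width_ms=0.5))
```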
SmokePing by Tobias Oetiker
Shows distribution of latency measurements.
Our Visualizations
Extended SmokePing graphic design to include multiple data sets on one graph.
SmokePing uses 20 shades of gray and plots one data set on a graph.
We are using a different color for each data set on a graph.
We are using different saturations of the colors to show the distribution of results.
Show NTP error estimates.
Graphs implemented as RRD templates to leverage existing ESnet statistics tools & infrastructure.
Do not show loss information at this time.
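One way to picture the color scheme described above: give each data set its own hue, then vary the saturation across the bands of the distribution. The sketch below is an assumption about how such a palette could be generated, not the RRD templates actually used. The `band_colors` helper and the choice to make inner bands more saturated are hypothetical.

```python
import colorsys


def band_colors(hue, bands):
    """Return hex colors for `bands` distribution bands of one data set.

    hue is in [0, 1) (0.0 is red, about 0.67 is blue). The first color is
    the palest (outermost band); the last is fully saturated (innermost).
    """
    colors = []
    for i in range(bands):
        saturation = (i + 1) / bands
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


if __name__ == "__main__":
    print("red data set: ", band_colors(0.0, 4))
    print("blue data set:", band_colors(2 / 3, 4))
```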
FERMI to SDSC, LBL & NCSU
Graph legend: Red: FERMI to LBL. Blue: LBL to FERMI. Green: NTP error estimates.
Graph annotations: a couple of packets experienced queuing delays; clock event.
Measurement Servers
LBL and NCSU were ready in late April
FNAL and SDSC in May
OSU in June
BNL in July
Interesting Observations
NTP Error Estimate Quality
NCSU Metro DWDM Reroute
Queuing caused by bandwidth testing
Asymmetric Routing
NTP Error Estimate Quality
NCSU Metro DWDM Reroute Adds About 350 Microseconds
Graph annotation: fiber re-route
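For a sense of scale, the added latency can be converted into an approximate extra fiber length. This is a back-of-the-envelope sketch that assumes the 350 microseconds is one-way propagation delay and that the fiber has a refractive index of about 1.468 (a typical value for single-mode fiber, not a figure from the talk).

```python
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in vacuum


def extra_fiber_km(added_delay_us, refractive_index=1.468):
    """Approximate fiber length that accounts for a given one-way delay."""
    speed_in_fiber = C_VACUUM_KM_PER_S / refractive_index  # km/s
    return added_delay_us * 1e-6 * speed_in_fiber


if __name__ == "__main__":
    print(f"{extra_fiber_km(350):.1f} km of additional fiber path")
```

Under those assumptions the reroute corresponds to roughly 70 km of extra path, which is plausible for a detour within a metro DWDM network.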
Bandwidth Tests Can Cause Queuing on Bottleneck Links
Graph annotations: Test Traffic Rerouted Tuesday Morning; Large Data Transfers
Bottleneck Link Traffic & Discards
Asymmetric Routing between FERMI and SDSC (LBL to CENIC link maintenance)
Future Direction
Utilize a generalized, interoperable measurement collection and archiving system instead of current ad-hoc scripts.
Look carefully at implementing bandwidth testing, perhaps using Scavenger QoS.
Conclusions
The ESnet/Abilene interconnections are not the bottlenecks on the tested paths.
Latency data can show interesting queuing effects that may not be obvious in other measurements.
A single user with a $5K box can congest many current access links.
Is 1-2 ms queuing a problem?
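One way to frame the question is to ask how much data has to be waiting in a queue to produce that delay. The sketch below does the arithmetic for a hypothetical 1 Gb/s bottleneck link and 1500-byte packets. Both values are assumptions chosen for illustration, since the talk does not give the link speed here.

```python
def queued_bytes(delay_ms, link_rate_bps):
    """Bytes that must be queued ahead of a packet to delay it by delay_ms."""
    return delay_ms * 1e-3 * link_rate_bps / 8


if __name__ == "__main__":
    for delay_ms in (1, 2):
        backlog = queued_bytes(delay_ms, link_rate_bps=1e9)  # assumed 1 Gb/s link
        packets = backlog / 1500                             # assumed 1500-byte packets
        print(f"{delay_ms} ms: about {backlog:,.0f} bytes, or {packets:.0f} full-size packets")
```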
The End
For more info see