1 Terapaths: Datagrid WAN Network Monitoring Infrastructure Les Cottrell, Connie Logg, Jerrod Williams SLAC, for the DoE 2004 PI Network Research Meeting, FNAL Sep ‘04 Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP
2 Need Data-intensive science (e.g. HENP) needs to share data at high speeds Needs high-performance, reliable e2e paths and the ability to use them End users need long- and short-term estimates of network and application performance for: Planning, setting expectations & troubleshooting You can’t manage what you can’t measure
3 Based on IEPM-BW Toolkit: –Enables regular, E2E measurements with user selectable: Tools: iperf (single & multi-stream), bbftp, bbcp, GridFTP, ping (RTT), traceroute Periods (with randomization) Remote hosts (RH) to monitor from monitoring hosts (MH) –Hierarchical to match the tiered approach of BaBar, D0, CDF & LHC computation / collaboration infrastructures –Includes: Auto-clean up of hung processes at both ends Management tools to look for failures (unreachable hosts, failing tools etc.) Web navigation of results Visualization of data as time-series, histograms, scatter plots, tables Access to data in machine readable form Documentation on host etc. requirements, program logic manuals, methods
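As an illustration of "Periods (with randomization)", the sketch below spaces successive measurements by a nominal period scaled by a random factor, so that probes do not stay locked in step with other periodic activity on the path. It is not taken from the IEPM-BW code: the function name `measurement_start_times`, the `jitter_fraction` parameter and the uniform distribution are all assumptions made for this example.

```python
import random


def measurement_start_times(period_s, jitter_fraction, count, start=0.0, rng=None):
    """Return `count` start times (in seconds) separated by randomized gaps.

    Hypothetical helper: each gap is `period_s` scaled by a factor drawn
    uniformly from [1 - jitter_fraction, 1 + jitter_fraction].
    """
    if not 0.0 <= jitter_fraction < 1.0:
        raise ValueError("jitter_fraction must be in [0, 1)")
    rng = rng or random.Random()
    times = []
    t = start
    for _ in range(count):
        # Draw a fresh gap for every measurement so the schedule never repeats exactly.
        t += period_s * rng.uniform(1.0 - jitter_fraction, 1.0 + jitter_fraction)
        times.append(t)
    return times
```

For example, `measurement_start_times(5400, 0.25, 10)` would give ten start times roughly 90 minutes apart, each gap varying by up to 25% either way. The 90-minute period is an arbitrary value chosen for illustration, not a toolkit default.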
4 Requirements –Requires: Monitoring toolkit installed on Linux monitoring host (MH) –Monitoring hosts are independent, no central hierarchy of control –Host provided & administered by monitoring site personnel –No need for root privileges –Appropriate iperf, bbftp etc. ports to be opened –Central IEPM-BW support people (SLAC) can do initial install & configuration for monitoring host »50-line configuration file for each remote host, tells where directories and applications are located, options for various tools etc. (mainly defaults) –Monitoring host has Point of Contact (POC), responsible for: »Installation on MH, opening ports, configuring toolkit. Selection of RHs, ssh accounts, installation on RHs … Small toolkit installed at remote (monitored) hosts ssh access to an account at Remote Host (RH) –This is the biggest problem with deployment
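Since the requirements above include opening the appropriate iperf, bbftp etc. ports, a Point of Contact may want a quick way to confirm that a remote host's measurement ports are reachable before scheduling tests. The following is a minimal sketch using only the Python standard library. It is not part of the IEPM-BW toolkit, and the helper names `port_is_open` and `check_ports` are invented for this example.

```python
import socket


def port_is_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host, ports, timeout_s=3.0):
    """Check several named TCP ports on one host.

    `ports` maps a label (e.g. a tool name) to a port number; the result maps
    each label to True (reachable) or False (not reachable).
    """
    return {name: port_is_open(host, port, timeout_s) for name, port in ports.items()}
```

A call such as `check_ports("rh.example.org", {"iperf": 5001})` would report whether the port is reachable; the host name is a placeholder and 5001 is assumed here as the usual default for iperf version 2. A successful TCP connect only shows that something is listening and that firewalls allow the connection, not that the tool behind the port is correctly configured.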
5 IEPM-BW HENP Deployment June 2004 Measurements from SLAC & FNAL –BaBar, CMS, D0, CDF remote hosts in 12 countries Toolkits needed in monitor & remote hosts Range of bandwidths: 500 Kbps to 1 Gbps
6 Deliverables: Monitoring Hosts deployment Focused on critical target audience: –SLAC (BaBar), FNAL (CDF, D0, CMS) In place, will upgrade to version 3 when ready –BNL (Atlas), CERN (LHC: Atlas/CMS) Following successful deployment of v3 to SLAC & FNAL –ESnet, StarLight (networking sites) –Caltech (CMS: tier 2), UMich (Atlas: tier 2) –Optional European sites: INFN, IN2P3, RAL, GEANT
7 Deliverables: tools More options for security for remote hosts: –Needed for some sites such as BNL and NASA Web services API access to data Improved traceroute visualization –Topology/tomography (SciDAC INCITE project) –Compressed tables Improved database selection of data Provide & integrate low network utilization tool (SciDAC INCITE project): –~ 25% of Abilene traffic is net measurement Automate detection of anomalous step changes in performance –Including eliminating diurnal effects –Filter alerts –Upon detecting anomaly gather relevant information (network, host etc.) including on-demand measurements (e.g. NDT) and prepare web page & –Improved web services access
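To make the "anomalous step changes" deliverable concrete, here is one simple way such detection could be sketched: remove the diurnal pattern by subtracting each hour-of-day's median, compare the median of a window before each point with the median of a window after it, and collapse runs of adjacent detections into one alert. This is an illustrative stand-in, not the IEPM algorithm. The function names, the window length and the 30% drop threshold are assumptions, and the sketch only looks for drops in performance.

```python
from statistics import median


def remove_diurnal(samples):
    """Subtract the hour-of-day median from each value.

    `samples` is a list of (hour_of_day, value) pairs. The overall median is
    added back so that the result stays in the original units.
    """
    by_hour = {}
    for hour, value in samples:
        by_hour.setdefault(hour, []).append(value)
    hour_median = {hour: median(vals) for hour, vals in by_hour.items()}
    overall = median(value for _, value in samples)
    return [value - hour_median[hour] + overall for hour, value in samples]


def detect_drops(values, window=12, min_drop=0.3):
    """Return indices where the median of the next `window` values is lower
    than the median of the previous `window` values by at least the fraction
    `min_drop`."""
    hits = []
    for i in range(window, len(values) - window + 1):
        before = median(values[i - window:i])
        after = median(values[i:i + window])
        if before > 0 and (before - after) / before >= min_drop:
            hits.append(i)
    return hits


def first_of_each_run(indices):
    """Crude alert filter: keep only the first index of each consecutive run."""
    alerts = []
    previous = None
    for i in indices:
        if previous is None or i != previous + 1:
            alerts.append(i)
        previous = i
    return alerts
```

A typical call would be `first_of_each_run(detect_drops(remove_diurnal(samples)))`. Because medians are compared over whole windows, the reported index can precede the true change point by up to about half a window. Without the diurnal step, an ordinary day/night swing in throughput can by itself exceed the threshold and raise an alert every day.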
8 Deliverables: QOS Evaluate using QOS or HSTCP-LP – to reduce impact of iperf traffic Evidence that it causes packet loss (ESnet/FNAL/SLAC) – where is it useful for HENP? Which paths? – input to MPLS
9 Miscellaneous MOU with Pakistan NIIT to collaborate on development for PingER/MAGGIE for 1 year –Travel but no FTE funding Working with Internet2 E2E PiPES –PI is part of E2E Pi TAG Working with NLANR AMP project Close coordination with HENP community –PI is a PPDG member –SLAC is an accelerator center and home of the BaBar experiment
10 Thanks: on-going Foreign: –Andrew Daviel (TRIUMF), Simon Leinen (SWITCH), Olivier Martin (CERN), Sven Ubik (CESnet), Kars Ohrenberg (DESY), Bruno Hoeft (FZK), Dominique (IN2P3), Fabrizio Coccetti (INFN), Cristina Bulfon (INFN), Yukio Karita (KEK), Takashi Ichihara (RIKEN), Yoshinori Kitasuji (APAN), Antony Antony (NIKHEF), Arshad Ali (NIIT), Serge Belov (BINP), Robin Tasker (DL & RAL), Yee Ting Lee (UCL), Richard Hughes-Jones (Manchester) US: –Shawn McKee (Michigan), Tom Hacker (Michigan), Eric Boyd (I2), Stanislav Shalunov (SOX), George Uhl (GSFC), Brian Tierney (LBNL), John Hicks (Indiana), John Estabrook (UIUC), Maxim Grigoriev (FNAL), Joe Izen (UT Dallas), Chris Griffin (U Florida), Tom Dunigan (ORNL), Dantong Yu (BNL), Suresh Singh (Caltech), Chip Watson (JLab), Robert Lukens (JLab), Shane Canon (NERSC), Kevin Walsh (SDSC), David Lapsley (MIT/Haystack/ISI-E)
11 More information IEPM-BW home page – Comparison of Internet E2E Measurement infrastructures – http://www-iepm.slac.stanford.edu/grp/scs/net/proposals/infra-mon.html ABwE lightweight bandwidth estimation – Anomalous Event Detection – IEPM Web Services –