Monitoring Appliance Status
Otto Kreiter, DANTE
LHCOPN, Geneva, 17.06.2008
Agenda
Monitoring parameters & monitoring scheduling (proposal)
Deployment process for the LHCOPN
perfSONAR MDM v3.0 & E2EMON features
Monitoring parameters & monitoring scheduling (proposal)
[Figure: Typical T0-T1 measurement appliance installation (reminder). Diagram components: Box-1, Box-2, Box-3, LHCOPN; services shown: HADES agent (delay), Pinger/Tracert MP, Cacti, RRD-MA (counters), SQL MA (L2 status), E2ESync, Lookup Service, BWCTL MP (TCP), perfSONARBuoy, Authentication Service.]
Metrics collected & scheduling
Delay - scheduled measurements: One-Way Delay (OWD), IP Packet Delay Variation (IPDV), One-Way Packet Loss (OWPL); every 60 s
Achievable Bandwidth - scheduled measurements: TCP throughput transfer (max. 950 Mbit/s); every 6 h
Achievable Bandwidth - on-demand measurements: TCP and UDP throughput transfer (max. 950 Mbit/s)
Traceroute - scheduled measurements: hop list; every 5 min
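The proposed schedule can be expressed as a small table, which also makes the daily measurement volume per path easy to check. This is a minimal sketch: the `Measurement` record, the test names and `runs_per_day` are hypothetical illustrations, not a perfSONAR configuration format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class Measurement:
    name: str                  # hypothetical identifier for the test type
    metrics: Tuple[str, ...]   # metrics produced by one run
    interval_s: Optional[int]  # scheduling interval in seconds; None = on-demand only


# Hypothetical schedule table mirroring the proposal (not a real perfSONAR config format).
SCHEDULE = [
    Measurement("delay", ("OWD", "IPDV", "OWPL"), 60),
    Measurement("achievable_bandwidth", ("TCP throughput",), 6 * 60 * 60),
    Measurement("achievable_bandwidth_on_demand", ("TCP throughput", "UDP throughput"), None),
    Measurement("traceroute", ("hop list",), 5 * 60),
]


def runs_per_day(m: Measurement) -> int:
    """Scheduled runs per day for one measured path; on-demand tests count as 0."""
    if m.interval_s is None:
        return 0
    return SECONDS_PER_DAY // m.interval_s


if __name__ == "__main__":
    for m in SCHEDULE:
        print(f"{m.name}: {runs_per_day(m)} scheduled runs/day")
```

With these intervals, each measured path produces 1440 delay samples, 4 scheduled throughput tests and 288 traceroutes per day.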
Metrics collected & scheduling (cont.)
Router interface statistics: link capacity, link utilisation, interface input errors, interface output drops; every 5 min
L2 circuit status: intra-domain and/or inter-domain circuit status (UP/DOWN)
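Link utilisation is typically derived from two successive readings of an interface octet counter (for example the IF-MIB `ifHCInOctets`/`ifHCOutOctets` objects) taken one polling interval apart. A minimal sketch, assuming 64-bit counters polled every 5 minutes and at most one counter wrap between polls; the function names are hypothetical.

```python
def counter_delta(prev: int, curr: int, bits: int = 64) -> int:
    """Difference between two SNMP counter readings, allowing for at most one wrap-around."""
    if curr >= prev:
        return curr - prev
    return curr + (1 << bits) - prev


def utilisation_pct(prev_octets: int, curr_octets: int, interval_s: float,
                    capacity_bps: float, bits: int = 64) -> float:
    """Average link utilisation (%) over one polling interval."""
    delta_bits = counter_delta(prev_octets, curr_octets, bits) * 8  # octets -> bits
    return 100.0 * delta_bits / (interval_s * capacity_bps)


if __name__ == "__main__":
    # 75 GB transferred in 5 minutes on a 10 Gbit/s link -> 20 % average utilisation.
    print(utilisation_pct(0, 75_000_000_000, 300, 10_000_000_000))
```

Because the result is an average over 5 minutes, short bursts that saturate the link can still hide behind a moderate utilisation figure; the output drop counter complements it.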
Delay - scheduled measurements
These measurements are used to:
identify routing issues (OWD)
identify congestion, with or without packet loss (OWD, IPDV, OWPL)
identify a high packet loss rate (OWPL)
identify path instability (IPDV)
keep a historical trace of changes at a fine granularity (60 s)
validate the recovery of the service after a failure
verify consistency before and after maintenance
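The three delay metrics can be summarised from one interval of probes as follows. This is a minimal sketch, assuming the one-way delays have already been computed from synchronised sender and receiver timestamps and that a lost probe is recorded as `None`; IPDV is taken between consecutive probes, in the spirit of RFC 3393. The function names are hypothetical.

```python
from typing import List, Optional, Sequence

# One entry per probe, in send order: one-way delay in ms, or None if the probe was lost.
Delays = Sequence[Optional[float]]


def one_way_loss_pct(delays: Delays) -> float:
    """OWPL: percentage of probes in the interval that never arrived."""
    if not delays:
        return 0.0
    lost = sum(1 for d in delays if d is None)
    return 100.0 * lost / len(delays)


def ipdv(delays: Delays) -> List[float]:
    """IPDV between consecutive probes (D[i] - D[i-1]).

    Pairs involving a lost probe are skipped, since their variation is undefined.
    """
    return [b - a for a, b in zip(delays, delays[1:]) if a is not None and b is not None]


def min_owd(delays: Delays) -> Optional[float]:
    """Minimum OWD; a step change in this value can point to a routing change."""
    received = [d for d in delays if d is not None]
    return min(received) if received else None
```

For example, the probe sequence `[10.0, 10.5, None, 11.0, 10.0]` gives 20 % loss, IPDV values `[0.5, -1.0]` and a minimum OWD of 10.0 ms.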
Achievable Bandwidth - scheduled measurements
These measurements are used to:
identify performance degradation
compare the data transfer rate against a historical baseline measured from a well-tuned host
validate the recovery of the service after a failure
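Comparing a scheduled throughput result against the historical baseline can be as simple as checking it against the median of recent results. A minimal sketch: the 80 % threshold and the minimum of four historical results (one day of 6-hourly tests) are hypothetical tuning choices, not values from the proposal.

```python
from statistics import median
from typing import Sequence


def is_degraded(current_mbps: float, history_mbps: Sequence[float],
                threshold: float = 0.8, min_history: int = 4) -> bool:
    """Flag a throughput result below threshold x the historical median.

    threshold and min_history are hypothetical parameters.
    """
    if len(history_mbps) < min_history:
        return False  # not enough history to establish a baseline
    baseline = median(history_mbps)
    return current_mbps < threshold * baseline
```

The median is used rather than the mean so that a single bad test in the history does not drag the baseline down.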
Achievable Bandwidth - on-demand measurements
These measurements are used to:
troubleshoot TCP throughput by running tests from a well-tuned host and comparing the results against historical data
validate the recovery of the service after a failure
verify the consistency of performance before and after maintenance
Traceroute - scheduled measurements
These measurements are used to:
track IP path stability over time
identify routing issues
validate the recovery of the service path after a failure
verify path consistency before and after maintenance
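Path stability can be tracked by comparing each scheduled hop list with the previous one. A minimal sketch, assuming each traceroute run is stored as an ordered list of hop addresses; the function name and the sample addresses (documentation range 192.0.2.0/24) are hypothetical. Non-responding hops (often shown as `*`) would also register as changes here, so a real check may need to tolerate them.

```python
from typing import List, Sequence, Tuple

Path = Tuple[str, ...]


def path_changes(hop_lists: Sequence[Sequence[str]]) -> List[Tuple[int, Path, Path]]:
    """Return (run index, old path, new path) for every run whose hop list differs from the previous run."""
    changes = []
    for i in range(1, len(hop_lists)):
        old, new = tuple(hop_lists[i - 1]), tuple(hop_lists[i])
        if old != new:
            changes.append((i, old, new))
    return changes


if __name__ == "__main__":
    runs = [
        ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
        ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
        ["192.0.2.1", "192.0.2.9", "192.0.2.3"],  # reroute via a different hop
        ["192.0.2.1", "192.0.2.2", "192.0.2.3"],  # path restored
    ]
    for idx, old, new in path_changes(runs):
        print(idx, old, "->", new)
```

Run 2 shows the reroute and run 3 the recovery of the original path, which is the pattern used to validate path recovery after a failure.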
Router interface statistics
These statistics are used to:
identify traffic load for troubleshooting (congestion, heavy utilisation), long-term trends and capacity planning
estimate the available bandwidth
identify short-term and long-term congestion with losses (output drops)
identify faulty links (input errors)
verify traffic recovery after a failure
verify consistency before and after maintenance
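The mapping from counters to symptoms (output drops for congestion with loss, input errors for faulty links) can be sketched as a per-sample check. This is a minimal sketch: `InterfaceSample`, `classify`, the label strings and the 90 % utilisation threshold are all hypothetical, and the error and drop fields are assumed to be counter deltas over one 5-minute interval.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InterfaceSample:
    utilisation_pct: float  # average utilisation over the polling interval
    input_errors: int       # counter delta over the polling interval
    output_drops: int       # counter delta over the polling interval


def classify(s: InterfaceSample, util_threshold: float = 90.0) -> List[str]:
    """Return hypothetical alarm labels for one 5-minute interface sample."""
    labels = []
    if s.input_errors > 0:
        labels.append("possible faulty link")
    if s.output_drops > 0:
        labels.append("congestion with loss")
    elif s.utilisation_pct >= util_threshold:
        labels.append("heavy utilisation")
    return labels
```

Drops take precedence over the utilisation label because a link that is dropping packets is already known to be congested, whatever its 5-minute average says.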
L2 circuit status
Used to identify the status of a circuit segment in a given domain.
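When a circuit crosses several domains, the per-domain segment statuses have to be combined into one end-to-end status. A minimal sketch: the combination rule (any segment DOWN means DOWN, all segments UP means UP, otherwise UNKNOWN) and the domain names are assumptions for illustration, not a description of how E2EMON computes it.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNKNOWN = "UNKNOWN"


def end_to_end_status(segments: Dict[str, Status]) -> Status:
    """Combine per-domain segment statuses (assumed rule, see lead-in)."""
    values = list(segments.values())
    if not values:
        return Status.UNKNOWN
    if any(v is Status.DOWN for v in values):
        return Status.DOWN
    if all(v is Status.UP for v in values):
        return Status.UP
    return Status.UNKNOWN


if __name__ == "__main__":
    circuit = {"DomainA": Status.UP, "DomainB": Status.DOWN, "DomainC": Status.UP}
    print(end_to_end_status(circuit).value)
```

Keeping the per-domain statuses alongside the combined result is what lets an operator see which domain holds the failed segment.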
Conclusion
Parameters and scheduling are well established but not yet finalised.
Scheduling can be customised to customer needs.
Next step: demonstrate the UI at the next LHCOPN meeting.
Deployment process for the LHCOPN
perfSONAR MDM LHCOPN Site Deployment
Site deployment steps:
Site survey
Hardware purchase
Shipment of the hardware to site
Hardware installation
Software installation
Service configuration
perfSONAR MDM LHCOPN deployment steps
Deployment is planned in phases; each phase involves 4 sites.
Each site will:
be addressed individually
have a dedicated Service Desk person
Successful deployment depends on site collaboration.
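With four sites per phase, the number of phases is the number of sites divided by four, rounded up. A minimal sketch of grouping a site list into phases; `plan_phases` and the placeholder site names are hypothetical, and the real order of sites is a planning decision not stated here.

```python
from typing import List, Sequence


def plan_phases(sites: Sequence[str], per_phase: int = 4) -> List[List[str]]:
    """Split the site list into consecutive deployment phases of at most per_phase sites."""
    return [list(sites[i:i + per_phase]) for i in range(0, len(sites), per_phase)]


if __name__ == "__main__":
    sites = [f"site-{n}" for n in range(1, 11)]  # 10 placeholder sites
    for number, phase in enumerate(plan_phases(sites), start=1):
        print(f"Phase {number}: {', '.join(phase)}")
```

Ten placeholder sites give three phases, the last one containing only two sites.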
perfSONAR MDM v3.0 & E2EMON features
perfSONAR MDM 3.0
Significant improvements to software installation: RPM and Debian packages
New 'Web Admin' interface: a slick web-based interface for easier configuration, administration and support of the software
Authentication and Directory services (Lookup): validation of eduGAIN identities from various Identity Providers
The bundle contains 10 different web services.
http://wiki.perfsonar.net/jra1-wiki/index.php/PerfSONAR_v3.0
E2EMON enhancement
SNMP traps for Nagios sensors, per project and per link, sent to selected recipients (requested by PIC and IN2P3)
Feature available mid-July.
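Delivering alarms "per project and per link to selected recipients" implies a subscription table consulted for each event. A minimal sketch of that lookup only: it does not send SNMP traps, and `TrapRouter`, its methods and the example addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class TrapRouter:
    """Hypothetical subscription table: who receives alarms for which project or link."""

    def __init__(self) -> None:
        self._by_project: Dict[str, Set[str]] = defaultdict(set)
        self._by_link: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def subscribe_project(self, project: str, recipient: str) -> None:
        """Receive alarms for every link in the project."""
        self._by_project[project].add(recipient)

    def subscribe_link(self, project: str, link: str, recipient: str) -> None:
        """Receive alarms for one link only."""
        self._by_link[(project, link)].add(recipient)

    def recipients(self, project: str, link: str) -> Set[str]:
        """All recipients for an alarm on this link (project-wide plus link-specific)."""
        # .get() avoids creating empty entries for unknown keys.
        return self._by_project.get(project, set()) | self._by_link.get((project, link), set())


if __name__ == "__main__":
    router = TrapRouter()
    router.subscribe_project("LHCOPN", "noc@example.org")
    router.subscribe_link("LHCOPN", "LINK-1", "site-noc@example.net")
    print(sorted(router.recipients("LHCOPN", "LINK-1")))
```

Separating project-wide and link-specific subscriptions lets a site receive alarms only for its own links while a central NOC still sees everything.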