Network and Transfer Metrics WG Meeting
Shawn McKee, Marian Babik
Network and Transfer Metrics WG Meeting, 8th April 2015



Outline

Next Meetings/Events
– 6 May, 3 June, 8 July, 2 Sept (all at 4pm CEST)
– CHEP: talk attached in Indico, please send your comments
– CHEP paper: deadline 17th May, input from experiments needed
Input to use cases document still missing!
Today
● perfSONAR status
● Mesh configuration changes
● Operations status
● Network performance incidents
● esmond status/plans
● Integration projects

Network Monitoring and Metrics WG Meeting

perfSONAR Status

Mesh Configuration changes

Added DebugMesh
– AGLT2
– SARA
Added Belle II mesh
– led by Malachi Schram
Added Global mesh
– aimed at integrating Tier3s
Removed BNL from certain meshes to decrease the load
– now only in US ATLAS and WLCG
Changed parameters for the default latency tests
– increased sample count from 300 to 600
Dual-stack mesh
– Currently using same parameters for IPv4 as default; shall we merge?
– Test frequency can be lowered if necessary; any update?
UK mesh contains two nodes for Manchester
DE mesh contains UKI-NorthGrid-Shef-HEP
Configuration interface at
– You need to be registered in OIM and authorized to access
– Please contact me or Shawn if you have issues

perfSONAR operations

3.4.2 release status
– WLCG perfSONAR service status report on :02:
– Active perfSONAR instances: 233
– Registered/monitored perfSONAR instances: 259
– perfSONAR-PS versions deployed:
  – : 33
  – : 172
  – Unknown: 26
– Incorrectly configured (failing >4 metrics): 26
Please check status of sonars in your region

perfSONAR operations

For LHCOPN/LHCONE, on-going investigation of whether sonars are consistently delivering metrics
– Significant improvement observed after
Wrote data completeness check
– Active sonars (participating in a mesh): 233
– Latency sonars (OPN mesh): 13
  – Theoretical size of full mesh: 156
  – Total number of working links (both directions): 156
  – Ratio:
– Bandwidth sonars (WLCG mesh): 110
  – Theoretical size of full mesh:
  – Total number of working traces (both directions): 9690
  – Ratio:
Plan
– Start with a planned top-k ramp up in the WLCG Latency
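
The completeness figures above can be reproduced with a short calculation. The sketch below assumes the "theoretical size of full mesh" counts every ordered pair of distinct sonars, i.e. n(n-1) links when both directions are counted; that assumption matches the 156 links quoted for the 13 OPN latency sonars. The function names are illustrative and are not taken from the WG's actual completeness check.

```python
def full_mesh_size(n_hosts: int) -> int:
    """Directed links in a full mesh of n_hosts (both directions, no self-tests)."""
    if n_hosts < 0:
        raise ValueError("number of hosts cannot be negative")
    return n_hosts * (n_hosts - 1)


def completeness_ratio(working_links: int, n_hosts: int) -> float:
    """Fraction of the theoretical full mesh that is actually delivering data."""
    total = full_mesh_size(n_hosts)
    if total == 0:
        raise ValueError("a mesh needs at least two hosts")
    return working_links / total


# OPN latency mesh from the slide: 13 sonars, 156 working links.
print(full_mesh_size(13))            # 156, as quoted on the slide
print(completeness_ratio(156, 13))

# WLCG bandwidth mesh: 110 sonars, 9690 working traces (same assumption).
print(completeness_ratio(9690, 110))
```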

perfSONAR operations

Security
– CVE
– CVE released 2nd of April 2015 for Cassandra, which is used by the perfSONAR measurement archive software, esmond. No action is required to protect the perfSONAR Toolkit since the vulnerable ports are both disabled and firewalled.
Infrastructure monitoring
– Added check_memory (checks 4GB minimum requirement)
– Added as minimum version
– Increased time-range for bw tests to 7200 seconds
– Already in test, should move to production tomorrow
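
A memory check of the kind listed above could look roughly like the following. This is only a sketch and not the WG's check_memory probe: it assumes a Linux host exposing MemTotal in /proc/meminfo, a Nagios-style return code convention (0 = OK, 2 = CRITICAL), and a threshold of exactly 4 GB expressed in kB. A real probe would probably allow some slack, since the kernel reports slightly less than the installed RAM.

```python
MIN_MEM_KB = 4 * 1024 * 1024  # assumed threshold: 4 GB expressed in kB


def parse_mem_total_kb(meminfo_text: str) -> int:
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def check_memory(mem_total_kb: int, min_kb: int = MIN_MEM_KB):
    """Return (status_code, message) using the assumed 0 = OK / 2 = CRITICAL codes."""
    if mem_total_kb >= min_kb:
        return 0, f"OK - {mem_total_kb} kB total memory"
    return 2, f"CRITICAL - {mem_total_kb} kB total memory, below {min_kb} kB"


# Illustrative input only; on a real host the text would be read from /proc/meminfo.
sample = "MemTotal:        8174512 kB\nMemFree:         1234567 kB\n"
print(check_memory(parse_mem_total_kb(sample)))
```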

Network Incidents Follow up

Network Incidents Follow up

Discussed at the WLCG ops coordination; agreed to start with the process as is, with possible modifications once we gain more experience
Experiments
– Report to mailing list (wlcg-network-throughput) or GGUS SU (TBD)
– GGUS SU will be backed by the mailing list (initial members are experiments and transfer contacts and the WLCG perfSONAR support unit)
– WLCG perfSONAR support unit to confirm whether this is a network issue and contact sites
– Concerned sites report to their R&E, informing the mailing list
– List of on-going incidents on WG page ( work_Performance_Incidents)
Sites
– Report to their NREN or provider, inform mailing list
– Escalate to WLCG operations coordination to resolve inter-site issues

Datastore/esmond and Pilot Projects

esmond Status/Plans

Validation work on-going: getting metrics to check accuracy and coverage/completeness of the data collection
– Check if content in esmond matches local MAs (via sampling)
– Check if esmond content is accurate
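
The sampling check mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the validation code used by the WG: both the central esmond datastore and a local MA are represented as plain dictionaries keyed by a hypothetical (test id, timestamp) pair, and no esmond client calls are made.

```python
import random


def sample_mismatches(central, local, sample_size, seed=0):
    """Pick a random sample of local MA records and return the keys that are
    missing from, or different in, the central datastore."""
    rng = random.Random(seed)  # fixed seed so a validation run is repeatable
    keys = sorted(local)
    picked = rng.sample(keys, min(sample_size, len(keys)))
    return [k for k in picked if k not in central or central[k] != local[k]]


# Hypothetical records: (test id, timestamp) -> measured value.
local_ma = {("lat-1", 100): 0.8, ("lat-1", 160): 0.9, ("bw-7", 100): 9.4e9}
central = {("lat-1", 100): 0.8, ("bw-7", 100): 9.4e9}  # one record never arrived

bad = sample_mismatches(central, local_ma, sample_size=3)
print(f"{len(bad)} of 3 sampled records missing or different: {bad}")
```

Sampling keeps the load on the local MAs low; the sample size trades that load against how quickly gaps are noticed.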

Experiments Interface to Esmond

RSV probes (OSG): collecting metrics
esmond (OSG): datastore
esmond2mq
– Henryk developed a prototype in Python
– Update from Henryk TBD
– Retrieves all data (meta+raw) from esmond depending on existing mesh configs
– Publishes to a topic
Proximity/topology service
– Work on-going to get initial version (proximity.cern.ch), started with site-based mappings
  – Fetch active SEs from FTS and map them via SITEs to perfSONARs
  – Fetch SEs from site topology and map them to perfSONARs
– Mapping/translation of services (service to service; storage to sonar, sonar to storage), service to site (sonar to site) to be accessible via API (JSON)
– Plan: test different algorithms (site mapping, traceroutes, geoip)
– Evaluate if existing tools can be reused for this purpose
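
The site-based mapping described for the proximity service can be sketched as two dictionary joins. Everything below is hypothetical: the site names, hostnames and function names are invented for illustration, and the real proximity.cern.ch service may structure its data and JSON output differently.

```python
import json


def map_storages_to_sonars(se_to_site, site_to_sonars):
    """Storage element -> perfSONAR hosts, joined through the site name."""
    return {se: sorted(site_to_sonars.get(site, []))
            for se, site in se_to_site.items()}


def invert(mapping):
    """Turn storage -> sonars into sonar -> storages."""
    inverse = {}
    for key, values in mapping.items():
        for value in values:
            inverse.setdefault(value, []).append(key)
    return {k: sorted(v) for k, v in inverse.items()}


# Hypothetical inputs: SEs as they might be fetched from FTS, sonars per site.
se_to_site = {"se1.example.org": "SITE_A", "se2.example.org": "SITE_B"}
site_to_sonars = {"SITE_A": ["ps-lat.example.org", "ps-bw.example.org"]}

storage_to_sonar = map_storages_to_sonars(se_to_site, site_to_sonars)
print(json.dumps({"storage_to_sonar": storage_to_sonar,
                  "sonar_to_storage": invert(storage_to_sonar)}, indent=2))
```

A storage element whose site has no registered sonar maps to an empty list, which makes coverage gaps visible to API consumers.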

FTS performance

FTS: low-level data movement service, used by majority of WLCG transfers
Current granularity and coverage is a good match to perfSONAR network
Integration use cases available from FTS, but also from experiments
ATLAS is doing a study (FTS performance study) that already integrates FTS and network metrics (traceroutes)
– provides excellent starting point and can be extended to CMS and LHCb
– Contacted CMS and LHCb; CMS interested, still waiting for LHCb
Contacted Saul, Tomas and Hassen
– Work is on-going to get common source of data for both FTS dashboard and FTS performance study
– Outcome of the study can be very useful as an input to the FTS dashboard roadmap
To be discussed in detail at the next meeting (May 6th, 4PM CEST)

AOB

esmond2mq

----SUMMARY----
Fetch metadata: 1177 sec, avg sec/entry
Fetch raw data: sec, avg sec/event, 30 threads
Metadata processed: (avg entries/sec)
Events processed: (avg events/sec)
Types: {'packet-trace': 12686, 'packet-count-sent': 3545, 'packet-count-lost': 3518, 'histogram-owdelay': 3394, 'throughput': 452, 'packet-loss-rate': 3515}
Queues:
Preprocessing: 316
Postprocessing: 30
Messages sent: 77
Types: {'packet-trace': 64, 'packet-count-sent': 4, 'packet-count-lost': 3, 'histogram-owdelay': 2, 'throughput': 1, 'packet-loss-rate': 3}
Run time: 3661 sec
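
The per-type counts in the summary can be cross-checked against the reported totals. The sketch below simply sums each Types dictionary; the variable names are illustrative and not part of esmond2mq. The sum of the second dictionary agrees with the "Messages sent: 77" line.

```python
# Per-type counts copied from the summary output above.
fetched_types = {'packet-trace': 12686, 'packet-count-sent': 3545,
                 'packet-count-lost': 3518, 'histogram-owdelay': 3394,
                 'throughput': 452, 'packet-loss-rate': 3515}
sent_types = {'packet-trace': 64, 'packet-count-sent': 4,
              'packet-count-lost': 3, 'histogram-owdelay': 2,
              'throughput': 1, 'packet-loss-rate': 3}


def total(counts):
    """Sum the per-event-type counts of one Types dictionary."""
    return sum(counts.values())


print("entries by type (first dictionary):", total(fetched_types))
print("messages sent (second dictionary):", total(sent_types))  # 77, matches the summary
```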