Alerting/Notifications (MadAlert)

1 Alerting/Notifications (MadAlert)
WLCG-wide meshes
  Latency mesh: 94 sonars (84% efficiency)
  Traceroute mesh: 115 sonars (94% efficiency)
  Bandwidth mesh: 102 sonars (76% efficiency)
Stream
  Publishing from ITB available (issues reported with VM)
  Publishing from production planned for 13th of October
Datastore (OSG)
  In production since 14th of Sept
Proximity
  Mapping between sonars and storages available for all experiments (updated, mainly bugfixes)
Dashboard
  psmad connected to the datastore, already showing good content; shows more recent results than maddash.aglt2
Alerting/Notifications (MadAlert)
  Initial version of MadAlert available; shows network/infrastructure problems

2 perfSONAR 3.5
Released on Monday (28th Sept)
Support for CentOS, Debian and VMs (packaged bundles):
  perfSONAR Tools (just tools)
  perfSONAR TestPoint (passive, no MA)
  perfSONAR Core (+MA)
  perfSONAR Complete (+Web and Toolkit Configuration)
  perfSONAR Central Management (MaDDash, Auto-config, Centralized config service)
Introduces new web frontend
Support for low-cost nodes
WLCG deployment status (very good progress):
  3.4.1: 7
  3.4.2: 67
  3.5: 154
  Unknown: 20
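The deployment counts above are easier to compare as shares of the whole. A minimal sketch, using only the four numbers from the slide (the function and variable names are illustrative, not part of any perfSONAR tool):

```python
# Host counts per perfSONAR version, copied from the slide.
deployment = {"3.4.1": 7, "3.4.2": 67, "3.5": 154, "Unknown": 20}


def version_shares(counts):
    """Return each version's share of all hosts, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {version: round(100 * n / total, 1) for version, n in counts.items()}


shares = version_shares(deployment)
total_hosts = sum(deployment.values())

for version, pct in shares.items():
    print(f"{version:>8}: {deployment[version]:4d} hosts ({pct}%)")
print(f"   total: {total_hosts:4d} hosts")
```

With these counts, 3.5 accounts for roughly 62% of the 248 hosts listed, one week after the release.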

3 WLCG wide meshes
Summary of changes:
  Full latency (one-direction only, 10Hz, OWAMP, IPv4)
  Full traceroute (bi-directional, hourly, BWCTL/OWAMP, IPv4, IPv6)
  Full bandwidth (one-direction only, fortnightly, BWCTL-only!, IPv4, IPv6)
Re-enabled project meshes:
  Belle II: both latency and bandwidth
  Dual-stack: just bandwidth (both IPv4 and IPv6)
Regional meshes still disabled, need to discuss how to evolve:
  We can create any sub-mesh of the full latency mesh (for free, but only IPv4 and using the same params)
  We could move from regional to bigger meshes (European, Asia/Pacific, US)
  We can create new bandwidth meshes, as BWCTL needs fewer resources (but only for BWCTL-only nodes, not on dual-nodes)
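One way to read "for free" is that every pair in a sub-mesh is already measured by the full mesh, so a sub-mesh is only a filtered view of existing results rather than a new set of tests. The sketch below illustrates that reading. The host names are hypothetical, and treating each direction as a separate ordered pair is an assumption, not something the slides state.

```python
from itertools import permutations


def full_mesh_pairs(hosts):
    """All ordered (source, destination) pairs between distinct hosts."""
    return set(permutations(hosts, 2))


def sub_mesh_pairs(full_pairs, members):
    """Pairs of the full mesh whose endpoints both belong to `members`."""
    members = set(members)
    return {(src, dst) for (src, dst) in full_pairs if src in members and dst in members}


# Hypothetical host names, for illustration only.
hosts = [
    "ps-a.example.org",
    "ps-b.example.org",
    "ps-c.example.org",
    "ps-d.example.org",
]

full = full_mesh_pairs(hosts)
regional = sub_mesh_pairs(full, ["ps-a.example.org", "ps-c.example.org"])

# Every sub-mesh pair is already part of the full mesh, so no new tests are needed.
print(len(full), len(regional), regional <= full)
```

The same filter cannot change test parameters or add IPv6, which matches the restriction noted on the slide: the sub-mesh inherits whatever the full mesh already runs.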

5 Latency full mesh issues
Taiwan: performance, high load (long thread)
DESY-HH: unstable (GGUS)
Manchester: works fine, but unstable (gateway timeouts, MA unreachable)
Durham: only works for UK (firewall?)
MWT2, Oklahoma, SLAC: only after update to 3.5
Florida, UCSD: performance, high load
Wisconsin: offline (temporary)
INFN-Roma: offline (temporary)

6 Remaining sonars
Those that were not added to the full mesh and are still in the global mesh (latency or bandwidth)
Detailed summary sent to wlcg-perfsonar-support, three categories:
  Offline or misconfigured sonars
  Performance issues (Aachen, Sonars with <4 GB RAM
How can we integrate sonars with constrained resources?

7 Integration
FTS performance study: meeting held 15th of Sept.
  TCP buffer size limit: new algorithm proposed and discussed, to be followed up by the FTS team
  Reported details on SRM overhead
From the use case document:
  Integration of perfSONAR in the ATLAS data analytics, Panda, SSB
  Integration with DIRAC (LHCb)
CERN IT Data analytics WG interested in perfSONAR
For CMS we're currently missing a contact person (to be followed up)
Also interest from the network community (Asia Tier Centre Forum)

8 HEPiX
Abstract submitted, plan is to focus on sites:
  show the importance of measuring network performance and the impact of latency and packet loss on throughput
  discuss the existing network, its coverage and capabilities
  show how one can discover the nearest sonars in our network (proximity)
  describe existing tools and show examples of how to run them from the command line to debug specific problems
  discuss existing deployment models and options (VMs, Puppet, perfSONAR on a $200 box, etc.) and new features of perfSONAR 3.5
Based on ESnet tutorials
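The slides do not say which model the talk would use for the impact of latency and packet loss on throughput. A common back-of-the-envelope choice is the Mathis et al. approximation for a single long-lived TCP flow, rate ≈ C · MSS / (RTT · sqrt(p)) with C ≈ sqrt(3/2). The sketch below assumes that model; the function name and the example values are illustrative only.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Approximate upper bound on single-flow TCP throughput, in bits per second.

    Uses the Mathis et al. approximation: rate = c * MSS / (RTT * sqrt(p)).
    This is a rough model for loss-limited flows, not a measurement.
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("MSS and RTT must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


# Illustrative values: 1460-byte MSS, three RTTs, two loss rates.
for rtt_ms in (10, 100, 200):
    for loss in (1e-5, 1e-3):
        rate = mathis_throughput_bps(1460, rtt_ms / 1000, loss)
        print(f"RTT {rtt_ms:3d} ms, loss {loss:.0e}: {rate / 1e6:8.1f} Mbit/s")
```

Under this model throughput falls linearly with round-trip time and with the square root of the loss rate, which is why small loss rates that go unnoticed locally can dominate long-distance transfers.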

9 GDB
Review of the WG: focus on use cases and overall progress in different areas
Define and understand slow transfers
  perfSONAR commissioning and deployment: support unit, follow up, debugging - Done
  perfSONAR central configuration and mesh management (and ESnet project) - Done
Uniform way to access and integrate existing network measurements
  Define topology in a common way: proximity service - prototype available, further testing needed
  Common API: OSG Datastore, publishing results to MQ - Done
Integration
  FTS performance study (Saul)
  ATLAS and LHCb perfSONAR pilot projects
Coordinated response to the network performance issues (ATLAS)
  WLCG Network Throughput SU and underlying procedure - Done
Baseline existing links (full mesh), help commission new ones
  Establishing WLCG-wide meshes - Done
  Running core networking meshes (LHCOPN/LHCONE) to help debug links - Done

10 Next steps
Stable infrastructure (OSG production date is 13th of Oct):
  Production stream
  Fix remaining issues with the OSG datastore
  Update central dashboard (psmad), which becomes the official production dashboard
  Update monitoring (a few metrics no longer work with 3.5)
  Discuss and agree on what meshes we introduce (mesh leaders)
  Follow up on sonars with issues
Start focusing on the various integration efforts:
  Resources need to come from the experiments
  Support will be shared between OSG and WLCG WG
Next meetings: 4th Nov, 2nd Dec

