CERN IT Department, CH-1211 Geneva 23, Switzerland

Open projects in Grid Monitoring
IT-GS-MDS Section Meeting, 25th January 2008

WLCG Monitoring Working Groups
3 groups proposed by Ian Bird to the LCG Management Board, Oct 06.
–Goal: to improve the reliability of the WLCG grid
Grid Services: Grid sensors, Transport, Repositories, Views, …
System Management: Fabric management, Best Practices, Security, …
System Analysis: Application monitoring, …

Reliability is our reason
Our goal is to improve the reliability of the Grid
WLCG availability level for a Tier-2 is 95%
–Greater for Tier-0 & Tier-1s
What do we need to do?
–Detect problems before users do!
–Reduce time to respond to problems
Approach is to put the monitoring and alarms close to the site administrators
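
To make the 95% Tier-2 figure concrete, here is a minimal sketch of how a per-site availability check could look. It assumes availability is simply the fraction of passed test samples in a reporting period; the actual WLCG availability algorithm is not described on these slides. The names ProbeResult, availability and sites_below_target, and the site names, are hypothetical.

```python
from dataclasses import dataclass

# Tier-2 availability level quoted on the slide.
TIER2_TARGET = 0.95


@dataclass
class ProbeResult:
    """One test sample for a site (hypothetical structure)."""
    site: str
    ok: bool


def availability(results):
    """Return {site: fraction of passed samples}.

    Assumption: availability = passed samples / total samples.
    """
    totals = {}
    passed = {}
    for r in results:
        totals[r.site] = totals.get(r.site, 0) + 1
        passed[r.site] = passed.get(r.site, 0) + (1 if r.ok else 0)
    return {site: passed[site] / totals[site] for site in totals}


def sites_below_target(results, target=TIER2_TARGET):
    """Return the sorted list of sites whose availability is below target."""
    return sorted(s for s, a in availability(results).items() if a < target)


if __name__ == "__main__":
    # Made-up samples: SITE-A passes all 20 tests, SITE-B fails 2 of 20.
    samples = [ProbeResult("SITE-A", True) for _ in range(20)]
    samples += [ProbeResult("SITE-B", i >= 2) for i in range(20)]
    print(sites_below_target(samples))
```

A check like this, run close to the site, is one way to raise an alarm for the site administrator before users notice the problem.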

High-level Model
See WLCG_Monitoring_for_Managers.pdf for details:
https://twiki.cern.ch/twiki/pub/LCG/GridServiceMonitoringInfo/0702-WLCG_Monitoring_for_Managers.pdf
[Diagram components: LEMON, Nagios, SAM, R-GMA, SAME-WS, GridView, Experiment Dashboard, GridIce, HTTP, LDAP, GOCDB, Dashboard]

Projects
Nagios Site Monitoring (Emir)
–NCG rewrite, local tests on service (Emir)
–Improved Publishers (Pranabesh)
–Yaim for Nagios
Messaging Infrastructure (James, Daniel)
OSG-SAM Integration using Messaging (Piotr, Arvind Gopu, Rob Quick)
GridMap (Max)
–Dashboard Integration
GridView (4xBARC)
–Including quattorization
–GridView using Messaging for producers (GridView)

Projects
SLS/Nagios Integration (Joanna ASGC, Sebastien Lopienski)
APEL using Messaging (STFC, Piotr)
RDF Schema for monitoring (Piotr, …)
New SAM Portal (IT-GS, …)
–(using CMS SAM Work?)
Management Dashboard (John Shade, …)
LEMON site monitoring (James)
GOCDB as Topology Database (STFC effort, 1 BARC from Feb'08)
"Service Cards" (Oliver Keeble, 1 BARC from Feb'08)

CCRC Reporting requirements

Measuring according to MoU
WLCG MoU is what sites have agreed to
–But we don’t measure it right now!

CCRC’08 GridMap
Combines Production Status of service with availability
–And dashboard metrics (in a 3rd map)

Summary
We’re involved in many projects
–Most of the effort is external
–CERN does architecture, project management, coordination
Main areas
–Nagios site monitoring
–Messaging for monitoring
–SAM/GridView futures
–CCRC’08 and WLCG operational monitoring