CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it Internet Services GS group meeting 07.03.08 Monitoring and Dashboards section Activity.

Presentation transcript:

GS group meeting
Monitoring and Dashboards section: Activity Overview

Monitoring and Dashboards section: Members
- Julia Andreeva
- James Casey
- Catalin Cirstoiu
- Benjamin Gaidioz
- Anastasia Ivanchenko
- Gerhild Maier
- Andrey Nechaevskiy
- Daniel Rodrigues
- Ricardo Rocha
- Pablo Saiz
- Irina Sidorova
- Alexander Uzhinskiy
- Sergey Belov
5 staff, 4 project associates, 1 Openlab fellow funded by EDS, 1 PhD student, 1 technical student, 1 visitor (collaboration with Dubna)

Main directions of work
- Dashboard project, covering the 4 LHC experiments and various areas of activity and monitoring aspects: job monitoring, data management monitoring, monitoring of sites and services
- Architecture of monitoring solutions for WLCG
- Coordination of collaboration with GridView and monitoring activity in OSG
- CCRC08 monitoring

Julia (Section Leader)
- Coordinating the Dashboard project
- Managing and supporting the Dashboard for CMS
- Contributing to the development of the CMS Dashboard applications
  - Job monitoring
  - Site availability based on the results of the SAM tests
  - CMS MC production monitoring
- Redesign of the Dashboard job monitoring application (schema, collectors, UI) to support pilot jobs
- Chairing the System Analysis Working Group
- Coordinating development of the common application for monitoring the LHC experiments' workflows for CCRC08 and beyond

James
- Architecture of monitoring solutions for WLCG. Focus on:
  - Site monitoring with Nagios
  - ActiveMQ messaging system as transport layer
  - APIs/protocols to present the information to other tools
- Manage the Gridview collaboration for CERN
  - With Rajesh Kalmady from BARC
- Collaborate with OSG on interoperation of monitoring for WLCG
- CCRC'08
  - ServiceMap
  - MoU reporting

Ricardo
- Dashboard Framework
  - Development of the common components
  - Configuration and logging, database access, command line tools, messaging and RPC APIs, web application, agent startup and management, …
- Dashboard Build
  - Python oriented, based on distutils
  - Enforces common procedures on developers: module structure, package naming and versioning; no need for direct CVS interaction for tagging or branching
  - Gives back automatic generation of binary tools (like CLIs), documentation, deliverables
  - Multiple release branches with RPMs and tarballs, APT and YUM repositories
- Support of both
  - Within the dashboard team
  - Within the ATLAS DDM team, which uses the Dashboard Build and several framework components: configuration/logging, messaging, agent configurator

Ricardo
- ATLAS DDM Dashboard
  - Monitoring of the ATLAS Distributed Data Management system
  - Single entry point to get an overview of dataset subscriptions, transfer throughput, transfer and registration errors, site services health
  - But also detailed information regarding individual transfer attempts
- Coordination of ATLAS monitoring activities
  - Tools like the Dashboards: DDM, Production system, Job and Task monitoring, Panda
  - Integration with other components like SAM, software installations, file consistency checks, WLCG monitoring

Benjamin
Dashboard activities:
- Job monitoring: reimplementation two years ago; maintenance for ATLAS, LHCb and ALICE (with Pablo Saiz); installation guide (installed and maintained outside CERN by VleMED in NIKHEF).
- ATLAS dashboard: production monitoring: assistance to shifters, CLI, API, user's guide, tutorials, etc.
- Framework (level: guru): a couple of code contributions, also developer's guide and tutorials.

Pablo in the Dashboard
- One of the Dashboard developers:
  - Grid reliability for the 4 LHC VOs (thanks to Eamonn!!)
  - CMS Site status board
  - CMS Input Collections
- Monthly site efficiency report
  - Taken over from Massimo

Pablo in ALICE
- Main developer of several AliEn components:
  - File & metadata catalogue
  - TaskQueue and JobAgent system
  - File Transfer Daemon
- Support for the previous components
- One of the 'on-call grid experts' (together with Patricia and Fabrizio)

Pablo's current activities
Please, don't hate me… (please excuse me for not presenting this myself…)

Daniel
Daniel Rodrigues: Openlab fellow funded by EDS.
Objectives:
- Validation and testing of a new grid messaging system for deployment within WLCG
- Re-engineering of components within the WLCG Service to use the new messaging system
Tasks summary:
- Testing Apache ActiveMQ performance and features for use as a broker
- Integrating existing components with the messaging system: Gridview gridftp logs + Dashboard
- Writing best practices for using a messaging system within the WLCG context, for other developers
Presentation: Openlab, January 08: heGrid_v1.2.pps

Catalin
- Development of the monitoring system for ALICE based on MonALISA
- MonALISA-related support for the Dashboards (ATLAS and CMS), including installation and support of the MonALISA servers and repositories
- Work on the PhD "Optimization Framework for Data Intensive Applications in Large Scale Distributed Systems", a framework for optimization of data transfers based on MonALISA
Unfortunately, Catalin is leaving us soon. GOOD LUCK!

Irina: Activities 2007
- Implemented a central repository for CMS MC production monitoring information
- Set up procedures for aggregating monitoring data into the summary tables used by the UI
- Developed the API for data retrieval in XML format
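The aggregation and XML retrieval steps listed on this slide can be illustrated with a minimal sketch. It assumes hypothetical job records with `site` and `status` fields and an invented XML layout; the actual repository schema and API output format are not described in the slides.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical raw monitoring records; field names are illustrative only.
JOBS = [
    {"site": "T2_A", "status": "succeeded"},
    {"site": "T2_A", "status": "succeeded"},
    {"site": "T2_A", "status": "failed"},
    {"site": "T2_B", "status": "running"},
]


def summarize(jobs):
    """Aggregate raw job records into per-site status counts (a 'summary table')."""
    summary = {}
    for job in jobs:
        summary.setdefault(job["site"], Counter())[job["status"]] += 1
    return summary


def to_xml(summary):
    """Serialize the summary into an XML string (layout is an assumption)."""
    root = ET.Element("summary")
    for site in sorted(summary):
        site_el = ET.SubElement(root, "site", name=site)
        for status in sorted(summary[site]):
            ET.SubElement(site_el, "status", name=status,
                          count=str(summary[site][status]))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(summarize(JOBS)))
```

Pre-aggregating into a small summary structure keeps the UI queries cheap, which is presumably the motivation for the summary tables mentioned above.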

Irina: Current activities and plans
- Take part in the redesign of the Dashboard schema for job monitoring. The new schema should support pilot job submission, which is increasingly used by the LHC experiments
- Take part in the development of data collection for the Dashboard job monitoring

Andrey and Alexander: Operations Procedures
- Daily log: tracking of current problems and open issues
- Weekly Report: summary report for the Joint Operations Meeting
- Weekly Tier-0: summary of issues noticed on the Castor Tier-0 service

Andrey & Alexander: Summary
- Operating activities
- Dashboard installation
- A new FTS SLC4 pilot has been installed
- We are currently developing the new schema or schema patch, with the DB part for the monitoring prototype (presumably it will be separate from the FTS schema and installed as a module)
- Next plans:
  - Test the new pilot
  - Start implementing our monitoring tools in the Dashboard

Gerhild: PhD Work
- Automatic Detection of Error Sources of Failed Grid Jobs
- Reported exit codes ≠ description of error source
- First: distinction between user's fault and site's fault
[Diagram]
Dashboard Database -> ? -> Patterns -> Rules -> Report, Web Page, Alert System, …
Data -> Data Mining -> Additional Knowledge -> Representation
User, Site, Exit Code, … -> Looking at the data, … -> All jobs of user X fail, … -> List of observations
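The first goal on this slide, separating a user's fault from a site's fault, can be sketched with a simple counting rule of the "all jobs of user X fail" kind. This is only an illustration under stated assumptions: the record fields, the `threshold` and `min_jobs` parameters, and the rule itself are hypothetical and are not the data mining method of the PhD work.

```python
from collections import defaultdict

# Hypothetical job records: (user, site, failed). Illustrative data only.
SAMPLE = [
    ("alice", "A", True), ("alice", "B", True), ("alice", "C", True),
    ("bob", "A", False), ("bob", "B", False), ("bob", "D", True),
    ("carol", "A", False), ("carol", "D", True),
    ("dave", "D", True),
]


def classify_failures(jobs, threshold=0.9, min_jobs=3):
    """Flag users whose jobs fail across several sites, and sites where
    jobs of several users fail. Both rules are assumptions for illustration."""
    by_user = defaultdict(list)
    by_site = defaultdict(list)
    for user, site, failed in jobs:
        by_user[user].append((site, failed))
        by_site[site].append((user, failed))

    def suspicious(group):
        # Enough jobs, more than one distinct counterpart, and a high failure rate.
        counterparts = {other for other, _ in group}
        rate = sum(failed for _, failed in group) / len(group)
        return len(group) >= min_jobs and len(counterparts) > 1 and rate >= threshold

    return {
        "user": sorted(u for u, g in by_user.items() if suspicious(g)),
        "site": sorted(s for s, g in by_site.items() if suspicious(g)),
    }


if __name__ == "__main__":
    print(classify_failures(SAMPLE))
```

Requiring failures to span more than one site (for a user) or more than one user (for a site) is what keeps a single broken site from being blamed on its only user, and vice versa.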

Gerhild: Dashboard Work
Web interfaces in production for CMS:
- Daily Job Summary: information about jobs visualized with plots
  - terminated, submitted, pending, running jobs
  - status of terminated jobs
  - failed jobs by reason (grid errors, application errors)
  - status of site load
  - parallel running jobs
- Task Monitoring: detailed information about a user's tasks (also in production for ATLAS)
- SAM Test Result Visualization
  - latest test results, historical test results
  - site and service availability plots
  - test history
Future work: maintenance, improvements
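As a rough illustration of the numbers behind a site availability plot, the sketch below computes the fraction of passed tests per site. The input format and the function name are hypothetical, and a production availability calculation would likely also weight tests by criticality and time, none of which is modeled here.

```python
def availability(results):
    """Return {site: fraction of passed tests} from (site, passed) pairs.

    Simplest possible definition, assumed for illustration only.
    """
    totals = {}
    passed = {}
    for site, ok in results:
        totals[site] = totals.get(site, 0) + 1
        passed[site] = passed.get(site, 0) + (1 if ok else 0)
    return {site: passed[site] / totals[site] for site in totals}


# Hypothetical test outcomes for two sites.
RESULTS = [
    ("SiteX", True), ("SiteX", True), ("SiteX", False), ("SiteX", True),
    ("SiteY", False), ("SiteY", False),
]

if __name__ == "__main__":
    for site, value in sorted(availability(RESULTS).items()):
        print(f"{site}: {value:.0%}")
```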

Anastasia
Current tasks:
- Implement job summary plots using the GraphTools library
- Implement the Dashboard home page
Future tasks:
- Take part in the development of the common system for monitoring the workflows of the LHC experiments

Sergey: Job monitoring for Condor
- The goal is to obtain extra information about jobs submitted via Condor-G, even before the job starts
- Extended job information from the Condor event log
  - The event of interest is a job status change
  - It is possible to get user-specified attributes from the ClassAd
- The tool runs as a job on the Condor submission host
- Data is prepared to be consistent with other Dashboard job information
- Job information is sent to the collection server (using the messaging system or MonALISA)
- The current task is to finalize the development and to provide the tool for real tests on production sites
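Extracting job status changes from a Condor event log can be sketched as follows. The sketch assumes the classic plain-text user log layout, in which each event starts with a three-digit event code, a `(cluster.proc.subproc)` job identifier, a date and a time, and events are separated by `...` lines. The sample log, the subset of event codes, and the output record format are illustrative assumptions; this is not the tool described on the slide.

```python
import re

# Subset of event codes, as commonly documented for Condor user logs.
EVENT_NAMES = {
    "000": "submitted",
    "001": "executing",
    "005": "terminated",
    "009": "aborted",
    "012": "held",
}

# Event header line: code, (cluster.proc.subproc), date, time, message.
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\S+) (\S+) (.*)$")

# Illustrative log excerpt (addresses are documentation-only examples).
SAMPLE_LOG = """000 (1234.000.000) 03/07 10:15:02 Job submitted from host: <192.0.2.1:9618>
...
001 (1234.000.000) 03/07 10:17:40 Job executing on host: <192.0.2.7:9618>
...
000 (1235.000.000) 03/07 10:18:11 Job submitted from host: <192.0.2.1:9618>
...
005 (1234.000.000) 03/07 11:02:13 Job terminated.
\t(1) Normal termination (return value 0)
...
"""


def parse_events(text):
    """Return one record per event header line; detail lines are skipped."""
    events = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            code, cluster, proc, _sub, date, time, message = match.groups()
            events.append({
                "job_id": f"{int(cluster)}.{int(proc)}",
                "status": EVENT_NAMES.get(code, "other"),
                "date": date,
                "time": time,
                "message": message,
            })
    return events


def latest_status(events):
    """Map each job to the status of its most recent event in log order."""
    status = {}
    for event in events:
        status[event["job_id"]] = event["status"]
    return status


if __name__ == "__main__":
    print(latest_status(parse_events(SAMPLE_LOG)))
```

Because the submit event is written as soon as the job enters the queue, a parser of this kind can report a job before it starts running, which matches the goal stated above.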