LHC Experiment Dashboard

Main areas covered by the Experiment Dashboard:
- Data processing monitoring (job monitoring)
- Data transfer monitoring
- Site/service monitoring from the VO perspective

Thanks to Julia Andreeva and E. Karavakis for the slides.
Dashboard for Monitoring the Computing Activities of the LHC

- Analysis + production
- Real-time and historical views
- Data transfer
- Data access
- Site Status Board
- Site usability
- SiteView
- WLCG GoogleEarth Dashboard

14/04/2011, Monitoring of the LHC computing activities during the first year of data taking
Common Solutions

Applications shared across ATLAS, CMS, LHCb and ALICE:
- Job monitoring (multiple applications)
- Site Status Board
- Site Usability Monitoring
- DDM Monitoring
- Global transfer monitoring system (planned for 2011)
- SiteView & GoogleEarth
Job Monitoring

Aimed at different types of users: individual scientists using the Grid for data analysis, user-support teams, site admins, VO managers, and managers of different computing projects.

Works transparently across different middleware, submission methods and execution backends.
Job monitoring

During 2010, Dashboard job monitoring for ATLAS was completely redesigned. Most applications are shared with CMS: the shared components are the data schema of the data repositories and the user interfaces. The information sources are different, so the collectors are different as well:
- For CMS, the CMS job submission tools (servers and the jobs themselves) are instrumented to report job status information to the Dashboard.
- For ATLAS, the Dashboard is integrated with the PANDA job monitoring DB; the Dashboard collector retrieves data from the PANDA DB every 5 minutes.
- Jobs submitted via Ganga through WMS or to local batch systems are instrumented to report their status via the Messaging System for the Grid (MSG), which is based on ActiveMQ.
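The status reporting described above can be sketched as follows. This is a minimal illustration of the kind of job status record an instrumented job wrapper might publish to the MSG (ActiveMQ) broker for a Dashboard collector to consume; all field names here are assumptions for illustration, not the real Dashboard message schema.

```python
import json

def make_job_status_message(job_id, task_id, site, status):
    """Build a job status event as a JSON string, ready to publish to a
    message broker. Field names are illustrative, not the real schema."""
    event = {
        "job_id": job_id,    # grid job identifier
        "task_id": task_id,  # parent task/submission identifier
        "site": site,        # execution site (e.g. a WLCG site name)
        "status": status,    # e.g. "submitted", "running", "done", "failed"
    }
    return json.dumps(event)

msg = make_job_status_message("job-001", "task-42", "CERN-PROD", "running")
print(msg)
```

In the real system the serialized event would be sent to an ActiveMQ topic, and the collector on the other side would deserialize it and update the central repository.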
The following applications were enabled for ATLAS:
- Interactive view
- Historical view
- Task monitoring (first prototype)

CMS job monitoring was extended to collect file access information, which is used by the Data Popularity service.

During the next half of the year:
- A new version of the Historical view will be enabled for CMS
- Continued effort to improve performance, both for data collectors and UIs
- Development of a new version of ATLAS task monitoring for analysis users, with the possibility to resubmit/kill jobs via the monitoring UI
Task Monitoring

User / user-support perspective, with a wide selection of plots (CMS & ATLAS; >350 CMS users daily):
- Distribution by site
- Detailed job information
- Distribution by status
- Processed events over time
- Failure diagnostics for Grid and application failures
- Efficiency distribution by site
Job Summary & Historical Views

Job Summary:
- Shifter, expert and site perspective
- Real-time job metrics by site, activity, …

Historical Views:
- Site and management perspective
- Job metrics as a function of time
Data transfer monitoring

A new version of the ATLAS Distributed Data Management monitoring (ATLAS DDM Dashboard) provides improved visualization: a matrix that allows data transfers to be monitored by source or by destination, making transfer problems easier to spot. The first prototype was released in May and is already in use by the ATLAS community; the first feedback is very positive.
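The source-by-destination aggregation behind such a transfer matrix can be sketched as below: each cell accumulates attempts and successes for one (source, destination) pair, so a problematic source shows up as a bad row and a problematic destination as a bad column. The site names and event format are assumptions for illustration.

```python
from collections import defaultdict

def build_transfer_matrix(events):
    """Aggregate transfer events into a matrix keyed by (source, destination).

    events: iterable of (source, destination, succeeded) tuples.
    """
    matrix = defaultdict(lambda: {"attempts": 0, "successes": 0})
    for src, dst, ok in events:
        cell = matrix[(src, dst)]
        cell["attempts"] += 1
        if ok:
            cell["successes"] += 1
    return dict(matrix)

events = [
    ("CERN-PROD", "BNL", True),
    ("CERN-PROD", "BNL", False),
    ("CERN-PROD", "RAL", True),
]
matrix = build_transfer_matrix(events)
print(matrix[("CERN-PROD", "BNL")])  # {'attempts': 2, 'successes': 1}
```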
During the second half of 2011, development of the global transfer monitoring system will start in collaboration with the GT group of the CERN IT department. The distributed FTS instances will be instrumented to report data transfer events via MSG. A Dashboard collector will consume these events, record them in the central data repository, generate overall transfer statistics, and expose this information to the user community via UIs and APIs. Most of the ATLAS DDM Dashboard code should be re-used for the new data transfer monitoring system. The detailed roadmap for this project is not yet defined.
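The statistics step of such a collector can be sketched as below: given FTS-style transfer-completion events, compute an overall success rate and the aggregate throughput for the time window. The field names (`bytes`, `duration_s`, `final_state`) are assumptions, not the actual FTS event schema.

```python
def summarize_transfers(events):
    """Compute overall statistics from transfer-completion events.

    events: list of dicts with 'bytes', 'duration_s', 'final_state'.
    Field names are illustrative assumptions, not the real FTS schema.
    """
    done = [e for e in events if e["final_state"] == "DONE"]
    total_bytes = sum(e["bytes"] for e in done)
    total_time = sum(e["duration_s"] for e in done)
    return {
        "success_rate": len(done) / len(events) if events else 0.0,
        # bits transferred divided by total transfer time, in megabits/s
        "throughput_mbps": (total_bytes * 8 / 1e6 / total_time) if total_time else 0.0,
    }

events = [
    {"bytes": 10_000_000, "duration_s": 4.0, "final_state": "DONE"},
    {"bytes": 10_000_000, "duration_s": 6.0, "final_state": "DONE"},
    {"bytes": 5_000_000, "duration_s": 2.0, "final_state": "FAILED"},
]
stats = summarize_transfers(events)
print(stats)
```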
Site/service monitoring

Includes the following applications:
- Site Usability (based on the results of SAM tests)
- Site Status Board
- WLCG Google Earth Dashboard
Site Usability Monitoring (SUM)

During 2010 and the beginning of 2011, the SAM framework was completely redesigned; the new version is based on Nagios, and the LHC VOs started to submit remote tests via Nagios. The Dashboard Site Usability application is being redesigned to be compatible with the new SAM architecture. The first prototype was deployed on the validation server in April and should be validated by the LHC VOs. The new SUM should be deployed to production by the end of summer 2011.
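The core of a SUM-style usability computation can be sketched as below: a site counts as usable in a given time bin only if all of its critical SAM tests passed. The test names and result format here are assumptions for illustration.

```python
def site_usability(results, critical_tests):
    """Return True if every critical test passed for this site.

    results: dict mapping test name -> True/False (passed/failed);
    a missing test counts as failed. Test names are illustrative.
    """
    return all(results.get(test, False) for test in critical_tests)

critical = ["CE-job-submit", "SRM-put", "SRM-get"]
print(site_usability(
    {"CE-job-submit": True, "SRM-put": True, "SRM-get": True}, critical))   # True
print(site_usability(
    {"CE-job-submit": True, "SRM-put": False, "SRM-get": True}, critical))  # False
```

Aggregating these per-bin booleans over a day or a month yields the availability and reliability figures shown in the SUM plots.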
SUM Snapshots
Site Status Board (SSB)

During the second half of 2010, many improvements were implemented for SSB:
- A new version of the collectors, which solved the DB locking problem and provided the necessary level of performance, was deployed in production (February 2011)
- A new version of the UI, with improved performance and extended functionality, was deployed in production (spring 2011)

Both ATLAS and CMS use SSB for computing shifts and the site commissioning activity. Further development will follow the needs and requests of the LHC VOs.
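The per-site evaluation behind an SSB column can be sketched as below: each site's value for a metric is mapped to a status colour by thresholds, which is what makes problematic sites easy to spot at a glance. The metric, threshold values, and colour names are illustrative assumptions.

```python
def ssb_colour(value, ok_threshold=90.0, warn_threshold=70.0):
    """Map a site metric (e.g. efficiency in %) to an SSB-style colour.
    Thresholds are illustrative, not the real SSB configuration."""
    if value >= ok_threshold:
        return "green"
    if value >= warn_threshold:
        return "yellow"
    return "red"

sites = {"SiteA": 95.0, "SiteB": 75.0, "SiteC": 40.0}
board = {site: ssb_colour(value) for site, value in sites.items()}
print(board)  # {'SiteA': 'green', 'SiteB': 'yellow', 'SiteC': 'red'}
```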
SSB Snapshot

Screenshot callouts: grouped sites, maintenance periods, and easy identification of sites with problems.
WLCG Google Earth Dashboard

The GoogleEarth Dashboard is integrated with all the VO-specific monitoring systems, including DIRAC and MonALISA, so it shows activities for all four experiments. Recent development focused on improving the robustness and reliability of the application.