User-centric monitoring of the analysis and production activities within the ATLAS and CMS Virtual Organisations using the Experiment Dashboard system
EGI-InSPIRE RI-261323, www.egi.eu

Presentation transcript:

User-centric monitoring of the analysis and production activities within the ATLAS and CMS Virtual Organisations using the Experiment Dashboard system
EGI Community Forum 2012, Munich, 3/27/2012
J. Andreeva, M. Cinquilli, I. Dzhunov, E. Karavakis (CERN & SA3), M. Kenyon, L. Kokoszkiewicz, P. Saiz, L. Sargsyan, D. Tuckett
CERN IT-ES
EGI-InSPIRE RI-261323

Outline
- Importance and complexity of monitoring the LHC job processing activity
- Existing solutions for the ATLAS & CMS VOs
- Experiment Dashboard Task Monitoring applications
- Common solutions for ATLAS & CMS
- Future plans
- Summary
3/27/2012 User-centric monitoring using the Experiment Dashboard system

Importance of monitoring the job processing activity
- WLCG integrates more than 140 computing centres in 35 countries
- Job processing is the core part of the VO computing activities
- More than 100,000 jobs run concurrently for the LHC VOs, using various middleware platforms, job submission methods and execution back-ends
- Scientists must be able to monitor, without any hassle, the execution status and the application- and grid-level messages of their tasks, which may run at any site within the WLCG
- Only serious issues should be escalated to the support teams

Complexity of monitoring the job processing activity
- More than 600K ATLAS jobs and 400K CMS jobs are submitted daily on different middleware platforms
- Job processing activity is divided into two categories:
  - User analysis
  - Monte-Carlo (MC) production
- MC production is a well-organised activity performed by a group of experts
- User analysis is a chaotic activity performed by diverse members of the physics community
  - Normally carried out by users who are not necessarily experienced in using the Grid, which makes it particularly difficult to predict

Existing solutions
- Most of the monitoring applications are coupled to VO-specific solutions
- CMS:
  - CRAB Monitoring is coupled to jobs submitted by the CRAB submission system
  - WMAgent Monitoring is coupled to jobs submitted via WMAgent
- ATLAS:
  - Panda Monitoring is coupled to jobs submitted via the Panda submission system
  - GangaMon / MiniDashboard is coupled to jobs submitted with Ganga

Experiment Dashboard
- Monitoring system developed for the LHC experiments
- Enables a transparent view of the experiment activities across different middleware implementations and combines Grid monitoring data with information that is specific to the VO
- Loose coupling to information sources; collects information from various sources:
  - Job submission systems
  - Jobs themselves
- Relies on instrumentation of the job submission frameworks and provides a common library for that purpose
- Defines a common set of attributes and a common format for reporting
- Presents this information in a coherent way, as if all of it came from one single source
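
As an illustration of a common set of attributes and a common reporting format, here is a minimal Python sketch. The class `JobStatusReport`, its field names and the two source labels are hypothetical, chosen for this example only; they are not the actual Dashboard reporting library.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class JobStatusReport:
    """Hypothetical common record; field names are illustrative only."""
    task_id: str
    job_id: str
    source: str          # who reports: "submission_tool" or "job"
    status: str          # e.g. "submitted", "running", "done", "failed"
    site: str = "unknown"
    timestamp: float = 0.0


def to_message(report: JobStatusReport) -> str:
    """Serialise a report in one shared format, whatever produced it."""
    return json.dumps(asdict(report), sort_keys=True)


# The submission tool and the job itself report through the same record type,
# so a consumer can treat both as if they came from a single source.
from_tool = JobStatusReport("task_1", "job_7", "submission_tool", "submitted")
from_job = JobStatusReport("task_1", "job_7", "job", "running",
                           site="SITE_A", timestamp=100.0)

messages = [to_message(from_tool), to_message(from_job)]
```

Because both messages carry the same keys, downstream code does not need to know which framework or middleware produced them.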

Dashboard Task Monitoring applications
- The Dashboard Task Monitoring applications collect and expose a user-centric set of information
- Provide a clean and precise view of:
  - Task evolution
  - Reason of failure
  - Resubmission history
- Based on common solutions and a common DB schema
- Developed in close collaboration with the physicists who use the Grid infrastructure, and tailored to their needs
- Heavily used within both ATLAS & CMS for the production and analysis activities
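
The task-level view described above (task evolution, reason of failure, resubmission history) can be sketched as an aggregation over per-attempt job records. The record layout and the function `summarise_task` are assumptions made for this example and do not reflect the real Dashboard DB schema.

```python
from collections import Counter, defaultdict

# Hypothetical job records: (task, job, attempt, status, failure_reason)
RECORDS = [
    ("task_1", "job_1", 1, "failed", "stage-out error"),
    ("task_1", "job_1", 2, "done", None),
    ("task_1", "job_2", 1, "done", None),
    ("task_1", "job_3", 1, "failed", "application error"),
]


def summarise_task(records, task):
    """Build a user-centric summary of one task from per-attempt records."""
    attempts = defaultdict(list)
    for rec_task, job, attempt, status, reason in records:
        if rec_task == task:
            attempts[job].append((attempt, status, reason))

    latest_status = Counter()    # task evolution: status of the last attempt
    failure_reasons = Counter()  # why attempts failed
    resubmissions = {}           # how many times each job was resubmitted
    for job, history in attempts.items():
        history.sort(key=lambda entry: entry[0])  # order by attempt number
        latest_status[history[-1][1]] += 1
        resubmissions[job] = len(history) - 1
        for _, status, reason in history:
            if status == "failed" and reason:
                failure_reasons[reason] += 1
    return {
        "status": dict(latest_status),
        "failure_reasons": dict(failure_reasons),
        "resubmissions": resubmissions,
    }


summary = summarise_task(RECORDS, "task_1")
```

Counting only the latest attempt per job keeps the status view focused on what the user still has to act on, while the failure reasons and resubmission counts preserve the history.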

Job monitoring architecture
[Diagram; components: job submission client or server; jobs running at the WNs; message server (MonALISA or MSG); Dashboard consumer; Dashboard Data Repository (ORACLE); Dashboard web server; data retrieval via APIs; user web interfaces]
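
A minimal sketch of how the components in this diagram could fit together, assuming a flow from reporters through the message server and consumer into the repository, and out through a retrieval API. Everything here is a stand-in: a `queue.Queue` replaces the message server, an in-memory `sqlite3` database replaces the ORACLE repository, and the function names and table layout are hypothetical.

```python
import json
import queue
import sqlite3

# Stand-in for the message server (MonALISA or MSG in the slides).
message_server = queue.Queue()

# Stand-in for the Dashboard Data Repository (ORACLE in the slides).
repository = sqlite3.connect(":memory:")
repository.execute("CREATE TABLE job_status (task TEXT, job TEXT, status TEXT)")


def report(task, job, status):
    """Called by a job or by the submission client/server to publish a status."""
    message_server.put(json.dumps({"task": task, "job": job, "status": status}))


def consume():
    """Dashboard consumer: drain the message server into the repository."""
    while not message_server.empty():
        msg = json.loads(message_server.get())
        repository.execute(
            "INSERT INTO job_status VALUES (?, ?, ?)",
            (msg["task"], msg["job"], msg["status"]),
        )
    repository.commit()


def api_task_status(task):
    """Data retrieval API: latest reported status per job, as JSON."""
    rows = repository.execute(
        "SELECT job, status FROM job_status WHERE task = ? ORDER BY rowid",
        (task,),
    ).fetchall()
    return json.dumps(dict(rows))  # later rows overwrite earlier ones per job


report("task_1", "job_1", "submitted")
report("task_1", "job_1", "running")
report("task_1", "job_2", "submitted")
consume()
result = api_task_status("task_1")
```

The reporters never touch the repository directly, which mirrors the loose coupling between information sources and the Dashboard.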

Job monitoring architecture (cont.)
CMS information sources: CRAB jobs, clients and server, ProdAgent jobs and server, and WMAgent jobs and server are instrumented for Dashboard reporting. Reporting is currently based on MonALISA.

Job monitoring architecture (cont.)
ATLAS information sources:
- Direct access to the ATLAS Production DB and the Panda DB
- Ganga jobs submitted through WMS and local batch systems, and Ganga clients, are instrumented for Dashboard reporting
- Reporting is based on ActiveMQ (MSG), which can be used by any job submission framework

Job monitoring architecture (cont.)
- The same data repository is used by multiple applications within a VO, each focused on a particular use case
- Common solutions are shared by the two VOs, even though they use different job submission systems and execution back-ends
- UIs are database agnostic
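
One way to read "UIs are database agnostic" is that the UI code depends only on a retrieval interface, not on the storage behind it. The sketch below illustrates that idea under this assumption; the `TaskSource` interface, both back-ends and the `render` function are hypothetical and are not taken from the Dashboard code base.

```python
import sqlite3
from typing import Protocol


class TaskSource(Protocol):
    """Hypothetical interface: all the UI layer knows about storage."""

    def task_names(self, user: str) -> list:
        ...


class ListSource:
    """Back-end 1: plain Python data."""

    def __init__(self, rows):
        self._rows = rows

    def task_names(self, user):
        return sorted(name for owner, name in self._rows if owner == user)


class SqliteSource:
    """Back-end 2: a SQL database (sqlite3 purely as a stand-in)."""

    def __init__(self, rows):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE tasks (owner TEXT, name TEXT)")
        self._db.executemany("INSERT INTO tasks VALUES (?, ?)", rows)

    def task_names(self, user):
        cur = self._db.execute(
            "SELECT name FROM tasks WHERE owner = ? ORDER BY name", (user,)
        )
        return [name for (name,) in cur]


def render(source: TaskSource, user: str) -> str:
    """UI code: identical regardless of which back-end is plugged in."""
    return "\n".join(f"{user}: {name}" for name in source.task_names(user))


ROWS = [("alice", "task_b"), ("alice", "task_a"), ("bob", "task_c")]
view_from_list = render(ListSource(ROWS), "alice")
view_from_sql = render(SqliteSource(ROWS), "alice")
```

Both calls produce the same view, which is the property that lets one UI serve VOs with different storage or submission systems.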

Job monitoring architecture (cont.)
Dashboard information is consumed by other applications in machine-readable format:
- Local fabric monitoring
- Site Status Board
- GridMap
- SiteView
- WLCG Google Earth Dashboard
- CMS Data popularity
- Imperial College Real Time Monitoring
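
To show what consuming Dashboard information in machine-readable form might look like for a client application, here is a small sketch. The JSON layout, the site names and the function `sites_above_failure_rate` are invented for illustration; the actual export formats are not described in the slides.

```python
import json

# Hypothetical machine-readable export; the real Dashboard formats may differ.
EXPORT = """
{"sites": [
  {"name": "SITE_A", "succeeded": 90, "failed": 10},
  {"name": "SITE_B", "succeeded": 40, "failed": 60}
]}
"""


def sites_above_failure_rate(payload: str, threshold: float) -> list:
    """Return the names of sites whose job failure rate exceeds the threshold."""
    flagged = []
    for site in json.loads(payload)["sites"]:
        total = site["succeeded"] + site["failed"]
        # Skip sites with no jobs to avoid dividing by zero.
        if total and site["failed"] / total > threshold:
            flagged.append(site["name"])
    return flagged


flagged = sites_above_failure_rate(EXPORT, 0.5)
```

A site-status or fabric-monitoring tool could run a check of this kind periodically against whatever export the Dashboard provides.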

CMS Analysis Task Monitoring
- Focused on the user's perspective
- Offers a wide selection of graphical plots
- User-driven development
- Heavily used by CMS, with up to 305 daily users

CMS Analysis Task Monitoring (cont.)
- Users from 52 countries, according to statistics collected over 5 months

Analysis Task Monitoring
- User / user-support perspective with a wide selection of plots
- Uses Web 2.0 technologies and exposes a modern user interface
- Empowers users so that only non-trivial issues are escalated to support teams
- Based on hBrowse, a common jQuery framework used for generic job monitoring applications (for more information, please see the poster)
[Screenshot annotations: Panda states; task name resolved according to the output container dataset name; graphical representation of the status of jobs; task filtering by pattern; task filtering by time period; powered by hBrowse]
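
The two filters named in the screenshot annotations (by pattern and by time period) can be sketched as follows. The real interface is built on hBrowse and jQuery; this Python version only illustrates the filtering logic, and the task names, the wildcard syntax and the function `filter_tasks` are assumptions.

```python
from datetime import datetime
from fnmatch import fnmatchcase

# Hypothetical task list: (task name, submission time)
TASKS = [
    ("user.alice.higgs_v1", datetime(2012, 3, 20, 9, 0)),
    ("user.alice.top_v2", datetime(2012, 3, 25, 14, 30)),
    ("user.bob.higgs_v1", datetime(2012, 3, 26, 8, 15)),
]


def filter_tasks(tasks, pattern="*", start=None, end=None):
    """Keep tasks whose name matches the wildcard pattern and whose
    submission time falls inside the optional [start, end] period."""
    selected = []
    for name, submitted in tasks:
        if not fnmatchcase(name, pattern):
            continue
        if start is not None and submitted < start:
            continue
        if end is not None and submitted > end:
            continue
        selected.append(name)
    return selected


higgs_tasks = filter_tasks(TASKS, pattern="*higgs*")
recent_higgs = filter_tasks(TASKS, pattern="*higgs*", start=datetime(2012, 3, 24))
```

`fnmatchcase` is used rather than `fnmatch` so that matching is case-sensitive on every platform.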

Analysis Task Monitoring (cont.)
[Screenshot annotations: task meta information; links to the Panda page for more detailed information; advanced interactive plots, which can be exported as an image or PDF document]

Analysis Task Monitoring on Android!
- Work performed by two Brunel University students, Parth Patel & Benjamin Taliadoros (under the supervision of Prof. Akram Khan)
- Download link: dashboard.cern.ch/cms
- Installation steps:
  1) Download the application from the above link
  2) Open the downloaded file
  3) Enable the ‘Untrusted Sources’ option in Settings to allow installation
[Screenshot: Tasks view. Sort by: task name; date (ascending or descending); total # jobs (ascending or descending)]

Analysis Task Monitoring on Android! (cont.)
[Screenshot: Jobs view]

Error Reporting Tool
- When a client-submitted job fails, the user can upload a snapshot of the working directory for investigation by the Analysis Ops team
- Heavily used by the CMS Analysis Operations Service
[Screenshot annotations: links to Task Monitoring; experts can download a snapshot of the user's working directory; powered by hBrowse]

Production Task Monitoring
- Allows users to follow the progress of production tasks
- Task-oriented view of production activity with a wide selection of statistics and plots
- Makes it easy to detect inefficiencies and/or delays in executing production tasks
- Takes into account feedback collected from ATLAS production managers
- Powered by hBrowse

Future Plans
- Dashboard job monitoring applications will be extended according to the requests of the LHC VOs
- Analysis Task Monitoring will support the resubmission and cancellation of a given task or job
- Production Task Monitoring will be extended according to the requests being collected from ATLAS production managers

Creating your own Dashboard
A tutorial that gives step-by-step instructions on how to create a personalised view (mashup) of Dashboard plots using the popular mashup tool Netvibes: ashboardMashup

Summary
- The Experiment Dashboard framework could be easily adapted to the needs of new VOs, but the VOs must decide what they wish to monitor and implement or extend the monitoring system according to their needs
- Provides common solutions for job monitoring of the LHC experiments, based on the instrumentation of the job submission frameworks; common libraries are provided for that purpose
- Works transparently across different middleware platforms, submission methods and execution back-ends
- Targets different categories of users
- Heavily used by the ATLAS and CMS analysis and production communities on a daily basis
- Responds well to the needs of the LHC experiments

Backup Slide
A guide to the tools, libraries and coding style commonly used by the developers of the Experiment Dashboard project is available at