Published by Vivian Goodman. Modified over 9 years ago.
1
www.egi.eu EGI-InSPIRE RI-261323
User-centric monitoring of the analysis and production activities within the ATLAS and CMS Virtual Organisations using the Experiment Dashboard system
EGI Community Forum 2012
J. Andreeva, M. Cinquilli, I. Dzhunov, E. Karavakis (CERN & SA3), M. Kenyon, L. Kokoszkiewicz, P. Saiz, L. Sargsyan, D. Tuckett
CERN IT-ES
3/27/2012, EGI Community Forum 2012 - Munich
2
Outline
- Importance and complexity of monitoring the LHC job processing activity
- Existing solutions for ATLAS & CMS VOs
- Experiment Dashboard
- Task Monitoring applications
- Common solutions for ATLAS & CMS
- Future plans
- Summary
3/27/2012 User-centric monitoring using the Experiment Dashboard system
3
Importance of monitoring the job processing activity
- WLCG integrates more than 140 computing centres in 35 countries
- Job processing is the core part of the VO computing activities
- More than 100,000 jobs run concurrently for the LHC VOs, using various middleware platforms, job submission methods and execution back-ends
- Scientists must be able to monitor, without any hassle, the execution status and the application and grid-level messages of their tasks, which may run at any site within the WLCG
- Only serious issues should be escalated to the support teams
4
Complexity of monitoring the job processing activity
- More than 600K ATLAS jobs & 400K CMS jobs are submitted daily on different middleware platforms!
- Job processing activity is divided into two categories:
  - User analysis
  - Monte-Carlo production
- MC production is a well-organised activity performed by a group of experts
- User analysis is a chaotic activity performed by diverse members of the physics community
  - Normally carried out by users who are not necessarily experienced in using the Grid, which makes it particularly difficult to predict
5
Existing solutions
- Most of the monitoring applications are coupled to VO-specific solutions
- CMS:
  - CRAB Monitoring is coupled to jobs submitted by the CRAB submission system
  - WMAgent Monitoring is coupled to jobs submitted via WMAgent
- ATLAS:
  - Panda Monitoring is coupled to jobs submitted via the Panda submission system
  - GangaMon / MiniDashboard is coupled to jobs submitted with Ganga
6
Experiment Dashboard
- Monitoring system developed for the LHC experiments
- Enables a transparent view of the experiment activities across different middleware implementations and combines Grid monitoring data with information that is specific to the VO
- Loose coupling to information sources; collects information from various sources:
  - Job submission systems
  - Jobs themselves
- Relies on instrumentation of the job submission frameworks and provides a common library for that purpose
- Defines a common set of attributes and format for reporting
- Presents this information in a coherent way, as if all of it came from one single source!
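The common set of attributes and reporting format mentioned above could look roughly like the following minimal sketch. The field names (`task_id`, `job_id`, `site`, `status`, `timestamp`, `exit_code`), the status vocabulary and the JSON encoding are illustrative assumptions; the slides do not give the actual Dashboard schema or library API.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical status vocabulary; the real attribute set is not given in the slides.
ALLOWED_STATUSES = {"submitted", "running", "done", "failed"}


@dataclass
class JobStatusReport:
    """One status update, as a submission framework or the job itself might report it."""
    task_id: str
    job_id: str
    site: str
    status: str
    timestamp: float  # seconds since the epoch
    exit_code: Optional[int] = None

    def to_message(self) -> str:
        """Serialise to JSON so that every information source reports in one format."""
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        return json.dumps(asdict(self), sort_keys=True)


def parse_message(message: str) -> JobStatusReport:
    """Rebuild a report on the consumer side from the JSON string."""
    return JobStatusReport(**json.loads(message))


report = JobStatusReport("task-1", "job-7", "SITE_A", "failed", 1332806400.0, exit_code=8001)
round_trip = parse_message(report.to_message())
```

Because every producer emits the same fields, a consumer can treat reports from different submission systems uniformly, which is the point of the "one single source" presentation.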
7
Dashboard Task Monitoring applications
- The Dashboard Task Monitoring applications collect a user-centric set of information and expose it to the user
- Provide a clean and precise view of:
  - the task evolution
  - the reason of failure
  - the resubmission history
- Based on common solutions and DB schema
- Developed in close collaboration with the physicists who use the Grid infrastructure, and tailored to their needs
- Heavily used within both ATLAS & CMS for the production and analysis activities
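As a rough illustration of the task-level view described above (task evolution, reason of failure, resubmission history), the sketch below condenses per-attempt job records into one summary per task. The record layout and the function name `summarise_task` are hypothetical and do not reflect the actual Dashboard DB schema.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical job records: one entry per submission attempt of a job.
ATTEMPTS = [
    {"task": "task-1", "job": 1, "attempt": 1, "status": "failed", "reason": "stage-out error"},
    {"task": "task-1", "job": 1, "attempt": 2, "status": "done", "reason": None},
    {"task": "task-1", "job": 2, "attempt": 1, "status": "failed", "reason": "application error"},
    {"task": "task-1", "job": 3, "attempt": 1, "status": "running", "reason": None},
]


def summarise_task(attempts: List[dict], task: str) -> Dict[str, object]:
    """Build a user-centric summary: current status per job, failure reasons, resubmissions."""
    latest: Dict[int, dict] = {}
    for record in attempts:
        if record["task"] != task:
            continue
        current = latest.get(record["job"])
        # Keep only the most recent attempt of each job as its current state.
        if current is None or record["attempt"] > current["attempt"]:
            latest[record["job"]] = record

    status_counts = Counter(r["status"] for r in latest.values())
    failure_reasons = Counter(r["reason"] for r in latest.values() if r["status"] == "failed")
    resubmitted = sorted(job for job, r in latest.items() if r["attempt"] > 1)
    return {
        "status_counts": dict(status_counts),
        "failure_reasons": dict(failure_reasons),
        "resubmitted_jobs": resubmitted,
    }


summary = summarise_task(ATTEMPTS, "task-1")
```

Counting only the latest attempt of each job keeps a job that failed once and then succeeded on resubmission from appearing as a failure in the task view.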
8
Job monitoring architecture
[Diagram components: jobs running at the WNs; job submission client or server; message server (MonALISA or MSG); Dashboard consumer; Dashboard Data Repository (ORACLE); data retrieval via APIs; Dashboard web server; user web interfaces]
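The components named in the architecture diagram can be sketched end to end as below. This is a toy model under stated assumptions: `queue.Queue` stands in for the message server (MonALISA or MSG), an in-memory SQLite database stands in for the ORACLE repository, and the table layout and function names (`report_status`, `run_consumer`, `get_task_status`) are invented for illustration.

```python
import json
import queue
import sqlite3

# Stand-ins (assumptions): queue.Queue for the message server, SQLite for the ORACLE repository.
message_server: "queue.Queue[str]" = queue.Queue()
repository = sqlite3.connect(":memory:")
repository.execute(
    "CREATE TABLE job_status (task_id TEXT, job_id TEXT, status TEXT, "
    "PRIMARY KEY (task_id, job_id))"
)


def report_status(task_id: str, job_id: str, status: str) -> None:
    """Called by an instrumented job or submission tool to publish a status update."""
    message_server.put(json.dumps({"task_id": task_id, "job_id": job_id, "status": status}))


def run_consumer() -> int:
    """Drain the message server and store the latest status of each job."""
    stored = 0
    while True:
        try:
            message = json.loads(message_server.get_nowait())
        except queue.Empty:
            break
        repository.execute(
            "INSERT OR REPLACE INTO job_status VALUES (?, ?, ?)",
            (message["task_id"], message["job_id"], message["status"]),
        )
        stored += 1
    repository.commit()
    return stored


def get_task_status(task_id: str) -> dict:
    """Data-retrieval API that a web interface could call."""
    rows = repository.execute(
        "SELECT job_id, status FROM job_status WHERE task_id = ? ORDER BY job_id", (task_id,)
    )
    return dict(rows.fetchall())


report_status("task-1", "job-1", "running")
report_status("task-1", "job-2", "running")
report_status("task-1", "job-1", "done")
processed = run_consumer()
```

The producers only know about the message server and the user interface only knows about the retrieval API, which mirrors the loose coupling the deck emphasises.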
9
Job monitoring architecture (cont.)
CMS information sources:
- CRAB jobs, clients and server, Prod Agent jobs and server, and WMAgent jobs and server are instrumented for Dashboard reporting
- Reporting is currently based on MonALISA
10
Job monitoring architecture (cont.)
ATLAS information sources:
- Direct access to ATLAS Production DB and Panda DB
- Ganga jobs submitted through WMS and local batch systems, as well as Ganga clients, are instrumented for Dashboard reporting
- Reporting is based on ActiveMQ (MSG), which can be used by any job submission framework
[Diagram additionally shows: PANDA DB, ATLAS PROD DB]
11
Job monitoring architecture (cont.)
- The same data repository is used by multiple applications within a VO. Each of them is focused on a particular use case
- Common solutions are shared by the two VOs even when using different job submission systems and execution back-ends
- UIs are database agnostic
12
Job monitoring architecture (cont.)
Dashboard information is consumed by other applications in machine-readable format:
- Local fabric monitoring
- Site Status Board
- GridMap
- SiteView
- WLCG Google Earth Dashboard
- CMS Data popularity
- Imperial College Real Time Monitoring
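A minimal sketch of what "machine-readable format" can mean for a consuming application is shown below. JSON is an assumption here, since the slides do not name the format, and the record layout and the function `export_site_summary` are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical job records as they might be read from the data repository.
JOBS = [
    {"site": "SITE_A", "status": "done"},
    {"site": "SITE_A", "status": "failed"},
    {"site": "SITE_B", "status": "done"},
    {"site": "SITE_A", "status": "done"},
]


def export_site_summary(jobs: list) -> str:
    """Return per-site job counts as JSON for consumption by another application."""
    counts: dict = defaultdict(lambda: defaultdict(int))
    for job in jobs:
        counts[job["site"]][job["status"]] += 1
    payload = {"sites": {site: dict(by_status) for site, by_status in counts.items()}}
    return json.dumps(payload, sort_keys=True)


exported = export_site_summary(JOBS)
decoded = json.loads(exported)  # what a consuming application would do
```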
13
CMS Analysis Task Monitoring
- Focused on the user's perspective
- Offers a wide selection of graphical plots
- User-driven development
- Heavily used by CMS: up to 305 daily users
14
CMS Analysis Task Monitoring
- Users from 52 countries, based on 5 months of statistics
15
Analysis Task Monitoring
- User / user-support perspective with a wide selection of plots
- Uses web 2.0 technologies and exposes a modern user interface
- Empowers users so that only non-trivial issues are escalated to support teams
- Based on hBrowse, a common jQuery framework used for generic job monitoring applications (for more information please see the poster)
[Screenshot annotations: Panda states; task name resolved according to output container dataset name; graphical representation of the status of jobs; task filtering by pattern; task filtering by time period; powered by hBrowse]
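The two filters named in the screenshot annotations, by pattern and by time period, can be sketched as follows. Shell-style wildcards are an assumption, since the slides do not say which pattern syntax the interface accepts, and the task names and the function `filter_tasks` are made up for illustration.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatchcase

# Hypothetical task list; names and submission times are invented.
TASKS = [
    {"name": "user.alice.higgs_v1", "submitted": datetime(2012, 3, 26, 9, 0)},
    {"name": "user.alice.top_v2", "submitted": datetime(2012, 3, 20, 9, 0)},
    {"name": "user.bob.higgs_v3", "submitted": datetime(2012, 3, 27, 8, 0)},
]


def filter_tasks(tasks, pattern="*", since=None):
    """Keep tasks matching a shell-style name pattern and submitted at or after `since`."""
    return [
        task for task in tasks
        if fnmatchcase(task["name"], pattern)
        and (since is None or task["submitted"] >= since)
    ]


now = datetime(2012, 3, 27, 12, 0)
recent_higgs = filter_tasks(TASKS, pattern="*higgs*", since=now - timedelta(days=2))
names = [task["name"] for task in recent_higgs]
```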
16
Analysis Task Monitoring
[Screenshot annotations: task meta information; links to the Panda page for more detailed information; advanced interactive plots, which can be exported as an image or PDF document]
17
Analysis Task Monitoring on Android!
- Work performed by two Brunel University students: Parth Patel & Benjamin Taliadoros (under the supervision of Prof. Akram Khan)
- Download link: dashboard.cern.ch/cms
- Installation steps:
  1) Download the application from the above link
  2) Open the downloaded file
  3) The user must enable the ‘Untrusted Sources’ option from Settings to install
[Screenshot: Tasks view. Sort by: Task Name; Date (ascending or descending); Total # Jobs (ascending or descending)]
18
Analysis Task Monitoring on Android!
[Screenshot: Jobs view]
19
Error Reporting Tool
- When a client-submitted job fails, a user can upload a snapshot of the working directory for investigation by the Analysis Ops team
- Heavily used by the CMS Analysis Operations Service
[Screenshot annotations: links to Task Monitoring; experts can download a snapshot of the user's working directory; powered by hBrowse]
20
Production Task Monitoring
- Allows users to follow the progress of production tasks
- Task-oriented view of production activity with a wide selection of statistics and plots
- Makes it easy to detect inefficiencies and/or delays in executing production tasks
- Takes into account feedback collected from ATLAS production managers
- Powered by hBrowse
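One way to read "detect inefficiencies and/or delays" is as two simple checks per task, sketched below. The definition of efficiency (successful jobs over finished jobs), both thresholds, the record layout and the function `flag_tasks` are assumptions for illustration and are not taken from the Dashboard itself.

```python
from datetime import datetime

# Hypothetical production task records; thresholds below are arbitrary illustration values.
PRODUCTION_TASKS = [
    {"name": "mc12.sim.001", "done": 950, "failed": 50, "last_update": datetime(2012, 3, 27, 11, 0)},
    {"name": "mc12.sim.002", "done": 300, "failed": 700, "last_update": datetime(2012, 3, 27, 11, 30)},
    {"name": "mc12.reco.003", "done": 10, "failed": 0, "last_update": datetime(2012, 3, 25, 6, 0)},
]


def flag_tasks(tasks, now, min_efficiency=0.8, max_idle_hours=24):
    """Return {task name: [problems]} for tasks that look inefficient or delayed."""
    flagged = {}
    for task in tasks:
        problems = []
        finished = task["done"] + task["failed"]
        # Efficiency here means successful jobs over all finished jobs.
        if finished and task["done"] / finished < min_efficiency:
            problems.append("low efficiency")
        idle_hours = (now - task["last_update"]).total_seconds() / 3600
        if idle_hours > max_idle_hours:
            problems.append("no recent progress")
        if problems:
            flagged[task["name"]] = problems
    return flagged


flags = flag_tasks(PRODUCTION_TASKS, now=datetime(2012, 3, 27, 12, 0))
```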
21
Future Plans
- Dashboard job monitoring applications will be extended according to the requests of the LHC VOs
- Analysis Task Monitoring will support the resubmission and cancellation of a given task or job
- Production Task Monitoring will be extended according to the requests being collected from ATLAS production managers
22
Creating your own Dashboard
- A tutorial gives step-by-step instructions on how to create a personalised view (mashup) of Dashboard plots using the popular mashup tool Netvibes: https://twiki.cern.ch/twiki/bin/view/ArdaGrid/DashboardMashup
23
Summary
- The Experiment Dashboard Framework could be easily adapted to the needs of new VOs, but the VOs must decide what they wish to monitor and implement/extend the monitoring system according to their needs
- Provides common solutions for job monitoring of the LHC experiments based on the instrumentation of the job submission frameworks. Common libraries for that purpose are provided
- Works transparently across different middleware platforms, submission methods and execution back-ends
- Targets different categories of users
- Heavily used by the ATLAS and CMS analysis and production community on a daily basis
- Responds well to the needs of the LHC experiments
http://dashboard.cern.ch
24
Backup Slide
- A guide to the tools, libraries and coding style commonly used by the developers of the Experiment Dashboard project is available at https://twiki.cern.ch/twiki/bin/view/ArdaGrid/Libs