Experiment Dashboard: overview of the applications


1 Experiment Dashboard: overview of the applications
Julia Andreeva, SDC-MI section meeting

2 Main areas of the monitoring activities
- Monitoring of the computing activities of the LHC VOs.
- Monitoring of the sites and services from the VO perspective, i.e. evaluating the status of sites and services based on metrics defined by the VOs.
- Providing a cross-VO global picture of the LHC activities on the WLCG infrastructure.

3 Development strategy
- Whenever possible, apply cross-VO, cross-middleware solutions. Even when an application is first developed at the request of a single experiment, aim to make it generic and easily adaptable for others.
- As a result, monitoring solutions provided by Dashboard are shared between several LHC experiments and are not coupled to a particular service, middleware flavor, or implementation of the workload management or data management systems.
- UIs are designed with strong involvement of the potential users.
- All information is available in machine-readable format.

4 ES monitoring activities
[Diagram of the monitoring areas and their audiences. Areas: Data management monitoring (data transfer, data access); Monitoring of the job processing (analysis, production; real-time and historical views); Infrastructure monitoring (Site Status Board, Site usability, SiteView); Publicity & dissemination (WLCG Google Earth Dashboard). Audiences: operation teams, sites, users, general public.]

5 ES monitoring activities
No overlap with any monitoring effort in IT, apart from Site usability, which is based on the results of SAM tests; correspondingly, there is a synergy with the MyWLCG portal.
[Diagram repeated from slide 4.]

6 Job monitoring (1)
- Shared by CMS and ATLAS, though the information sources are different. Experiment workflows are instrumented to report job monitoring information. In the case of ATLAS, data is regularly imported from the Panda server.
- Multiple UIs, dedicated to various user categories and use cases, run on top of a single data repository.
- Demanding applications from the point of view of scalability. ATLAS submits up to 1 million jobs daily, CMS submits K jobs daily. The DB contains processing details for every job.

7 Job monitoring (2)
Interactive view (what is happening now), available for CMS and ATLAS:
- Used by members of various computing projects (such as the production team and analysis support), VO managers, and site administrators.
- In machine-readable format, data is imported into Site Status Board, WLCG Google Earth Dashboard, SiteView, and local fabric monitoring.
- Uses raw, non-aggregated data.
Historical view (accounting portal):
- Shows job processing metrics as a function of time for any time range.
- Uses aggregated data.
- Similar usage as for the interactive view. Weekly distributions from this application are reviewed during CMS facilities operations meetings, dataops meetings, and Tier1 and Tier2 coordination meetings.
No overlap with any monitoring effort in IT.
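The difference between the raw per-job data behind the interactive view and the aggregated data behind the historical view can be illustrated with a minimal sketch. The record fields (`site`, `finished`, `status`), the site names, and the hourly bin size are assumptions made for illustration; they are not the Dashboard schema.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_hourly(raw_jobs):
    """Collapse per-job records into counts per (site, hour, status)."""
    bins = defaultdict(int)
    for job in raw_jobs:
        finished = datetime.fromtimestamp(job["finished"], tz=timezone.utc)
        # Truncate the completion time to the start of its hour.
        hour = finished.replace(minute=0, second=0, microsecond=0)
        bins[(job["site"], hour, job["status"])] += 1
    return dict(bins)


# Hypothetical raw records: one entry per job, timestamps in epoch seconds.
raw_jobs = [
    {"site": "SITE_A", "finished": 1338508800, "status": "succeeded"},
    {"site": "SITE_A", "finished": 1338510000, "status": "succeeded"},
    {"site": "SITE_A", "finished": 1338510600, "status": "failed"},
    {"site": "SITE_A", "finished": 1338512400, "status": "succeeded"},
    {"site": "SITE_B", "finished": 1338509000, "status": "succeeded"},
]

hourly = aggregate_hourly(raw_jobs)
for (site, hour, status), count in sorted(hourly.items()):
    print(site, hour.isoformat(), status, count)
```

A historical view only needs the small `hourly` table, which is why it can cover arbitrary time ranges, while an interactive view needs the individual job records.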

8 Job monitoring (3)
Task monitoring: a user-centric application for analysis users.
- Available for CMS and for ATLAS. ATLAS has two slightly different applications, one for analysis and one for production, each customized for its user category.
- The task monitoring application is widely used in CMS ( distinct users daily, ~75% of all analysis users).
- Target community: physicists running their analysis jobs on any execution backend (GRID or local farm), the analysis support team, and sometimes site administrators when they need to understand better what a user is doing at their site.
- ATLAS task monitoring was recently introduced to the user community.
- New versions of User Analysis Task Monitoring and Production Task Monitoring use a common framework (hbrowse) implemented in jQuery.
- It is possible not only to monitor, but also to handle user jobs via the UI (killing jobs).
No overlap with any monitoring effort in IT.

9 Data management monitoring (1)
Multiple applications; from the implementation point of view, all of them share a common core (common DB schema, aggregation, UI) initially developed for the ATLAS DDM Dashboard:
- ATLAS DDM Dashboard
- WLCG Transfer Dashboard
- FAX and AAA Dashboards

10 ATLAS DDM Dashboard
- Heavily used by the ATLAS computing community: up to 1500 unique visitors (IP addresses) per month, 10-20K pages viewed daily.
- Used by members of the data management team and ATLAS computing shifters.
- In machine-readable format, data is imported into WLCG Google Earth, SiteView, and Site Status Board.
No overlap with any monitoring effort in IT.

11 WLCG Transfer Dashboard
- Cross-VO and cross-technology monitoring system which provides a global picture of data transfers on the WLCG infrastructure.
- Monitors data transfers performed by FTS and data traffic on the xrootd federated storage (ATLAS, ALICE and CMS).
- In production since June 2012 unique visitors daily
- Development is ongoing; functionality is being extended.
No overlap with any monitoring effort in IT.

12 AAA and FAX Dashboards
- Provide a single entry point for all monitoring information about data traffic and data access on a given xrootd federation.
- Include EOS data for both ATLAS and CMS.
- Information sources are the same as for the WLCG Transfer Dashboard, but the AAA and FAX Dashboards provide a much more detailed view, in particular regarding data access. The WLCG Transfer Dashboard does not include EOS data.
- Used by people operating federations, site administrators, and LHC VO computing teams.
- Development is ongoing; functionality is being extended.
No overlap with any monitoring effort in IT.

13 Monitoring of the infrastructure from the VO perspective (1)
Site Status Board (SSB)
- Evaluates the status of the distributed sites and services used by a particular VO from various perspectives.
- VOs are free to define monitoring metrics (status and numeric), the time range for their update, their criticality, and customized views.
- Shares a lot of implementation with the SiteView and Site usability applications.
- Snapshot and historical distributions are available.
- Some metrics are standard and built into the SSB (downtime info taking into account the experiment topology, results of service types defined as critical by the VO, whether a site/service is visible in BDII…).
- Knowledge about the experiment topology is built into the SSB schema.
- Enabled for all 4 LHC experiments. Actively used by CMS and ATLAS for distributed computing shifts and for site evaluation and commissioning.
- Data in machine-readable format is imported into fabric monitoring, site readiness, site blacklisting for production and analysis, etc.
- Used by people taking part in the computing shifts, site administrators, VO managers, analysis support, and analysis users in order to understand whether a particular site has a problem and has to be blacklisted, etc.
No overlap with other monitoring efforts in IT, though the concept is similar to SLS, which the experiments use to monitor CERN central services: the visualization is different, the scope is different, and the SSB data structure is driven by the VO topology.
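How VO-defined metrics with a criticality flag could be combined into one site status can be sketched as follows. This is an illustration only, not the SSB implementation: the `Metric` class, the status values, the severity ordering, and the "worst critical metric wins" rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str       # hypothetical metric name chosen by the VO
    status: str     # "ok", "warning" or "error" (assumed status values)
    critical: bool  # whether the VO marked this metric as critical


# Assumed ordering from best to worst.
SEVERITY = {"ok": 0, "warning": 1, "error": 2}


def site_status(metrics):
    """Return the worst status among critical metrics, or "unknown" if none."""
    critical = [m for m in metrics if m.critical]
    if not critical:
        return "unknown"
    return max(critical, key=lambda m: SEVERITY[m.status]).status


site_metrics = [
    Metric("downtime", "ok", critical=True),
    Metric("critical_service_tests", "warning", critical=True),
    Metric("visible_in_bdii", "error", critical=False),
]
print(site_status(site_metrics))
```

Under these assumptions a non-critical metric in error does not change the overall status, which is one way a VO-defined criticality could drive a blacklisting decision.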

14 Monitoring of the infrastructure from the VO perspective (2)
Site usability monitor (SUM)
- Based on the results of SAM tests. The application estimates the quality of the services running at the sites, as evaluated by the VOs.
- Experiments wanted to be able to test sites for different use cases and, correspondingly, to have multiple profiles for site evaluation (in terms of the set of critical service types, the set of critical metrics, etc.).
- The application is actively used by the LHC experiments for daily operations and site commissioning, namely by members of the computing projects, site administrators, and VO managers.
- The distributions for Tier1 sites are considered among the key metrics at the daily WLCG meetings and are included in the weekly reports to the MB.
- Data in machine-readable format is imported into local fabric monitoring, Site Status Board, SiteView, and CMS Site Readiness.
- When the SAM infrastructure started to be redesigned, it was agreed with SAM that the availability calculation would no longer be implemented by the Dashboard extension in SAM, and that Dashboard would no longer query the involved DBs directly. The Dashboard interface was preserved, since experiments relied on it and it was integrated into the experiment-specific systems. Data for the Dashboard UI is retrieved from SAM via the new SAM API.
There is an overlap with the MyWLCG portal, which has not yet been evaluated by the experiments.
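The idea of evaluating the same test results against several profiles can be sketched as below. This is a simplified illustration and not the SAM availability algorithm: the profile structure, the service and metric names, and the rule "a time slot is usable if no critical test failed" are assumptions.

```python
def slot_usable(results, profile):
    """results maps (service_type, metric) to True (passed) or False (failed)."""
    for (service, metric), passed in results.items():
        is_critical = service in profile["services"] and metric in profile["metrics"]
        if is_critical and not passed:
            return False
    return True


def usability(slots, profile):
    """Fraction of time slots in which the site was usable for this profile."""
    if not slots:
        return None
    usable = sum(1 for results in slots if slot_usable(results, profile))
    return usable / len(slots)


# Hypothetical test results for one site over four time slots.
slots = [
    {("CE", "job_submit"): True, ("SRM", "put"): True},
    {("CE", "job_submit"): False, ("SRM", "put"): True},
    {("CE", "job_submit"): True, ("SRM", "put"): False},
    {("CE", "job_submit"): True, ("SRM", "put"): True},
]

# Two hypothetical profiles with different critical service types and metrics.
analysis_profile = {"services": {"CE", "SRM"}, "metrics": {"job_submit", "put"}}
production_profile = {"services": {"CE"}, "metrics": {"job_submit"}}

print(usability(slots, analysis_profile))
print(usability(slots, production_profile))
```

The same site can look different for different use cases: here the storage failure in the third slot counts against the first profile but not against the second.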

15 Various accounting portals
Several accounting portals with experiment-specific information sources were developed based on the Job monitoring historical view:
- ATLAS DDM accounting
- PD2P monitoring
- CMS Condor monitoring
- …
No overlap with any monitoring effort in IT.

16 Global view of the LHC activities on the WLCG infrastructure
WLCG Google Earth Dashboard
- Live, near-real-time monitor of the job processing and data transfers of the LHC experiments performed on WLCG. Mostly for publicity and dissemination purposes.
- Uses experiment-specific monitoring systems (Phedex, Dirac, MonAlisa repository for ALICE) and Experiment Dashboard as information sources.
- Using data retrieved from these sources every 10 minutes, the Dashboard server generates an input file for the Google Earth client.
- Runs at several WLCG computing centers, including CERN (CC, Globus permanent exhibition, ATLAS computing room).
- Is demonstrated at various conferences and public events.
No overlap with any monitoring effort in IT.
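Google Earth reads KML, so the generation of an input file for the client can be sketched as below. This is an assumption-laden illustration rather than the Dashboard server code: the site names, coordinates, job counts, and the choice of one placemark per site are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(sites):
    """Return a KML document with one Placemark per site."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for site in sites:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = site["name"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = (
            f"{site['running_jobs']} running jobs"
        )
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude,latitude,altitude in that order.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{site['lon']},{site['lat']},0"
        )
    return ET.tostring(kml, encoding="unicode")


# Hypothetical snapshot, as it might look after one 10-minute collection cycle.
sites = [
    {"name": "SITE_A", "lon": 6.05, "lat": 46.23, "running_jobs": 1200},
    {"name": "SITE_B", "lon": -88.26, "lat": 41.84, "running_jobs": 800},
]
kml_text = build_kml(sites)
print(kml_text)
```

Regenerating this file on a fixed schedule and letting the client reload it is one simple way to obtain a near-real-time display.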

17 Global view of the LHC activities on the WLCG infrastructure
SiteView
- Aims to provide the overall picture of the computing activities of the LHC VOs at a particular site.
- At present it is not actively used. Because of the lack of manpower, not enough effort was put into validating it with the experiments and site admins.
- The Google Earth Dashboard is linked to it in order to provide a detailed picture of what is going on at a particular site.
No overlap with any monitoring effort in IT.

