Global ADC Job Monitoring Laura Sargsyan (YerPhI).


1 Global ADC Job Monitoring Laura Sargsyan (YerPhI)

2 Motivation
30/11/2010, ATLAS Software & Computing Workshop
- Provide an overview of job processing in the scope of ATLAS, regardless of the submission tools and execution backends.
- Adapt CMS job monitoring for the analysis users, preserving the content but improving the visualization.

3 Architecture of Dashboard Job monitoring
[Diagram components: Dashboard Job Repository (Dashboard DB); PanDA DB; instrumented GANGA UI; instrumented jobs; MSG (ActiveMQ); Dashboard Collector; Historical view UI; Job summary UI; Analysis Job Monitoring UI]

4 Main components (1)
- Dashboard GANGA reporting: GANGA plugin that publishes master job and subjob statuses and meta information to MSG
- MSG: published and consumed messages
- Data collectors
  - msg-consume2db: listens to messages from MSG
  - PanDA collector: retrieves data from the PanDA DB
- Watchdog scripts: scheduled procedures that send alarms by SMS or e-mail in case of problems
  - DB scheduled services
  - collector alarms
  - cronjob scripts
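As an illustration of the collector role described above, here is a minimal sketch of a consumer that writes status messages into a repository table. It is not the actual msg-consume2db code: the message fields (`job_id`, `status`, `site`, `timestamp`), the `job_status` table, and both function names are assumptions, and an in-memory SQLite database stands in for the Dashboard DB. Messages are passed in as a plain iterable of JSON strings, so no broker connection is modelled.

```python
import json
import sqlite3


def create_repository():
    """Create an in-memory stand-in for the job repository (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_status ("
        " job_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " site TEXT,"
        " updated TEXT NOT NULL)"
    )
    return conn


def consume_to_db(conn, raw_messages):
    """Store each JSON status message, keeping the most recently received status per job.

    Malformed messages are skipped and counted so that one bad payload
    does not stop the collector. Returns (stored, skipped).
    """
    stored, skipped = 0, 0
    for raw in raw_messages:
        try:
            msg = json.loads(raw)
            # Field names are assumptions made for this sketch.
            row = (msg["job_id"], msg["status"], msg.get("site"), msg["timestamp"])
        except (ValueError, KeyError, TypeError, AttributeError):
            skipped += 1
            continue
        conn.execute(
            "INSERT OR REPLACE INTO job_status (job_id, status, site, updated)"
            " VALUES (?, ?, ?, ?)",
            row,
        )
        stored += 1
    conn.commit()
    return stored, skipped
```

In a real deployment the iterable would be fed by a message-broker listener, and a watchdog script could raise an alarm when the skipped count grows or no message arrives for some time.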

5 Main components (2)
- Data repository
  - DB triggers populate data from GANGA job reporting and the PanDA DB into the database tables
- Web application layer: the HTTP entry point to the data, which it exposes in different formats (JSON, XML, CSV)
- User interfaces
  - Provide a user-centric view
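The idea of one data set exposed in several formats can be sketched as follows. This is an illustrative stand-in, not the Dashboard web layer: the `render` function, the `jobs`/`job` element names, and the row fields are assumptions, and rows are taken to be flat dictionaries whose keys are valid XML element names.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def render(rows, fmt):
    """Serialize a list of flat dicts as JSON, CSV, or XML text."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buf = io.StringIO()
        # Column order follows the keys of the first row.
        fields = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "xml":
        root = ET.Element("jobs")
        for row in rows:
            item = ET.SubElement(root, "job")
            for key, value in row.items():
                ET.SubElement(item, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

Keeping the query result independent of the output format means each user interface, or an external script, can request whichever representation suits it.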

6 Implementation
- Importing PanDA data into the schema, which is validated and tuned for monitoring purposes
- Instrumentation of GANGA jobs for MSG reporting, for jobs submitted via WMS, local submission, or CREAM CE
- Data from MSG is collected in the monitoring repository
- As a result, all ATLAS job monitoring data, for both analysis and production, is collected in a common monitoring schema
- Setting up aggregation procedures for data accounting
- Adapting the CMS dashboard interactive and accounting user interfaces for ATLAS (adding sorting and filtering by cloud)
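To make the "common monitoring schema" idea concrete, here is a small sketch of two adapters that map source-specific records onto one record type. Everything in it is an assumption for illustration: the `JobRecord` fields, the input keys, and the status mappings do not come from the real schema.

```python
from dataclasses import dataclass

# Illustrative mappings from source-specific statuses to a shared vocabulary.
PANDA_STATUS = {"activated": "pending", "running": "running",
                "finished": "terminated", "failed": "terminated"}
GANGA_STATUS = {"submitted": "submitted", "running": "running",
                "completed": "terminated", "failed": "terminated"}


@dataclass(frozen=True)
class JobRecord:
    """One row of the hypothetical common schema."""
    job_id: str
    source: str
    activity: str
    site: str
    status: str


def from_panda(row):
    """Adapt a PanDA-style row (hypothetical keys) to the common schema."""
    return JobRecord(
        job_id=str(row["pandaid"]),
        source="panda",
        activity=row["activity"],
        site=row["site"],
        status=PANDA_STATUS.get(row["status"], "unknown"),
    )


def from_msg(msg):
    """Adapt a GANGA/MSG-style message (hypothetical keys) to the common schema."""
    return JobRecord(
        job_id=str(msg["job_id"]),
        source="ganga",
        activity="analysis",
        site=msg["site"],
        status=GANGA_STATUS.get(msg["status"], "unknown"),
    )
```

Once both sources produce the same record type, aggregation and the user interfaces need only one code path.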

7 Job Summary (1)
- Interactive view: what is going on now regarding job processing in the scope of ATLAS
- Aimed at different types of users: individual scientists using the Grid for data analysis, user support teams, site admins, VO managers, managers of different computing projects
- Job Summary enables very flexible access to recent monitoring data and shows the job processing of a VO at run time

8 Job Summary (2)
http://dashb-atlas-jobdev.cern.ch/dashboard/request.py/jobsummary

9 Analysis Job Monitoring (1)
- Collects and exposes a user-centric set of information to the user regarding submitted tasks.
- Focused on the user's perspective.
- Offers a wide selection of graphical plots.
- User-driven development.
- Provides a consistent way of following a user's analysis jobs regardless of the submission tool.
Detailed information on the twiki: https://twiki.cern.ch/twiki/bin/view/ArdaGrid/TaskMonitoringWebUI
The "Analysis Job Monitoring" web interface will be presented today in the "Distributed Analysis" session by Jakub Moscicki.

10 Analysis Job monitoring (2): meta information
http://dashb-atlas-jobdev.cern.ch/templates/client/index.html
Includes:
- Full bookmarking capability
- Working "refresh" capability
- "Breadcrumbs" navigation element
- Easy search
- History support
- "Time period" selection, by from-till dates or by time range

11 Analysis Job monitoring (3)
- Resubmission history
- Link to the PanDA monitoring page for each (PanDA) job

12 Historical views: Functionality
- Number of terminated, submitted, pending, running jobs
- Distribution of failed jobs by failure codes/reasons/categories
- CPU and wall-clock consumption; efficiency as CPU time versus wall-clock time
- Processed events: number of processed events as a function of time, CPU/wall-clock time spent on a single event
- Resource utilization, number of used slots, efficiency of site usage compared to pledges
- Activities at the site: single-site view with job processing metrics. Data transfer distributions will be added soon.
- All data can be filtered by site, activity, cloud
- Any time range can be selected
- Available granularities are hourly/daily/weekly/monthly
- All data is available in machine-readable format
- All plots are available via direct link
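The efficiency metric and the selectable granularities listed above can be illustrated with a short sketch. It assumes efficiency is the summed CPU time divided by the summed wall-clock time of the jobs in each time bin; the record keys (`finished`, `cpu_s`, `wall_s`) and the function name are hypothetical, and the real aggregation runs in the database rather than in Python.

```python
from collections import defaultdict
from datetime import datetime

# One key function per granularity; each maps a datetime to a bin label.
BIN_KEYS = {
    "hourly": lambda t: t.strftime("%Y-%m-%d %H:00"),
    "daily": lambda t: t.strftime("%Y-%m-%d"),
    "weekly": lambda t: "{}-W{:02d}".format(*t.isocalendar()[:2]),
    "monthly": lambda t: t.strftime("%Y-%m"),
}


def efficiency_by_bin(jobs, granularity="daily"):
    """Return {bin label: total CPU seconds / total wall-clock seconds}.

    Bins with zero wall-clock time map to None instead of raising.
    """
    key_of = BIN_KEYS[granularity]
    cpu = defaultdict(float)
    wall = defaultdict(float)
    for job in jobs:
        key = key_of(job["finished"])
        cpu[key] += job["cpu_s"]
        wall[key] += job["wall_s"]
    return {k: (cpu[k] / wall[k] if wall[k] > 0 else None) for k in sorted(wall)}
```

Summing before dividing weights each job by its duration, so a few long inefficient jobs are not hidden by many short efficient ones.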

13 Historical views: Terminated, Submitted, Pending, Running Jobs
http://dashb-atlas-job-dev.cern.ch/dashboard/request.py/dailysummary
- Click the appropriate button to create a plot
- Granularity: Hourly, Daily, Monthly
- Time Range: 24 h, 48 h, week, month, custom
- Click on the plot to zoom in and out

14 Historical views: Status of Terminated jobs
- Chosen parameters: all T1 + T0, time range 48 hours, hourly granularity
- Sorted by activities (production jobs)
- Click on the header of the plot to get links to the machine-readable format or a direct link to the plot

15 Historical views: Failed/Aborted jobs. Error codes
- Chosen parameters: all T1 + T0, time range 48 hours, hourly granularity
- 2 kinds of failure:
  - application (transExitCode)
  - GRID (pilot, brokerage, ddm, jobDispatcher, supervisor, execution, taskBuffer)
- Application failures should be grouped by component (e.g. site, user, application)
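The two kinds of failure above can be told apart with a simple rule, sketched here under stated assumptions: each job is a dictionary, a GRID component failure is signalled by a non-zero `<component>ErrorCode` key (a naming convention assumed for this sketch), and GRID errors take precedence over `transExitCode` when both are set. Neither the precedence nor the key names are confirmed by the slides.

```python
from collections import Counter

# Component names as listed on the slide.
GRID_COMPONENTS = ("pilot", "brokerage", "ddm", "jobDispatcher",
                   "supervisor", "execution", "taskBuffer")


def classify_failure(job):
    """Return ('grid', component), ('application', None), or (None, None)."""
    for component in GRID_COMPONENTS:
        # Hypothetical key convention: '<component>ErrorCode', non-zero on error.
        if job.get(component + "ErrorCode"):
            return ("grid", component)
    if job.get("transExitCode"):
        return ("application", None)
    return (None, None)


def failure_distribution(jobs):
    """Count failed jobs per (kind, component); jobs without errors are ignored."""
    counts = Counter()
    for job in jobs:
        kind = classify_failure(job)
        if kind[0] is not None:
            counts[kind] += 1
    return counts
```

A distribution like this is the kind of input a "failed jobs by failure category" plot would need.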

16 Milestones and achievements
- The monitoring plugin for ATLAS Ganga users has published information about jobs to MSG since 17/05/2010
- All ATLAS job monitoring data has been collected in the common schema in real time since 26/09/2010
- Aggregation procedures for feeding the summary DB tables have been set up since 8/10/2010
- Interactive and accounting UIs have been available to the ATLAS community since 11/10/2010
- Historical views now contain data imported from the PanDA archive, starting from 1/01/2010

17 Historical views: Job Monitoring data starting from 1/01/2010


19 Plans for the next year
- Migrate the DB to the production server after validation by ATLAS (January 2011)
- Improve performance of the interactive UI (January 2011)
- Add a user interface for the production shifters (February-March 2011)

20 Effort and sustainability
Developers:
- Laura Sargsyan (ATLAS)
- Julia Andreeva (IT)
- Edward Karavakis (IT)
- Lukasz Kokoszkiewicz (IT)
- Jakub Moscicki (IT)
All applications (apart from the data collectors and the analysis users' web interface) are shared with CMS. CERN IT ES provides support for these applications.

21 Waiting for your feedback
E-mail: atlas-adc-monitoring@cern.ch, arda-dashboard-dev@cern.ch

