Published by Pamela Tyler. Modified over 9 years ago.
1
Global ADC Job Monitoring
Laura Sargsyan (YerPhI)
2
30/11/2010 ATLAS Software & Computing Workshop
Motivation
Provide an overview of job processing across ATLAS, regardless of the submission tools and execution backends.
Adapt CMS job monitoring for analysis users, preserving the content but improving the visualization.
3
Architecture of Dashboard Job monitoring
[Diagram. Components shown: Instrumented GANGA UI, Instrumented jobs, MSG (ActiveMQ), Dashboard Collector, PanDA DB, Dashboard Job Repository (Dashboard DB), Job summary UI, Historical view UI, Analysis Job Monitoring UI]
4
Main components (1)
Dashboard GANGA reporting: a GANGA plugin that publishes master job and subjob statuses and meta information to MSG
MSG: published and consumed messages
Data collectors
● msg-consume2db: listens to messages from MSG
● PanDA collector: retrieves data from the PanDA DB
Watchdog scripts: scheduled procedures that send alarms by SMS or e-mail in case of problems
[Diagram labels: DB scheduled services collector alarms cronjob scripts]
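The msg-consume2db step can be pictured as a loop that parses each status message and records the latest state per job. What follows is only a minimal sketch under stated assumptions: the JSON message fields (`job_id`, `master_id`, `status`, `timestamp`), the `job_status` table, and all function names are hypothetical, and an in-memory SQLite database and a plain Python list stand in for the Dashboard DB and the MSG broker.

```python
import json
import sqlite3

# Hypothetical table layout; the real Dashboard schema is not described in the slides.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_status (
    job_id     TEXT PRIMARY KEY,
    master_id  TEXT,
    status     TEXT NOT NULL,
    updated_at TEXT NOT NULL
)
"""


def open_repository(path=":memory:"):
    """Open the stand-in repository and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_status_message(conn, raw_message):
    """Parse one JSON status message and keep the most recently received status."""
    msg = json.loads(raw_message)
    conn.execute(
        "INSERT OR REPLACE INTO job_status (job_id, master_id, status, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (msg["job_id"], msg.get("master_id"), msg["status"], msg["timestamp"]),
    )
    conn.commit()


def consume(conn, messages):
    """Drain an iterable of raw messages; a real collector would read from the broker."""
    stored = 0
    for raw in messages:
        try:
            store_status_message(conn, raw)
            stored += 1
        except (ValueError, KeyError, TypeError):
            # Skip malformed messages instead of stopping the collector.
            continue
    return stored
```

The sketch overwrites a job's row with whichever message arrives last, so it does not guard against out-of-order delivery; a production collector would likely compare timestamps before updating.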
5
Main components (2)
Data repository: DB triggers populate the database tables with data from GANGA job reporting and the PanDA DB
Web application layer: provides the HTTP entry point to the data and exposes it in different formats (JSON, XML, CSV)
User interfaces: provide a user-centric view
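To illustrate what exposing the same data in JSON, XML, and CSV can look like, here is a small sketch that renders one set of rows in each format using only the Python standard library. The row fields (`site`, `running`), the XML element names, and the `render` helper are assumptions made for the example; the slides do not describe the Dashboard's actual response layout.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_json(rows, fields):
    """Serialize the selected fields of each row as a JSON array of objects."""
    return json.dumps([{name: row[name] for name in fields} for row in rows])


def to_csv(rows, fields):
    """Serialize rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n",
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows, fields):
    """Serialize rows as <jobs><job>...</job></jobs> (element names are hypothetical)."""
    root = ET.Element("jobs")
    for row in rows:
        job = ET.SubElement(root, "job")
        for name in fields:
            ET.SubElement(job, name).text = str(row[name])
    return ET.tostring(root, encoding="unicode")


RENDERERS = {"json": to_json, "csv": to_csv, "xml": to_xml}


def render(rows, fields, fmt):
    """Pick the serializer for the requested format."""
    try:
        renderer = RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return renderer(rows, fields)
```

Keeping the query result as a list of plain dictionaries and choosing the serializer at the last step means every user interface and script reads the same data, whichever format it asks for.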
6
Implementation
Importing PanDA data into the schema, which is validated and tuned for monitoring purposes
Instrumentation of GANGA jobs, submitted via WMS, local submission, and CREAM CE, for MSG reporting
Data from MSG is collected in the monitoring repository
As a result, all ATLAS job monitoring data, for both analysis and production, is collected in a common monitoring schema
Set up aggregation procedures for data accounting
Adapting the CMS Dashboard interactive and accounting user interfaces for ATLAS (adding sorting and filtering by cloud)
7
Job Summary (1)
Interactive view: what is going on now with job processing across ATLAS
Aimed at different types of users: individual scientists using the Grid for data analysis, user support teams, site admins, VO managers, and managers of different computing projects
Job Summary enables very flexible access to recent monitoring data and shows the job processing of a VO at run time
8
Job Summary (2)
http://dashb-atlas-jobdev.cern.ch/dashboard/request.py/jobsummary
9
Analysis Job Monitoring (1)
Collects and exposes a user-centric set of information about submitted tasks
Focused on the user's perspective
Offers a wide selection of graphical plots
User-driven development
Provides a consistent way of following a user’s analysis jobs regardless of the submission tool
Detailed information on the twiki: https://twiki.cern.ch/twiki/bin/view/ArdaGrid/TaskMonitoringWebUI
The “Analysis Job Monitoring” web interface will be presented today in the “Distributed Analysis” session by Jakub Moscicki
10
Analysis Job monitoring (2)
http://dashb-atlas-jobdev.cern.ch/templates/client/index.html
Includes:
● Full bookmarking capability
● Working 'refresh' capability
● “Breadcrumbs” navigation element
● Easy search
● History support
● “Time period” selection, both from-till and time range
[Screenshot callout: meta information]
11
Analysis Job monitoring (3)
Resubmission history
Link to the PanDA monitoring page for each (PanDA) job
12
Historical views
Functionality
Number of terminated, submitted, pending, and running jobs
Distribution of failed jobs by failure codes/reasons/categories
CPU/wall-clock consumption; efficiency as CPU time versus wall-clock time
Processed events: number of processed events as a function of time, CPU/wall-clock time spent on a single event
Resource utilization: number of used slots, efficiency of site usage compared to pledges
Activities at the site: single-site view with job processing metrics
Data transfer distributions will be added soon
All data can be filtered by site, activity, and cloud
Any time range can be selected
Available granularities are hourly/daily/weekly/monthly
All data is available in machine-readable format
All plots are available via direct link
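As a rough illustration of the efficiency metric and the time granularities listed above, the sketch below bins finished jobs by hour or day and computes efficiency as summed CPU time divided by summed wall-clock time in each bin. The summed-ratio definition, the `(end_time, cpu_seconds, wall_seconds)` record layout, and the function names are assumptions for the example; the slides only say that efficiency compares CPU with wall clock. Weekly and monthly bins are left out for brevity.

```python
from collections import defaultdict
from datetime import datetime


def bucket_start(ts, granularity):
    """Truncate a timestamp to the start of its hourly or daily bin."""
    if granularity == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "daily":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def cpu_efficiency(jobs, granularity="hourly"):
    """Map each time bin to total CPU seconds / total wall-clock seconds.

    `jobs` is an iterable of (end_time, cpu_seconds, wall_seconds) tuples,
    a hypothetical record layout chosen for this sketch.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for end_time, cpu_s, wall_s in jobs:
        bucket = totals[bucket_start(end_time, granularity)]
        bucket[0] += cpu_s
        bucket[1] += wall_s
    return {
        start: (cpu / wall if wall > 0 else None)
        for start, (cpu, wall) in sorted(totals.items())
    }
```

Dividing the sums, rather than averaging per-job ratios, weights long jobs more heavily, which is usually what a site-level accounting view needs.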
13
Historical views
Terminated, Submitted, Pending, Running Jobs
http://dashb-atlas-job-dev.cern.ch/dashboard/request.py/dailysummary
Click the appropriate button to create a plot
Granularity: Hourly, Daily, Monthly
Time Range: 24 h, 48 h, week, month, custom
Click on the plot to zoom in and out
14
Historical views
Status of Terminated jobs
Chosen parameters: all T1 + T0, time range 48 hours, granularity hourly
Sorted by activities (production jobs)
Click on the header of the plot to get links to the machine-readable format or a direct link to the plot
15
Historical views
Failed/Aborted jobs. Error codes
Chosen parameters: all T1 + T0, time range 48 hours, granularity hourly
2 kinds of failure:
● application (transExitCode)
● GRID (pilot, brokerage, ddm, jobDispatcher, supervisor, execution, taskBuffer)
Application failure should be grouped by component (e.g. site, user, application)
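The split between the two kinds of failure can be sketched as a small classifier over a job's error codes. This is an illustration only: the dictionary keys reuse the component names from the slide, but the record layout, the convention that a nonzero code means failure, and the rule that a Grid error takes precedence over an application error are all assumptions.

```python
from collections import Counter

# Component names as listed on the slide; used here as hypothetical dictionary keys.
GRID_COMPONENTS = (
    "pilot", "brokerage", "ddm", "jobDispatcher",
    "supervisor", "execution", "taskBuffer",
)


def classify_failure(error_codes):
    """Return ('grid', component), ('application', 'transExitCode'), or None.

    `error_codes` maps a component name to its error code, with 0 or a
    missing key meaning no error (an assumption of this sketch).
    """
    for component in GRID_COMPONENTS:
        if error_codes.get(component, 0) != 0:
            return ("grid", component)
    if error_codes.get("transExitCode", 0) != 0:
        return ("application", "transExitCode")
    return None


def failure_breakdown(jobs):
    """Count failed jobs per (kind, component); jobs without errors are ignored."""
    counts = Counter()
    for error_codes in jobs:
        result = classify_failure(error_codes)
        if result is not None:
            counts[result] += 1
    return counts
```

A breakdown like this gives the per-category counts a plot of failed jobs by failure category would need; the code values passed in are whatever the job records carry, and none are defined here.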
16
Milestones and achievements
The monitoring plugin for ATLAS Ganga users has published job information to MSG since 17/05/2010
All ATLAS job monitoring data has been collected in the common schema in real time since 26/09/2010
Aggregation procedures feeding the summary DB tables have been set up since 8/10/2010
Interactive and accounting UIs have been available to the ATLAS community since 11/10/2010
Historical views now contain data imported from the PanDA archive, starting from 1/01/2010
17
Historical views
Job Monitoring data starting from 1/01/2010
19
Plans for the next year
Migrate the DB to the production server after validation by ATLAS (January 2011)
Improve performance of the Interactive UI (January 2011)
Add a user interface for the production shifters (February-March 2011)
20
Effort and sustainability
Developers:
● Laura Sargsyan (ATLAS)
● Julia Andreeva (IT)
● Edward Karavakis (IT)
● Lukasz Kokoszkiewicz (IT)
● Jakub Moscicki (IT)
All applications (apart from the data collectors and the analysis users' web interface) are shared with CMS
CERN IT ES provides support for these applications
21
Waiting for your feedback
E-mail: atlas-adc-monitoring@cern.ch, arda-dashboard-dev@cern.ch