A lightweight Monitoring and Accounting system for LHCb DC'04 production. V. Garonne, R. Graciani Díaz, J. J. Saborido Silva, M. Sánchez García, R. Vizcaya Carrillo.

Presentation transcript:

1 A lightweight Monitoring and Accounting system for LHCb DC'04 production. V. Garonne, R. Graciani Díaz, J. J. Saborido Silva, M. Sánchez García, R. Vizcaya Carrillo

2 Outline: Manifesto; Monitoring (Web interface, Internals); Accounting (Web interface, Internals); Outlook; URLs

3 Manifesto. Monitoring and Accounting are tasks in DIRAC. DIRAC is a production grid for LHCb. The Monitoring reports the status of jobs while they are in the WMS (Workload Management System): an instantaneous snapshot of the system, with no historic records. The Accounting records the status of jobs after they leave the WMS: it provides a historic record, accumulated statistics and the evolution of recorded variables with time. Main users: production and site managers.

4 Design choices. Monitoring: job information is stored centrally in the WMS; information is provided directly by the job and the WMS (push); services are passive (no pushing of information); no need for a common consumer API; job and application state are stored together. Accounting: separate infrastructure from the monitoring; a job is never in both the Accounting and the Monitoring; domain specific: LHCb production jobs.

5 Information Flow (diagram). Components shown within DIRAC: Job, Job heart-beat, WMS, Job Database, Cleaner Agent, Accounting Database, Monitoring and Accounting (each with read and write access), backend services & agents, Web interface, Users.

6 Monitoring Web Interface 1: interface to query the monitoring service. Clicking a JobId pops up a window with the job details.

7 Monitoring Web Interface 2: the overview shows predefined plots on the production, generated every few minutes. PyChart is used as the graphics engine: 100% Python, supports SVG. Plot shown: running jobs by site.

8 Monitoring Web Interface 3: job status by site and production ID

9 Monitoring Internals. The monitoring consists of an XML-RPC service exposing whatever parameters are known to DIRAC. Job parameters are stored internally by DIRAC. Primary parameters (execution site, job status, job owner, etc.): fixed and centrally defined, fast access, can be queried on. Secondary parameters (number of steps, internal job state, etc.): defined by the production job itself, stored as key-value pairs, slower access, cannot be queried on.
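The split between primary and secondary parameters can be pictured with a small in-memory model. This is only a sketch under assumptions: the class `JobStore`, its method names and the choice of `Site`, `Status` and `Owner` as primary fields are hypothetical and are not taken from DIRAC. It is written for Python 3.

```python
from collections import defaultdict

# Hypothetical set of primary parameters (fixed, centrally defined).
PRIMARY_FIELDS = ("Site", "Status", "Owner")


class JobStore:
    """Toy model: primary parameters are indexed, secondary ones are not."""

    def __init__(self):
        self._primary = {}                   # job_id -> {field: value}
        self._secondary = defaultdict(dict)  # job_id -> {key: value}
        # One index per primary field: value -> set of job ids.
        self._index = {f: defaultdict(set) for f in PRIMARY_FIELDS}

    def add_job(self, job_id, **primary):
        """Register a new job with its primary parameters."""
        unknown = set(primary) - set(PRIMARY_FIELDS)
        if unknown:
            raise KeyError("not a primary parameter: %s" % sorted(unknown))
        self._primary[job_id] = dict(primary)
        for field, value in primary.items():
            self._index[field][value].add(job_id)

    def set_parameter(self, job_id, key, value):
        """Secondary parameters: free-form key-value pairs set by the job."""
        self._secondary[job_id][key] = value

    def get_parameter(self, job_id, key):
        """Look up one secondary parameter of one job (None if unset)."""
        return self._secondary[job_id].get(key)

    def get_jobs(self, conditions):
        """Select job ids by primary parameters only, using the indexes."""
        result = set(self._primary)
        for field, value in conditions.items():
            if field not in self._index:
                raise KeyError("cannot query on %r" % field)
            result &= self._index[field].get(value, set())
        return sorted(result)
```

In this model a selection such as `store.get_jobs({"Status": "running"})` only touches the indexes, while a secondary parameter can be read for a known job but cannot be used as a selection condition.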

10 JMS basic API example

    from xmlrpclib import ServerProxy
    server = ServerProxy(monitoring_url)

    # Retrieve list of jobs verifying some conditions
    conditions = {'Status': 'running', 'Site': 'DIRAC.CERN.ch'}
    jobreq = server.getJobs(conditions)

    # Print some parameters for each job
    if jobreq['Status']:
        for jobid in jobreq['Value']:
            print server.getJobSite(jobid)
            print server.getJobParameter(jobid, 'LocalBatchId')

    # Bulk operations
    sum = server.getJobsPrimarySummary(jobreq['Value'])

Timing annotations on the slide: ~3 s to select 95 out of 50k jobs; ~0.7 s; ~40 s
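A Python 3 variant of the same calls might look like the sketch below. The method names `getJobs`, `getJobSite` and `getJobsPrimarySummary` and the `Status`/`Value` reply convention come from the slide; everything else is an assumption made for illustration: the helper names `connect` and `running_jobs_report`, the `FakeMonitoring` stand-in, and in particular the return values of `getJobSite` and `getJobsPrimarySummary`, which the slide does not describe.

```python
from xmlrpc.client import ServerProxy  # Python 3 name of xmlrpclib


def connect(monitoring_url):
    """Build a proxy; the real monitoring URL is not given in the slides."""
    return ServerProxy(monitoring_url)


def running_jobs_report(server, site):
    """Return ({job_id: site}, bulk_summary) for running jobs at `site`."""
    conditions = {"Status": "running", "Site": site}
    jobreq = server.getJobs(conditions)
    if not jobreq["Status"]:
        return {}, None
    job_ids = jobreq["Value"]
    # One remote call per job: simple, but the cost grows with len(job_ids).
    sites = {job_id: server.getJobSite(job_id) for job_id in job_ids}
    # One remote call for all selected jobs.
    summary = server.getJobsPrimarySummary(job_ids)
    return sites, summary


class FakeMonitoring:
    """Hypothetical in-process stand-in so the sketch runs without a server."""

    def __init__(self, jobs):
        self.jobs = jobs  # job_id -> {"Site": ..., "Status": ...}
        self.calls = 0    # number of simulated remote calls

    def getJobs(self, conditions):
        self.calls += 1
        ids = [job_id for job_id, params in sorted(self.jobs.items())
               if all(params.get(k) == v for k, v in conditions.items())]
        return {"Status": True, "Value": ids}

    def getJobSite(self, job_id):
        self.calls += 1
        return self.jobs[job_id]["Site"]  # assumed to be a plain string

    def getJobsPrimarySummary(self, job_ids):
        self.calls += 1
        return {job_id: self.jobs[job_id] for job_id in job_ids}  # assumed shape
```

Passing the server object in as an argument keeps the function testable: `running_jobs_report(FakeMonitoring(jobs), "DIRAC.CERN.ch")` exercises the logic locally, and the `calls` counter makes the difference between per-job and bulk requests visible.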

11 Accounting Web Interface 1: GUI for querying the Accounting. Shows results as graphics, as a table or as an Excel sheet. Several types of report are available; only a few are shown here.

12 Accounting Web Interface 2: used resources by site

13 Accounting Web Interface 3: used resources by event type. Plots: MB/job, CPU/job, failed jobs, CPU vs. execution time, input and output data vs. CPU.

14 Accounting Web Interface 4: produced data by production ID. Rates and cumulative values for the number of events and GB of output.

15 Accounting Web Interface 5: WMS statistics on DIRAC's performance. Plots: job execution time vs. WMS waiting time; job execution time vs. WMS matching time. Granularity: per site, per production, integral. Allows assessment of DIRAC's performance.

16 Accounting Internals. Job and DIRAC statistics are kept in a database: site contribution; data produced and used by jobs and steps; timing for jobs, steps and DIRAC internals. Separate XML-RPC interfaces populate and query the accounting tables; both interfaces have restricted access. Jobs are moved to the accounting system by a cleaner agent after being validated.
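The hand-over performed by the cleaner agent can be sketched as a single transaction that copies validated jobs into the accounting table and deletes them from the monitoring table. This is an illustration under assumptions, not DIRAC's implementation: the table layout, the `validated` column and the function names are hypothetical, and for brevity both tables live in one in-memory SQLite database, whereas the slides describe the accounting as a separate infrastructure.

```python
import sqlite3


def make_db():
    """Create a toy database with hypothetical monitoring/accounting tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE monitoring (job_id INTEGER PRIMARY KEY, "
               "site TEXT, status TEXT, validated INTEGER)")
    db.execute("CREATE TABLE accounting (job_id INTEGER PRIMARY KEY, "
               "site TEXT, status TEXT)")
    return db


def clean(db):
    """Move validated jobs from monitoring to accounting; return the count."""
    with db:  # one transaction: commit on success, roll back on error
        rows = db.execute(
            "SELECT job_id, site, status FROM monitoring WHERE validated = 1"
        ).fetchall()
        db.executemany("INSERT INTO accounting VALUES (?, ?, ?)", rows)
        db.executemany("DELETE FROM monitoring WHERE job_id = ?",
                       [(row[0],) for row in rows])
    return len(rows)
```

Doing the insert and the delete in one transaction is what keeps a job from appearing in both places, or in neither, if the agent fails halfway.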

17 Accounting Usage. About 10 hits per day. Time to generate the daily static reports: 8 min (60-70% of the time querying the database, 30-40% in the drawing package). Server load < 0.2. Total: 169 kjobs.
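Applied to the 8 min total, the quoted percentages correspond to roughly 4.8 to 5.6 min spent querying the database and 2.4 to 3.2 min in the drawing package. The arithmetic is spelled out below; the helper name `split_minutes` is only illustrative.

```python
def split_minutes(total_min, low_pct, high_pct):
    """Return the (low, high) share of total_min for a percentage range."""
    return total_min * low_pct / 100, total_min * high_pct / 100


# Figures from the slide: 8 min total, 60-70% querying, 30-40% drawing.
query_low, query_high = split_minutes(8, 60, 70)
draw_low, draw_high = split_minutes(8, 30, 40)
print("querying: %.1f-%.1f min" % (query_low, query_high))
print("drawing:  %.1f-%.1f min" % (draw_low, draw_high))
```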

18 Outlook. Monitoring page: transactions in monitoring updates; further optimisation (bulk operations, ...); search for a faster rendering package; make the web page dynamic (fewer reloads). Accounting: new report types (normalized CPU; contribution by country; rate by site, country, etc.).

19 URLs Monitoring page Mirror on: Direct link to overview pages Accounting page