Conclusions on Monitoring 03.12.10 - CERN A. Read ADC Monitoring1.



Outline
Workshop overview
Summaries of talks
Conclusions

ADC Monitoring Workshop
½-day presentations, ½-day discussions
Vision: Increase rate of improvement, consider all options
Goals:
  – Consolidate current efforts
  – Consider possible rationalizations
  – Decide on priorities
Presentations to:
  – Gain insight, inspiration from alternative projects (STAR/RHIC, Tier-0)
  – Sound out our activities, chart technology choices
  – Consider proposals (SSB in production, EGG interface to ADC)
  – Be informed about new challenges (Tier-3)

ADC Monitoring Workshop Summary
Pandjango:
  – Panda classic monitor has served ATLAS well but technology is obsolete
  – Pursue full exploitation of "cocktail": Django interface to Panda DB, serving data in JSON format to jQuery client
  – Needs more effort (e.g. 0.8->1.8 FTE)
      Separating business logic from old presentation
      Implementing new presentation in jQuery etc.
  – Many "applications" to re-implement
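The separation of business logic from presentation mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library rather than Django, and the row layout, site names and function names are hypothetical, not taken from the Panda DB schema or the Pandjango code.

```python
import json
from collections import Counter

# Hypothetical rows, standing in for the result of a job database query.
SAMPLE_ROWS = [
    {"job_id": 1, "site": "SITE_A", "status": "finished"},
    {"job_id": 2, "site": "SITE_A", "status": "failed"},
    {"job_id": 3, "site": "SITE_B", "status": "finished"},
]


def summarize_by_site(rows):
    """Business logic only: count jobs per site and status, no formatting."""
    summary = {}
    for row in rows:
        summary.setdefault(row["site"], Counter())[row["status"]] += 1
    return {site: dict(counts) for site, counts in summary.items()}


def render_json(summary):
    """Presentation boundary: serialize the summary for a browser-side client."""
    return json.dumps({"sites": summary}, sort_keys=True)


payload = render_json(summarize_by_site(SAMPLE_ROWS))
```

With this split, a browser client only ever sees the JSON payload, so the presentation layer could be rewritten without touching the counting logic.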

ADC Monitoring Workshop Summary
Lessons from STAR:
  – Single infrastructure/framework a benefit
  – DB organization and schema essential to performance and functionality
  – Don't be afraid to lose a little information to gain rapid feedback
Tier-0:
  – Dashboard on server side, rich jQuery client, data served in JSON (very close to ADC Monitoring "technology cocktail")
  – Elegant, flexible presentation of Tier-0 status
      Overview, tasks and jobs, datasets all with same look and feel
      Shifters can dynamically construct their own monitoring pages
  – Statistics aggregation
  – Dynamic charts and plots of up to 11 variables, any time period

ADC Monitoring Workshop Summary
Global ADC job monitoring
  – Adapted from CMS user job monitoring, supported by IT, based on Dashboard DB service
  – Imports from Panda DB and ActiveMQ messages from instrumented jobs (e.g. Ganga/WMS)
  – Job schema is subset of Panda plus aggregates for history: optimized for monitoring
  – Impressive recent progress for user-centric monitoring and especially historical views (accounting)
  – Will add views for ADCoS shifts (prodsys-oriented)
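A job schema that is "a subset of Panda plus aggregates for history" might look roughly like the sketch below. The field names, the choice of an hourly bin and the sample records are all assumptions made for illustration; the actual Dashboard schema is not described in these slides.

```python
from collections import defaultdict
from datetime import datetime

# Assumed subset of fields kept for monitoring; a full job record has many more.
MONITORING_FIELDS = ("job_id", "user", "site", "status", "end_time")


def to_monitoring_record(full_job):
    """Project a full job record onto the reduced monitoring schema."""
    return {key: full_job[key] for key in MONITORING_FIELDS}


def hourly_aggregates(records):
    """Count jobs per (site, status, hour) so historical views avoid row scans."""
    counts = defaultdict(int)
    for rec in records:
        hour = rec["end_time"].replace(minute=0, second=0, microsecond=0)
        counts[(rec["site"], rec["status"], hour)] += 1
    return dict(counts)


# Hypothetical full records; "cmt_config" stands for a field monitoring drops.
full_jobs = [
    {"job_id": 1, "user": "alice", "site": "SITE_A", "status": "finished",
     "end_time": datetime(2010, 12, 3, 10, 15), "cmt_config": "x"},
    {"job_id": 2, "user": "bob", "site": "SITE_A", "status": "finished",
     "end_time": datetime(2010, 12, 3, 10, 45), "cmt_config": "y"},
    {"job_id": 3, "user": "alice", "site": "SITE_A", "status": "failed",
     "end_time": datetime(2010, 12, 3, 11, 5), "cmt_config": "x"},
]

records = [to_monitoring_record(job) for job in full_jobs]
history = hourly_aggregates(records)
```

The trade-off is the one noted in the STAR lessons: some per-job detail is dropped in exchange for fast historical queries.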

ADC Monitoring Workshop Summary
DDM Dashboard 2.0
  – Uses similar cocktail to ADC GJM
  – Prototype with long-awaited view by source (source/dest matrix) produced in short time, new cocktail praised
EGG
  – Proposal for coherent view of ADC
  – Elegant solution that makes it possible to probe any correlation
  – Would require additional development and optimization effort both on core software and ADC component backends, as well as integration in a presentation scheme
Tier-3
  – Re-use as much existing monitoring SW as possible
      Xrootd a new element
  – Avoid impact on T1/T2-ops
  – CERN IT and Dubna working on plan
  – Effort more tightly coupled to Tier-3 working group than Monitoring activity
      Still defining Tier-3 types, collecting requirements
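The source/destination matrix view mentioned for DDM Dashboard 2.0 can be sketched as a nested mapping built from individual transfer records. The record layout and the endpoint names below are hypothetical and do not reflect the real DDM data model.

```python
from collections import defaultdict


def transfer_matrix(transfers):
    """Build matrix[source][dest] = {"ok": n, "failed": m} from transfer records."""
    matrix = defaultdict(lambda: defaultdict(lambda: {"ok": 0, "failed": 0}))
    for transfer in transfers:
        cell = matrix[transfer["source"]][transfer["dest"]]
        cell["ok" if transfer["success"] else "failed"] += 1
    # Convert to plain dicts so the result can be serialized as JSON.
    return {src: dict(dests) for src, dests in matrix.items()}


def efficiency(cell):
    """Fraction of successful transfers in one cell, or None if it is empty."""
    total = cell["ok"] + cell["failed"]
    return cell["ok"] / total if total else None


# Hypothetical transfer records.
transfers = [
    {"source": "CLOUD_A", "dest": "CLOUD_B", "success": True},
    {"source": "CLOUD_A", "dest": "CLOUD_B", "success": True},
    {"source": "CLOUD_A", "dest": "CLOUD_B", "success": False},
    {"source": "CLOUD_B", "dest": "CLOUD_A", "success": True},
]

matrix = transfer_matrix(transfers)
```

Reading the matrix by row gives the view by source; reading it by column gives the view by destination.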

ADC Monitoring Workshop Summary
AGIS
  – API is in production
SSB
  – Unique aggregation of status of ADC services
  – Intended to deliver state of Site Exclusion Policy
  – Interesting for shifters, sites, potentially Tier-3s as well
  – Good feedback from Italian shifters
  – Some development needed (e.g. cloud view, spacetoken granularity)
  – Effort is small fraction of several people, additional manpower requested
  – Request for ADC to put SSB in production

4/5 projects Dashboard-based
All serve JSON data, present w/ rich jQuery clients

Conclusions
Propose to retire Prodsys Dashboard in a few months' time
  – ProdSys Task info in Classic Panda Monitor
  – Prodsys views (especially task-oriented) to be reproduced in ADC Global Job Monitor
  – Small working groups already identified
Propose that expert ADCoS shifters and UK site admins evaluate SSB and report at ADC Retreat
Propose NOT to embark on EGG-integration
Propose to continue very promising development on DDM Dashboard 2
Color and abbrev. scheme for clouds, tiers, datatypes re-proposed by Graeme, let's converge and approve it

Conclusions
Propose to pursue tighter integration of Panda Monitor migration and Dashboard-based ADC Global Job Monitor
  – Use same technology and infrastructure for database backend
      Both Django and Dashboard backends serve data in JSON format and presentation clients are jQuery-based
      Dashboard infrastructure supported by CERN-IT
  – Investigate reduction of number of databases (Panda is authoritative source)
  – Ensure that Panda-specific information is not lost in new schema
  – Small working group identified, will report early Feb 2011
  – Clients are decoupled in both schemes, plenty of Panda client "applications" to be ported
  – Requirement that final version will not throw user back to Panda Classic Monitor!

Other news
Very interesting Job Execution Monitoring (remote multilevel debugging of grid jobs in situ) presented in weekly
  – Scaling issue of ActiveMQ and >= 0.5M jobs/day to be tested "offline"
  – Some integration with Panda server and pilots needed, discussions started
exists but unused – will set up top-level entry to ADC Monitoring
Need to reconsider prototyped plotting service
  – single point of failure
  – enforce the color and abbreviation scheme by other means (e.g. plotting library, AGIS object attributes)
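One way to enforce a color and abbreviation scheme without a central plotting service is a small shared lookup that every client imports, or that is filled from attributes on AGIS objects. The sketch below assumes such a lookup; the cloud names, abbreviations and hex colors are placeholders, not the scheme proposed at the workshop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    """Display attributes for one cloud (placeholder structure)."""
    abbrev: str
    color: str  # hex RGB string, e.g. "#1f77b4"


# Placeholder entries only; the approved scheme would be loaded here instead.
CLOUD_STYLES = {
    "CLOUD_A": Style(abbrev="CA", color="#1f77b4"),
    "CLOUD_B": Style(abbrev="CB", color="#ff7f0e"),
}


def style_for(cloud):
    """Return the approved style, failing loudly instead of inventing a color."""
    try:
        return CLOUD_STYLES[cloud]
    except KeyError:
        raise KeyError(f"no approved style for cloud {cloud!r}") from None


style = style_for("CLOUD_A")
```

Failing on unknown names, rather than falling back to a default color, is a design choice that keeps plots from drifting away from the agreed scheme.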