Monitoring for CCRC08, status and plans Julia Andreeva, CERN 04.03.2008, F2F meeting, CERN.

Slides:



Advertisements
Similar presentations
 Contributing >30% of throughput to ATLAS and CMS in Worldwide LHC Computing Grid  Reliant on production and advanced networking from ESNET, LHCNET and.
Advertisements

Ian M. Fisk Fermilab February 23, Global Schedule External Items ➨ gLite 3.0 is released for pre-production in mid-April ➨ gLite 3.0 is rolled onto.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES News on monitoring for CMS distributed computing operations Andrea.
LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.
CERN - IT Department CH-1211 Genève 23 Switzerland t Monitoring the ATLAS Distributed Data Management System Ricardo Rocha (CERN) on behalf.
CERN IT Department CH-1211 Geneva 23 Switzerland t The Experiment Dashboard ISGC th April 2008 Pablo Saiz, Julia Andreeva, Benjamin.
CERN IT Department CH-1211 Genève 23 Switzerland t Internet Services GS group meeting Monitoring and Dashboards section Activity.
Enabling Grids for E-sciencE Overview of System Analysis Working Group Julia Andreeva CERN, WLCG Collaboration Workshop, Monitoring BOF session 23 January.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks VO-specific systems for the monitoring of.
Monitoring the Grid at local, national, and Global levels Pete Gronbech GridPP Project Manager ACAT - Brunel Sept 2011.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Julia Andreeva CERN (IT/GS) CHEP 2009, March 2009, Prague New job monitoring strategy.
CERN IT Department CH-1211 Genève 23 Switzerland t Internet Services Job Monitoring for the LHC experiments Irina Sidorova (CERN, JINR) on.
Monitoring in EGEE EGEE/SEEGRID Summer School 2006, Budapest Judit Novak, CERN Piotr Nyczyk, CERN Valentin Vidic, CERN/RBI.
LCG Plans for Chrsitmas Shutdown John Gordon, STFC-RAL GDB December 10 th, 2008.
GridPP Deployment & Operations GridPP has built a Computing Grid of more than 5,000 CPUs, with equipment based at many of the particle physics centres.
Enabling Grids for E-sciencE System Analysis Working Group and Experiment Dashboard Julia Andreeva CERN Grid Operations Workshop – June, Stockholm.
CERN Using the SAM framework for the CMS specific tests Andrea Sciabà System Analysis WG Meeting 15 November, 2007.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Overview of STEP09 monitoring issues Julia Andreeva, IT/GS STEP09 Postmortem.
Julia Andreeva, CERN IT-ES GDB Every experiment does evaluation of the site status and experiment activities at the site As a rule the state.
1 Andrea Sciabà CERN Critical Services and Monitoring - CMS Andrea Sciabà WLCG Service Reliability Workshop 26 – 30 November, 2007.
WLCG Monitoring Roadmap Julia Andreeva, CERN , WLCG workshop, CERN.
CERN IT Department CH-1211 Geneva 23 Switzerland t CCRC’08 Tools for measuring our progress CCRC’08 F2F 5 th February 2008 James Casey, IT-GS-MND.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Ricardo Rocha CERN (IT/GS) EGEE’08, September 2008, Istanbul, TURKEY Experiment.
Site Validation Session Report Co-Chairs: Piotr Nyczyk, CERN IT/GD Leigh Grundhoefer, IU / OSG Notes from Judy Novak WLCG-OSG-EGEE Workshop CERN, June.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
Jan 2010 OSG Update Grid Deployment Board, Feb 10 th 2010 Now having daily attendance at the WLCG daily operations meeting. Helping in ensuring tickets.
CCRC’08 Monthly Update ~~~ WLCG Grid Deployment Board, 14 th May 2008 Are we having fun yet?
CERN IT Department CH-1211 Geneva 23 Switzerland t GDB CERN, 4 th March 2008 James Casey WLCG Monitoring – some worked examples.
Julia Andreeva on behalf of the Dashboard team. 45 machines (14 dedicated to cms) 9% still on SLC4 (two weeks ago 30%) Services.
Visualization Ideas for Management Dashboards
CERN IT Department CH-1211 Geneva 23 Switzerland t DBES LHC(b) Grid operations Roberto Santinelli IT/ES 5 th User Forum – Uppsala April.
ATP Future Directions Availability of historical information for grid resources: It is necessary to store the history of grid resources as these resources.
Plans for Service Challenge 3 Ian Bird LHCC Referees Meeting 27 th June 2005.
Julia Andreeva on behalf of the MND section MND review.
Service Availability Monitor tests for ATLAS Current Status Tests in development To Do Alessandro Di Girolamo CERN IT/PSS-ED.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES Andrea Sciabà Hammercloud and Nagios Dan Van Der Ster Nicolò Magini.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Monitoring of the LHC Computing Activities Key Results from the Services.
MND review. Main directions of work  Development and support of the Experiment Dashboard Applications - Data management monitoring - Job processing monitoring.
WLCG Service Report ~~~ WLCG Management Board, 31 st March 2009.
Global ADC Job Monitoring Laura Sargsyan (YerPhI).
1 Andrea Sciabà CERN The commissioning of CMS computing centres in the WLCG Grid ACAT November 2008 Erice, Italy Andrea Sciabà S. Belforte, A.
Operation Issues (Initiation for the discussion) Julia Andreeva, CERN WLCG workshop, Prague, March 2009.
GridView - A Monitoring & Visualization tool for LCG Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav B.A.R.C. BARC-CERN/LCG Meeting.
Enabling Grids for E-sciencE Grid monitoring from the VO/User perspective. Dashboard for the LHC experiments Julia Andreeva CERN, IT/PSS.
Enabling Grids for E-sciencE INFSO-RI Enabling Grids for E-sciencE Gavin McCance GDB – 6 June 2007 FTS 2.0 deployment and testing.
Enabling Grids for E-sciencE CMS/ARDA activity within the CMS distributed system Julia Andreeva, CERN On behalf of ARDA group CHEP06.
New solutions for large scale functional tests in the WLCG infrastructure with SAM/Nagios: The experiments experience ES IT Department CERN J. Andreeva.
Distributed Analysis Tutorial Dietrich Liko. Overview  Three grid flavors in ATLAS EGEE OSG Nordugrid  Distributed Analysis Activities GANGA/LCG PANDA/OSG.
Open Science Grid OSG Resource and Service Validation and WLCG SAM Interoperability Rob Quick With Content from Arvind Gopu, James Casey, Ian Neilson,
8 August 2006MB Report on Status and Progress of SC4 activities 1 MB (Snapshot) Report on Status and Progress of SC4 activities A weekly report is gathered.
CMS: T1 Disk/Tape separation Nicolò Magini, CERN IT/SDC Oliver Gutsche, FNAL November 11 th 2013.
CERN - IT Department CH-1211 Genève 23 Switzerland t Grid Reliability Pablo Saiz On behalf of the Dashboard team: J. Andreeva, C. Cirstoiu,
WLCG Operations Coordination report Maria Alandes, Andrea Sciabà IT-SDC On behalf of the WLCG Operations Coordination team GDB 9 th April 2014.
SAM Status Update Piotr Nyczyk LCG Management Board CERN, 5 June 2007.
Analysis of Service Incident Reports Maria Girone WLCG Overview Board 3 rd December 2010, CERN.
Monitoring the Readiness and Utilization of the Distributed CMS Computing Facilities XVIII International Conference on Computing in High Energy and Nuclear.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES The Common Solutions Strategy of the Experiment Support group.
WLCG Accounting Task Force Update Julia Andreeva CERN GDB, 8 th of June,
Site notifications with SAM and Dashboards Marian Babik SDC/MI Team IT/SDC/MI 12 th June 2013 GDB.
ATLAS Computing Model Ghita Rahal CC-IN2P3 Tutorial Atlas CC, Lyon
CERN IT Department CH-1211 Geneva 23 Switzerland t LHCOPN Meeting Madrid, 11 th March 2008 James Casey WLCG Monitoring – An overview.
Daniele Bonacorsi Andrea Sciabà
Update on CERN IT Unified Monitoring Architecture (UMA)
Key Activities. MND sections
POW MND section.
Patricia Méndez Lorenzo ALICE Offline Week CERN, 13th July 2007
New monitoring applications in the dashboard
Experiment Dashboard overviw of the applications
Monitoring of the infrastructure from the VO perspective
Site availability Dec. 19 th 2006
Presentation transcript:

Monitoring for CCRC08, status and plans Julia Andreeva, CERN , F2F meeting, CERN

Julia Andreeva, CERN, F2F meeting 2 Monitoring of VO critical services Critical Services “GridMap” is becoming more informative, in particular regarding test status part. Best example is ATLAS Still there is a big room for improvements ( a lot of white and blue colours, which should gradually disappear)

Julia Andreeva, CERN, F2F meeting 3 Site monitoring Experiments are using SAM framework for monitoring services at the sites Central repository of test results, possibility to use this information for global monitoring view for all 4 experiments (example “Critical Services GridMap”) Publishing of the test results into site fabric monitoring is in progress

Julia Andreeva, CERN, F2F meeting 4 Site monitoring New applications for site monitoring developed in CMS and demonstrated during CMS week, among them: - Site Status Board (CMS dashboard application) Shows whether site is in BDII, site scheduled downtime, status of the CE and SE defined via results of CMS specific tests, CMS JobRobot jobs efficiency at the site, status of analysis and production, number of comissioned transfer links, status of ongoing transfers, software tags published at the site, etc… Flexible, allows to publish new metrics

Julia Andreeva, CERN, F2F meeting 5 Monitoring of VO functional blocks For this kind of monitoring experiments are mainly relying on the experiment specific monitoring systems, like Experiment Dashboards, Phedex, Dirac, ALICE Monalisa repository. Serve well the needs of the experiments No central repository for monitoring data, no common way of browsing this information. For someone who is not a part of the experiment or even a given activity in the experiment it is difficult to find needed information.

Julia Andreeva, CERN, F2F meeting 6 ATLAS Dashboard for shifters as an example of use of the Experiment Dashboard during CCRC08 and beyond Both DDM and ProdSys Dashboards had been intensively used by shifters and taken as a main source of information for people taking shifts during CCRC08 ATLAS Shifter’s Workbook: Very positive feedback from people taking shifts New features like ‘shifter’s quick view’, integration of ticketing systems with the dashboards, shifter’s daily report covering open issuers for the next shifter… Live demo of the new features during ATLAS workshop last week

Julia Andreeva, CERN, F2F meeting 7 Need for a global view of the experiments workflows (all 4 experiments in a consistent way) Is needed for people running infrastructure - to understand whether experiments are able to reach their targets - to notice the problem asap - when everything goes fine – show it! - the same resources are shared between 4 VO, need to be able to see correlations if any Is needed for site administrators, in particular for sites serving several VOs For experiments, despite the fact that they rely on their specific monitoring - Correlations. Performance for a given activity had dropped, may be it is related to the activity of other experiments?

Julia Andreeva, CERN, F2F meeting 8 Functional blocks have much in common Functional blocks for LHC experiments are similar to a large extent Example - functional blocks for ATLAS and CMS for CCRC08: Data archiving at T0 Data processing at T0 Data transfer from T0 CAF Data archiving at Tier1 Processing at Tier1 Data transfer T1- T1 Data transfer T1- T2 MC production at T2 Analysis at T2 Data transfer T2- T1 MC production at T2 Analysis at T2 Data transfer T2- T1 Data archiving at T0 Data processing at T0 Data transfer from T0 Processing at Tier1 Data transfer T1- T1 Data transfer T1- T2 A lot of simple metrics like transfer throughput, number of jobs running in parallel, success rate calculated as number of successes divided by number of attempts are also similar CMS ATLAS T0 T1 T2

Julia Andreeva, CERN, F2F meeting 9 Monitoring of the Experiments workflows Phedex CMS Dashboard ATLAS Dashboard Monalisa repository for ALICE DIRAC UI Based on GridMap view application Metrics and targets are defined and measured by the experiments according to their workflows Repository for common aggregated metrics GRID publisher More details in tomorrow talk of James

Julia Andreeva, CERN, F2F meeting 10 Tier 0 Data archiving At T0 Data processing At T0 Transfer T0-T1CAF Tier 1 Tier 2 Data Archiving at T1 Processing At T1 Data transfer T1-T2 Data transfer T1-T1 MC production at T2 Analysis at T2 Data transfer T2-T1 Example of “Gridmap” visualisation for CCRC08 workflow for CMS (random choice of colour)

Julia Andreeva, CERN, F2F meeting 11 Tier 0 Data archiving At Tier 0 Data processing At Tier0 Transfer from Tier0 to Tier1 CAF Tier 1 Tier 2 Data Archiving at Teir1 Processing At Tier1 Data transfer T1-T2 Data transfer T1-T1 MC production at Tier2 Analysis at Tier2 Data transfer T2-T1 Data tranfer T2-T2 CCRC08 target for number of slots for data processing Number of jobs running in parallel for data processing XXX Max, min, avg of number of parallel jobs over last 24 hours Data processing success rate X% Three top errors Subset of the same info for every Tier1 Moving mouse over Example of “Gridmap” visualisation for CCRC08 workflow for CMS (random choice of colour)

Julia Andreeva, CERN, F2F meeting 12 Tier 0 Data archiving At Tier 0 Data processing At Tier0 Transfer from Tier0 to Tier1 CAF Tier 1 Tier 2 GridMap view example for CCRC08 CMS (random choice for colours, no real data) Data Archiving at Teir1 Processing At Tier1 Data transfer T1-T2 Data transfer T1-T1 MC production at Tier2 Analysis at Tier2 Data transfer T2-T1 Data tranfer T2-T2 T1_DE_FZK T1_FR_IN2P3 T1_IT_CNAF T1_TW_ ASGC T1_UK_ RAL T1_ ES_ PIC T1_US_FNAL New window On click List of useful URLs to the detailed data Moving mouse over

Julia Andreeva, CERN, F2F meeting 13 Possible to derive global view or plot for a given metrics or activity for all 4 LHC experiments (per site, per tier, overall) CMS ATLAS LHCb ALICE