WLCG Operations and Tools TEG Monitoring – Experiment Perspective Simone Campana and Pepe Flix Operations TEG Workshop, 23 January 2012.

Summary of Recommendations

Item  Description                                                              Effort                Impact
1.1   Create a WLCG monitoring coordination body                               Very Moderate         Very Significant
1.2   Streamline experiment monitoring common frameworks                       Moderate/Significant  Significant
1.3   Network monitoring                                                       Significant
1.4   Streamline availability calculation and visualization                    Moderate              Significant
1.5   Bridge sites and experiments perspectives on availability and usability  Significant           Very Significant

Streamline Experiment Monitoring Common Frameworks
SAM is already in use by 4 experiments
  – Discussed later
SSB is used by ATLAS and CMS to publish additional quality metrics and site status
  – Possible interest from LHCb
CMS Site Readiness offers very useful functionality
  – Site ranking, history plots, summary tables
  – Missing functionality/views should be imported into SSB
SSB should be extended with a notification system
  – Commonality with SAM
Look in more detail at the self-contained approach of ALICE

Network Monitoring
perfSONAR (PS/MDM) should be installed at every WLCG site as part of the middleware
Latency tests and throughput tests should be run regularly as part of the infrastructure
  – Frequency depending on the pair of sites, based on experiment requirements
Measurements should be exposed both through a web portal and programmatically
Sites and network providers should be proactive in sorting out network issues
Network monitoring should be centrally coordinated in WLCG
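The idea of a test frequency that depends on the pair of sites can be illustrated with a small scheduling sketch. This is not part of perfSONAR or any WLCG tool: the site names, the interval values and the names `PairSchedule`, `schedule_for` and `due_tests` are all hypothetical, chosen only to show the shape of such a configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairSchedule:
    """How often each kind of test runs between two sites (minutes)."""
    latency_interval_min: int
    throughput_interval_min: int


# Hypothetical default: cheap latency tests often, heavy throughput tests rarely.
DEFAULT_SCHEDULE = PairSchedule(latency_interval_min=60, throughput_interval_min=1440)

# Hypothetical per-pair overrides, as an experiment might request them.
# frozenset makes the pair order-independent: (A, B) is the same link as (B, A).
OVERRIDES = {
    frozenset({"SITE_A", "SITE_B"}): PairSchedule(5, 360),
}


def schedule_for(src: str, dst: str) -> PairSchedule:
    """Return the schedule for a pair of sites, falling back to the default."""
    if src == dst:
        raise ValueError("a site pair needs two distinct sites")
    return OVERRIDES.get(frozenset({src, dst}), DEFAULT_SCHEDULE)


def due_tests(src: str, dst: str, min_since_latency: int, min_since_throughput: int) -> list:
    """List the test types that are due, given minutes since each last ran."""
    plan = schedule_for(src, dst)
    due = []
    if min_since_latency >= plan.latency_interval_min:
        due.append("latency")
    if min_since_throughput >= plan.throughput_interval_min:
        due.append("throughput")
    return due
```

Keeping the overrides in one table is a design choice that matches the recommendation for central coordination: experiments would request changes to that table rather than configuring each site separately.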

Availability Calculation and Visualization
ACE should be used to calculate ALL availabilities as soon as possible
In the short term, SUM is going to be used by experiments to visualize the availability
  – It has already been validated
As a next step, experiments will validate MyWLCG
  – Supported anyhow for EGI (MyEGI)
We recommend converging on ONE system for the visualization

Bridge sites and experiments perspectives on Availability and Usability
SAM Experiment Tests are extended to include more realistic tests (see Dec. 14 GDB presentation)
  – Some tests will contribute to the availability
      Properly agreed between experiments and sites, well documented
  – Some tests will not contribute to the availability
      Will anyway be used by experiment ops and contact people at the sites
The SAM framework is extended/enhanced to
  – Support finer granularity (e.g. the storage space token)
  – Support coarser granularity (e.g. the whole site)
  – Test services not in GOCDB (or adding a service in GOCDB should be simplified)
  – Provide a simple way of changing the result of a test and recalculating availability
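The split between tests that contribute to availability and tests that do not, together with the request to change a test result and recalculate, can be sketched as follows. This is an illustrative model only, not the algorithm used by ACE or SAM: it assumes an interval counts as available when every critical test in it passed, and the names `ProbeResult`, `availability` and `override` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProbeResult:
    interval: int    # index of the time interval (e.g. an hour)
    test: str        # test name
    ok: bool         # outcome of the test
    critical: bool   # True if the test contributes to availability


def availability(results):
    """Fraction of intervals in which all critical tests passed.

    Non-critical tests are ignored; intervals without any critical
    result are treated as unknown and left out of the calculation.
    """
    by_interval = {}
    for r in results:
        if r.critical:
            by_interval.setdefault(r.interval, []).append(r.ok)
    if not by_interval:
        return None
    up = sum(1 for oks in by_interval.values() if all(oks))
    return up / len(by_interval)


def override(results, interval, test, ok):
    """Return a new result list with one test outcome corrected."""
    return [
        replace(r, ok=ok) if (r.interval == interval and r.test == test) else r
        for r in results
    ]


results = [
    ProbeResult(0, "job-submit", True, True),
    ProbeResult(1, "job-submit", False, True),   # e.g. a false alarm to be corrected
    ProbeResult(2, "job-submit", True, True),
    ProbeResult(2, "sw-area", False, False),     # non-critical: does not count
    ProbeResult(3, "job-submit", True, True),
]

print(availability(results))                                   # 0.75
print(availability(override(results, 1, "job-submit", True)))  # 1.0
```

Because `override` returns a new list instead of editing results in place, the original measurements stay available for audit while the corrected availability is recomputed from the amended copy.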

WLCG Operations and Tools TEG Monitoring – Sites Perspective Simone Campana and Pepe Flix Operations TEG Workshop, 23 January 2012

Summary of Recommendations

Item  Description                                                              Effort       Impact
1.6   Bridge sites and experiments perspectives on availability and usability  Significant  Very Significant
1.7   Provide a site-oriented view of experiment monitoring metrics            Significant  Very Significant
1.8   Improve middleware toward service monitoring                             Significant

Bridge sites and experiments perspectives on Availability and Usability
Sites are encouraged to look proactively at tests and quality metrics
  – Critical tests at least
  – An experiment contact should also look at other tests and quality metrics
Sites are encouraged to benefit from the notification system of SAM (and SSB)
  – Increases proactivity
  – Looks simpler if the site uses Nagios for internal monitoring. Can sites share experience?

Provide a site-oriented view of experiment monitoring metrics
We lack an equivalent of today's SSB experiment views tailored for sites
Experiments and sites should agree on what is relevant
  – Start with a handful of metrics
      Start with SAM critical tests and blacklisting
  – Possibly extend to quality metrics and non-critical tests
Experiments should commit to providing and maintaining the information
  – Using an existing framework (e.g. SSB) and the information therein would be a benefit
Provide a flexible visualization interface
  – Showing metrics history
  – Allowing selection of subsets of metrics
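A site-oriented view amounts to pivoting per-experiment records around the site. The sketch below shows one possible shape for this, with metric history and subset selection. The record layout (keys `experiment`, `site`, `metric`, `time`, `status`) and the function name `site_view` are assumptions made for illustration and do not reflect the SSB data model.

```python
from collections import defaultdict


def site_view(records, site, metrics=None):
    """Collect the history of every experiment metric for one site.

    records: iterable of dicts with keys experiment, site, metric, time, status
             (a hypothetical layout, not the SSB schema).
    metrics: optional set of metric names; None selects all metrics.
    Returns {(experiment, metric): [(time, status), ...]} sorted by time.
    """
    view = defaultdict(list)
    for rec in records:
        if rec["site"] != site:
            continue
        if metrics is not None and rec["metric"] not in metrics:
            continue
        view[(rec["experiment"], rec["metric"])].append((rec["time"], rec["status"]))
    return {key: sorted(history) for key, history in view.items()}
```

Starting with only SAM critical tests and blacklisting would correspond to passing a two-element `metrics` set. Extending the view later to quality metrics would mean widening that set without changing the function.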

Improve middleware toward service monitoring
Middleware providers should
  – Avoid tight integration with a specific fabric monitoring system
  – Provide instead generic probes that can be integrated into any framework
  – Improve logging to facilitate development of new probes
Sites should share knowledge and code for fabric monitoring probes
  – Common repository?
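One way to read "generic probes" is a probe that reports through the widely supported Nagios plugin convention: a single status line plus an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below assumes that convention and checks only that a TCP port accepts connections. The thresholds and the names `classify`, `check_tcp` and `format_line` are hypothetical, and a real service probe would test far more than connectivity.

```python
import socket
import time

# Exit codes of the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed_s, warn_s, crit_s):
    """Map a connection time to a status code using two thresholds."""
    if elapsed_s >= crit_s:
        return CRITICAL
    if elapsed_s >= warn_s:
        return WARNING
    return OK


def check_tcp(host, port, warn_s=1.0, crit_s=5.0):
    """Try to open a TCP connection; return (status code, message)."""
    start = time.monotonic()
    try:
        # crit_s doubles as the connection timeout.
        with socket.create_connection((host, port), timeout=crit_s):
            elapsed = time.monotonic() - start
    except OSError as exc:  # refused, unreachable, timed out, DNS failure
        return CRITICAL, f"cannot connect to {host}:{port} ({exc})"
    return classify(elapsed, warn_s, crit_s), f"connected to {host}:{port} in {elapsed:.3f}s"


def format_line(code, message):
    """Build the one-line status output expected from a plugin."""
    return f"TCP {LABELS[code]} - {message}"
```

A command-line wrapper would print `format_line(code, message)` and exit with `code`. Because the probe depends only on that output contract and not on any particular monitoring system, the same script could be run by Nagios, by a cron job, or by another framework.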

Conclusions
We do not propose a revolution but rather an evolution of the existing tools
  – We know those tools, we are used to them, and they work
Network monitoring will require more work, but the process has already started
Coordination of efforts is an essential ingredient