Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/i t DBES Support Tools, Underlying Services and WLCG Operations Lionel.

Slides:



Advertisements
Similar presentations
HP Quality Center Overview.
Advertisements

ECM Project Roles and Responsibilities
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES News on monitoring for CMS distributed computing operations Andrea.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES WLCG operations: communication channels Andrea Sciabà WLCG operations.
CERN - IT Department CH-1211 Genève 23 Switzerland t Monitoring the ATLAS Distributed Data Management System Ricardo Rocha (CERN) on behalf.
CERN IT Department CH-1211 Genève 23 Switzerland t Service Management GLM 15 November 2010 Mats Moller IT-DI-SM.
EGI: SA1 Operations John Gordon EGEE09 Barcelona September 2009.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EG recent developments T. Ferrari/EGI.eu ADC Weekly Meeting 15/05/
EMI INFSO-RI SA2 - Quality Assurance Alberto Aimar (CERN) SA2 Leader EMI First EC Review 22 June 2011, Brussels.
Monitoring the Grid at local, national, and Global levels Pete Gronbech GridPP Project Manager ACAT - Brunel Sept 2011.
EGEE is a project funded by the European Union under contract IST User support in EGEE Alistair Mills Torsten Antoni EGEE-3 Conference 20 April.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES GGUS Overview ROC_LA CERN
Towards a Global Service Registry for the World-Wide LHC Computing Grid Maria ALANDES, Laurence FIELD, Alessandro DI GIROLAMO CERN IT Department CHEP 2013.
GGUS at PEB – –- page 1 LCG Klaus-Peter Mickel, GridKa Karlsruhe LCG-PEB-Meeting ( ) The Global Grid User Support Model (Report of GDB.
Configuration Management and Change Control Change is inevitable! So it has to be planned for and managed.
WLCG operations A. Sciabà, M. Alandes, J. Flix, A. Forti WLCG collaboration workshop July , Barcelona.
CERN IT Department CH-1211 Geneva 23 Switzerland t GDB CERN, 4 th March 2008 James Casey A Strategy for WLCG Monitoring.
MW Readiness WG Update Andrea Manzi Maria Dimou Lionel Cons 10/12/2014.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks SA1: Grid Operations Maite Barroso (CERN)
INFSO-RI Enabling Grids for E-sciencE EGEE SA1 in EGEE-II – Overview Ian Bird IT Department CERN, Switzerland EGEE.
EGEE is a project funded by the European Union under contract IST Support in EGEE Ron Trompert SARA NEROC Meeting, 28 October
Storage Accounting John Gordon, STFC GDB March 2013.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGI Operations Tiziana Ferrari EGEE User.
EUROPEAN UNION Polish Infrastructure for Supporting Computational Science in the European Research Space Operational Architecture of PL-Grid project M.Radecki,
Report from the WLCG Operations and Tools TEG Maria Girone / CERN & Jeff Templon / NIKHEF WLCG Workshop, 19 th May 2012.
European Middleware Initiative (EMI) The Software Engineering Model Alberto Di Meglio (CERN) Interim Project Director.
Accounting Update John Gordon and Stuart Pullinger January 2014 GDB.
WLCG Technical Evolution Group: Operations and Tools Maria Girone & Jeff Templon Kick-off meeting, 24 th October 2011.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES Andrea Sciabà Hammercloud and Nagios Dan Van Der Ster Nicolò Magini.
APEL Cloud Accounting Status and Plans APEL Team John Gordon.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Resource Allocation in EGEEIII Overview &
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Communication tools between Grid Virtual.
CERN - IT Department CH-1211 Genève 23 Switzerland t A Quick Overview of ITIL John Shade CERN WLCG Collaboration Workshop April 2008.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.
LCG WLCG Accounting: Update, Issues, and Plans John Gordon RAL Management Board, 19 December 2006.
1Maria Dimou- cern-it-gd LCG GDB May 2008 USAG and direct GGUS ticket routing to Sites Grid Deployment.
LCG Accounting Update John Gordon, CCLRC-RAL WLCG Workshop, CERN 24/1/2007 LCG.
NIKHEF11 th March WLCG Operational Costs M. Dimou, J. Flix, A. Forti, A. Sciabà WLCG Operations Coordination Team GDB – NIKHEF [11 th March.
LCG User Level Accounting John Gordon CCLRC-RAL LCG Grid Deployment Board October 2006.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations Automation Team Kickoff Meeting.
Enabling Grids for E-sciencE INFSO-RI Enabling Grids for E-sciencE Gavin McCance GDB – 6 June 2007 FTS 2.0 deployment and testing.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES Andrea Sciabà Ideal information system - CMS Andrea Sciabà IS.
Operations model Maite Barroso, CERN On behalf of EGEE operations WLCG Service Workshop 11/02/2006.
WLCG Technical Evolution Group: Operations and Tools Maria Girone & Jeff Templon GDB 12 th October 2011, CERN.
CERN IT Department CH-1211 Geneva 23 Switzerland t James Casey CCRC’08 April F2F 1 April 2008 Communication with Network Teams/ providers.
WLCG Operations Coordination report Maria Alandes, Andrea Sciabà IT-SDC On behalf of the WLCG Operations Coordination team GDB 9 th April 2014.
CERN - IT Department CH-1211 Genève 23 Switzerland t IT-GD-OPS attendance to EGEE’09 IT/GD Group Meeting, 09 October 2009.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Operations Portal Development Update on Requirements Cyril L'Orphelin IN2P3/CNRS.
EMI INFSO-RI Testbed for project continuous Integration Danilo Dongiovanni (INFN-CNAF) -SA2.6 Task Leader Jozef Cernak(UPJŠ, Kosice, Slovakia)
EGI-Engage is co-funded by the Horizon 2020 Framework Programme of the European Union under grant number GGUS Service Provider GGUS –
Accounting John Gordon WLC Workshop 2016, Lisbon.
GOCDB Handover + Status Update Quite heavy GGUS ticketing traffic; responding to user issues has been quite timely, especially in first few weeks (expected.
EGI Process Assessment and Improvement Plan – EGI core services – Tiziana Ferrari FedSM project 1EGI Process Assessment and Improvement Plan (Core Services)
DGAS Distributed Grid Accounting System INFN Workshop /05/1009, Palau Giuseppe Patania Andrea Guarise 6/18/20161.
WLCG Accounting Task Force Update Julia Andreeva CERN GDB, 8 th of June,
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Storage Accounting John Gordon, STFC OMB August 2013.
Grid Deployment Technical Working Groups: Middleware selection AAA,security Resource scheduling Operations User Support GDB Grid Deployment Resource planning,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks COD-16 (Transition to EGEE-III) Report to.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI Services for Distributed e-Infrastructure Access Tiziana Ferrari on behalf.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Regional tools use cases overview Peter Solagna – EGI.eu On behalf of the.
Site notifications with SAM and Dashboards Marian Babik SDC/MI Team IT/SDC/MI 12 th June 2013 GDB.
Accounting Update John Gordon. Outline Multicore CPU Accounting Developments Cloud Accounting Storage Accounting Miscellaneous.
Operations Coordination Team Maria Girone, CERN IT-ES GDB, 11 July 2012.
CERN IT Department CH-1211 Geneva 23 Switzerland t Service Reliability & Critical Services January 15 th 2008.
Gridpp37 – 31/08/2016 George Ryall David Meredith
Software Project Configuration Management
Ian Bird GDB Meeting CERN 9 September 2003
POW MND section.
Maite Barroso, SA1 activity leader CERN 27th January 2009
Description of Revision
Presentation transcript:

Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES Support Tools, Underlying Services and WLCG Operations Lionel Cons Maria Dimou John Gordon Stefan Roiser Andrea Sciabà WLCG-TEG-OPS F2F January 23, 2011 Amsterdam

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 2 Outline Summary of preliminary recommendations –Everything is open to discussion Support tools Underlying services WLCG Operations and procedures

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 3 Support tools NameDescriptionEffort R2.1Ensure continuous development and funding of GGUS, including WLCG requirements, in particular a failsafe solution for the full stack and interfaces with other ticketing systems/request trackers as appropriate Moderate R2.2Provide a unique portal to infrastructure information (now published via GOCDB and OIM) via a “WLCG Operations Portal” similar to the EGI portal, including the ability to send broadcasts or downtime announcements for all WLCG sites and including the publication of VO-specific information about the existing services Significant R2.3Improve BenchmarkingModerate R2.4Storage AccountingSignificant R2.5Portal AdditionsModerate

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 4 R2.1 - GGUS Definition Ensure continuous development and funding of GGUS after EGI, including WLCG requirements, in particular a failsafe solution for the full stack and interfaces with other ticketing systems/request trackers as appropriate Discuss requirements and issues in a “WLCG central operations” (see R4.1) meeting and keep having weekly meetings with GGUS developers Impact An increase in the reliability of the ticketing infrastructure and a reduction of time and manpower inefficiencies due to technical problems in ticketing systems An improved match between the ways VOs and sites want to communicate and the functionality provided by GGUS Changes implemented as normal GGUS upgrades Effort Moderate – Time devoted to discussing requirements and issues and 1 FTE in addition for development and maintenance of the interfaces with external systems and a full-stack fail- safe solution Deployment/transition constraints Gradual evolution – No significant constraints

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 5 R2.2 – Administration tools Definition Provide a unique portal to infrastructure information (now published via GOCDB and OIM) via a “WLCG Operations Portal” similar to the EGI portal, including the ability to send broadcasts or downtime announcements for all WLCG sites and including the publication of VO-specific information about the existing services A common system will contain all Grid sites, no matter to which infrastructure they belong, as it is currently the case for systems like REBUS and SAM Impact Information about WLCG sites and downtimes will be easier to find Less time spent by users to interface to different systems (GOCDB, OIM) Effort Significant – Development required to provide a common interface and required WLCG customizations Deployment/transition constraints New addition – It does not supersede existing services. Time scale can be relaxed due to the lack of urgency and to better cope with the significant effort required

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 6 R2.3 – Benchmarking for Accounting Definition Benchmarking: Normalisation of CPU data requires a reliable knowledge of the power of the CPUs. The required benchmark is HEPSPEC06. The quality of the published data is not reliable. It is not certain that all sites actually run the benchmark on their clusters. Published values for the same nominal CPU vary greatly. Sites also make mistakes in averaging results across their cluster and forgetting to update when they add new CPUs Impact Effort Moderate Deployment/transition constraints

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 7 R2.4 – Storage Accounting Definition Storage Accounting: under development in EMI for their supported storage systems. The new central infrastructure (mentioned below) will receive and store storage accounting usage records (StAR) and the portal will develop required visualisation of the data. Non-EMI storage solutions (Castor, xrootd, Bestman, …) will also need to publish if WLCG is to have complete data on storage Impact More reliable reports of storage allocated and used at sites by VO Effort Significant for EMI, low for WLCG. WLCG will need to define a use case for reporting etc. which will be developed by the accounting portal Deployment/transition constraints Gradual evolution – Will come with new releases of storage products. Will probably need extensive reality checks for quality assurance.

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 8 R2.5 – Accounting Portal Improvements Definition The Accounting Portal should show data on more users and be able to search on a particular User DN or FQAN (subject to the usual authorisation). RESTful programmatic Interface Additional views for Storage and other new resource types Impact Reduce/remove need for experiments to keep parallel accounting. Easy extraction of data for the experiment to merge with other data, display in dashboards etc. Effort Moderate – Mainly EGI effort provided WLCG requirements are well-formed Deployment/transition constraints Gradual evolution – Once centrally deployed, new central portal features be exploited when desired

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 9 Underlying services NameDescriptionEffort R3.1Implement the WLCG Messaging Roadmap being drafted, which aims at improving security, scalability and reliability/availability Moderate R3.2See slide 9 R3.4Start a WLCG survey on batch systems used in the community to understand what funding is necessary to maintain their Grid layer and ensure continuous support provision Low

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 10 R3.1 – Messaging system Definition Implement the WLCG Messaging Roadmap being drafted, which aims at improving security, scalability and reliability/availability The roadmap is at the draft stage but can be consultedconsulted Impact Limited at this time, as there are only a few applications currently using MSG Possibly significant for these few applications, in particular due to the move to secure messaging Effort Moderate – Tentatively quantified around 2-3 FTE × year, most of it already budgeted for in EMI (from the EMI Product Team) and EGI (from the MSG operations team) Deployment/transition constraints Gradual evolution – Timescale is months and the goal is progressively evolve towards better security and reliability

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 11 R3.2 - Information Services The current problems are clear (stability, information validity and information accuracy) but we are still at the crossroad that Lorenzo Dini presented at the June 2011 GDB. With his words the roads are:presentedGDB –The Lazy – no action –The Radical – decommission –The Slow and Steady – improve –The Rocky – evolve The TEG should indicate a preferred road before specific recommendations are issued

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 12 R3.4 – Batch systems Definition Start a WLCG survey on batch systems used in the community to understand what funding is necessary to maintain their Grid layer and ensure continuous support provision Impact Act on time when batch systems and/or underlying middleware change If the survey shows that we can reduce the batch solutions supported, resources will be spared Effort Moderate/Significant – Moderate for the survey, after that it may depend on the outcome Deployment/transition constraints Gradual evolution – Most probably a fair amount of time will be allowed to replace batch systems, should such decision need to be taken

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 13 WLCG Operations and procedures NameDescriptionEffort R4.1Establish a WLCG central operations team which takes care of driving all actions required and approved at a central level after discussions with site and experiment representatives Significant R4.2Evolve the Tier-1 Service Coordination Meeting into a “WLCG Operations Coordination” meeting where issues related to WLCG operations can be discussed among experiments, sites (including Tier-2 sites) and WLCG and capable of approving actions to be executed and followed up Low R4.3Strengthen the contacts with Tier-2 sites by identifying new representative roles to allow Tier- 2’s to influence the decision process and to ensure a correct information flow to/from the experiments Moderate

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 14 R4.1 – WLCG central operations Definition Establish a WLCG central operations team which takes care of driving all actions required and approved at a central level after discussions with site and experiment representatives The idea is to have a team to coordinate the execution of actions, surveys, interactions with sites, as done until now e.g. for the Information System and Data Management but for all aspects of WLCG operations Impact A single entity in charge of WLCG operations coordination is expected to improve the effectiveness of action execution, which is now done by individuals or by meetings such as the Tier-1 Service Coordination Meeting Information flow from experiments to sites should be enforced, e.g. experiments informing about current / future activities, changes in the VO card, etc. Effort Significant – Time needs to be devoted to define the exact mandate of such body and move to a new work organisation Deployment/transition constraints Disruptive change – It requires to change to a different coordination model and mechanisms, e.g. to track actions on a full WLCG scale in a more automated way

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 15 R4.2 – WLCG Operations meetings Definition Evolve the Tier-1 Service Coordination Meeting into a “WLCG Operations Coordination” meeting where issues related to WLCG operations can be discussed among experiments, sites (including Tier-2 sites) and WLCG and capable of approving actions to be executed and followed up This would be part of a rethinking of the scope of WLCG meetings including the GDB Impact Nothing is expected to change for Tier-0/1 sites, while actions on Tier-2 sites could be more thoroughly discussed and on the same time scale Effort Low – A redefinition of the scope and structure of 1-2 meetings is easy to accomplish Deployment/transition constraints Gradual evolution – Any change would be a relatively minor perturbation of the current situation

CERN IT Department CH-1211 Geneva 23 Switzerland t ES 16 R4.3 – Tier-2 contacts Definition Strengthen the contacts with Tier-2 sites by identifying new representative roles to allow Tier-2’s to influence the decision process and to ensure a correct information flow to/from experiments Assign a “regional” contact person (e.g. per NGI or OSG/EGI/NorduGrid) Impact The information flow from experiments to T2 sites will be improved Such new role(s) ensures that the Tier-2 point of view is properly represented, while still allowing each site to express their opinion (as it – rarely – happens now) Effort Moderate – A non-negligible amount of time would have to be allocated to perform this duty and to exactly define the related responsibilities Deployment/transition constraints Gradual evolution – The new role would be seamlessly integrated in existing bodies

Experiment Support

Experiment Support Support Tools – Ticketing tools and request trackers ImpactImprovement area 5The whole GGUS stack (web frontend, Remedy system and Oracle DB) should be fail-safe 5Convince the entities funding GGUS of its sophisticated use by WLCG to ensure sustainability of our development priorities 5Interfaces among different ticketing systems should become more reliable and consistent 5Consolidate request trackers (Savannah, Trac, JIRA, …) and provide adequate support

Experiment Support Support Tools – Accounting tools ImpactImprovement area 5The CPU benchmarking data should be more accurate 5Storage accounting and its visualisation must be further developed 5The messaging infrastructure should be more reliable and its usage extended 1The Accounting Portal should have a better API and show more users data

Experiment Support Support tools – Administration tools ImpactImprovement area 5Provide an easy way to define new service types 5Provide a reliable way to publish VO-specific service downtime information (in GOCDB or elsewhere) 1Make broadcasts and downtime notifications more reliable 1Provide a better integration of GOCDB and OIM through a uniform interface 1The GOCDB/OIM information should be better kept up to date

Experiment Support Underlying services ImpactImprovement area 5Improve the security of the messaging services 5Improve the scalability of the messaging services 5Improve the reliability and the availability of the messaging services 5Improve the stability of the information services 5Improve the validity of the information 5Improve the accuracy of the information 5Provide a long term solution for a scalable and open source batch system 5Ensure that currently used batch systems are properly supported (not just on a best effort basis)

Experiment Support WLCG operations ImpactImprovement area 5Establish a proper channel to reach the T2 sites from experiments 5Strengthen the WLCG central operations 5Reduce the need for experiment contact persons while improving the quality of the link between experiments and sites where needed 1Make better and more consistent use of VO cards to express requirements to sites