WLCG Operational Costs
M. Dimou, J. Flix, A. Forti, A. Sciabà
WLCG Operations Coordination Team
GDB, NIKHEF, 11th March 2015

WLCG Operational Costs
• The WLCG Ops Coordination team was asked to launch this project to understand how effectively Grid operations for the LHC experiments are organized, both centrally and at the computing sites
• Feedback from the experiments was collected at the end of 2014
• A Site Survey was also circulated at the end of 2014:
  ‣ Each site provided one (detailed and anonymous) answer
  ‣ All of these complete and detailed answers are useful for all sites
  ‣ The survey allows sites to give feedback and suggestions on how to improve Ops
  ‣ ~100 sites answered the Survey
• The input received is very useful for indicating what could be done to make WLCG operations less expensive for the sites, for the experiments, and for the central operations team

WLCG Site Survey
• The Survey focused mainly on 5 areas:
  ‣ FTE effort spent on operating services used for WLCG and on other activities related to WLCG operations
  ‣ Service upgrades & changes
  ‣ Communications
  ‣ Monitoring
  ‣ Services administration
• The answers are (still) being analyzed
  ‣ The final report will be provided at the WLCG Collaboration Workshop in Okinawa (April 2015)
  ‣ Today we show some (preliminary) results for the FTE effort and Communication areas

FTE Effort in WLCG

FTE Effort on WLCG Ops
• Effort is quantified in FTEs
  ‣ The number of FTEs is defined as the number of hours spent on the task in a year divided by 1,600 hours
  ‣ These estimates can be affected by a large uncertainty (including misinterpretation of the question, which was apparent in a few cases), so care is needed when drawing strong conclusions from them
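As a minimal sketch of the FTE definition on this slide: the 1,600-hour divisor comes from the slide, while the function name `hours_to_fte` and the 320-hour example are illustrative assumptions, not survey data.

```python
# One FTE corresponds to 1,600 hours per year (value taken from the slide).
HOURS_PER_FTE_YEAR = 1600


def hours_to_fte(hours_per_year: float) -> float:
    """Convert hours spent on a task in one year into FTEs.

    `hours_to_fte` is a hypothetical helper name used only for illustration.
    """
    if hours_per_year < 0:
        raise ValueError("hours_per_year must be non-negative")
    return hours_per_year / HOURS_PER_FTE_YEAR


if __name__ == "__main__":
    # Hypothetical example: 8 hours per week over 40 working weeks = 320 hours.
    print(hours_to_fte(8 * 40))  # 0.2
```

Under this definition, a site reporting 0.2 FTE for a service is reporting roughly 320 hours of work on it per year.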

Supported VOs: tickets and effort

Supported VOs: tickets and effort
• Observations:
  ‣ Support via tickets for WLCG does not clearly scale with the number of LHC VOs supported by the sites
  ‣ The total FTE effort reported by the sites does not clearly scale with the number of LHC VOs supported by the sites

Effort per area (T0/T1s)
• T0/1s: ~12.3 FTE; average for all categories: 0.7 FTE
• Plot annotations:
  ‣ +T0
  ‣ Dominated by core services: CVMFS S0 & S1, FTS3, LFC, VOMS, WMs...
  ‣ Dominated by: Exp. services development, virtualization, HW provisioning, OS & configuration management, trackers & version control...

Effort per area (T2s)
• T2s: 2.8 FTE; average for all categories: 0.2 FTE
• Plot annotations:
  ‣ Dominated by: Exp. developments, Exp. specific tasks, HW provisioning, OS & configuration management...
  ‣ Dominated by: APEL, WMS, VOMS...

FTE Effort vs. Size of the sites 1/2
• CPU “size” of the sites taken from 2014 accounting: per day
• Disk & Tape for T0/T1s taken from WLCG monthly accounting
  ‣ Disk for T2s taken from the available Fed. pledges; no breakdown of installed capacity per site

FTE Effort vs. Size of the sites 2/2
• A clear correlation is visible for T0/1s: more FTEs for bigger sites
  ‣ Not so clear for T2 sites
• Plot annotations:
  ‣ Excludes meetings, new tech., TFs, WGs...
  ‣ Includes only Operations of Services
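One way to quantify "a clear correlation" is a Pearson correlation coefficient between site size and reported FTEs. The sketch below is only an illustration of that idea: the slides do not say which statistic, if any, was used, and the numbers in the example are made-up placeholders, not survey results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT taken from the survey.
    site_size = [10.0, 20.0, 30.0, 40.0]  # e.g. CPU "size" in arbitrary units
    site_fte = [2.0, 3.5, 6.0, 7.5]       # reported FTE effort
    print(f"r = {pearson(site_size, site_fte):.2f}")
```

A value close to 1 would correspond to the "more FTEs for bigger sites" pattern reported for T0/T1s, while a value near 0 would match the less clear picture for T2s.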

FTE Effort Observations
• Not unexpectedly, storage is the service that requires the most effort at all sites
• Core “Grid/Exp.” services take more effort in T0/T1s than in T2s
  ‣ APEL is the most frequently mentioned service in the "other Grid services" category for T2s
• Exp. services development / virtualization / HW provisioning / OS & configuration mgt take more effort in T0/T1s than in T2s
• Networking effort is similar in T0/T1s and T2s
• Infrastructure services such as perfSONAR, Squid, ARGUS/GUMS take very little manpower

Communication in WLCG

Communications 1/12

Communications 2/12
• What could be done to improve the communication between the site and WLCG operations? (free text)
• Some answers:
  ‣ Distinguish official requirements approved by WLCG Ops from suggestions made by experiments (the difference is not always obvious)
  ‣ Create a GGUS support unit for WLCG operations
  ‣ Establish a WLCG Ops bulletin
  ‣ Collect more feedback from sites before requests to sites are made
  ‣ Important service requests, such as XRootD or WebDAV protocols, should come from WLCG as formal requests, assuming new services are discussed within the WLCG Management Board and properly endorsed

Communications 3/12

Communications 4/12
• How would you improve the sharing of information across WLCG sites? (free text)
• Some answers:
  ‣ More sites participating in HEPiX and GDB
  ‣ HEPiX is seen as quite effective
  ‣ Create new e-groups to share information on site-specific services and/or issues
  ‣ The relevance of LCG-ROLLOUT is acknowledged
  ‣ Consolidate the relevant information in open WLCG twikis
  ‣ Fewer pages to look into, with more (and more relevant) information to find
  ‣ Mini-workshops on specific topics, and/or an annual Tier1 or WLCG sites Jamboree (site oriented)

Communications 5/12
[Chart labels: HEPiX, LCG-ROLLOUT, GDB, CHEP, WLCG Ops Coord (T1s), private chats]

Communications 6/12

Communications 7/12
• What changes do you think would make the meeting more effective and interesting for you as a site? (free text)
• Some answers:
  ‣ Be more focused (WLCG Ops Coord. meeting):
    ‣ Avoid reports from TFs with little progress
    ‣ Shorten to 1h maximum
  ‣ Time slot:
    ‣ The current slot does not allow for Asia participation; the US would like it to be a bit later (16:00 CET)
  ‣ Add actions from/to sites to the meeting minutes
    ‣ When reading the minutes, it is sometimes not clear to sites what they are supposed to do

Communications 8/12

Communications 9/12
• If your site is not involved in a TF or WG, please indicate the main reason(s) (free text)
• Some answers:
  ‣ Many sites answered “Lack of manpower”
  ‣ Some: “Not a funded work / part of the WLCG commitment”
  ‣ Time-zone difference
  ‣ Problems in operating the site, but willing to participate

Communications 10/12
• What improvements would you like to see in GGUS? (free text)
• Some answers:
  ‣ Easy programmatic access to current and historical content
  ‣ Improvements in the interface
  ‣ Every piece of middleware should be supported via GGUS

Communications 11/12
• When WLCG expects a certain action from a site (service upgrades, reconfiguration, etc.), what channels do you want to be used, in order of importance?
  ‣ Broadcasts and GGUS tickets are considered by far the best methods to communicate requests to sites
  ‣ Operations meetings are far behind

Communications 12/12

Communication Observations
• Well covered domains:
  ‣ WLCG Ops communication from/to sites is ok, but it might be improved
  ‣ Information sharing across sites
    ‣ Some sites are asking for a (yearly?) dedicated WLCG sites meeting
  ‣ The channels seem good and sufficient ( , meetings, wikis, etc...)
  ‣ Suggestions to explore collaborative tools...
  ‣ WLCG TFs and WGs are considered useful
  ‣ GGUS is very much appreciated as a support tool
  ‣ The 3pm Ops call is considered useful by the T0/T1s; the frequency is fine
• Domains to improve:
  ‣ The role of WLCG Ops between the Experiments and their collaborating Sites
  ‣ The content of the fortnightly WLCG Ops Coord meeting (66/101 sites Never or Rarely attend)
  ‣ How to attract more site attention, in particular from T2s

Conclusions so far
• WLCG Workshop (Okinawa), next month…