Summary report dCache T1 WS Technical exchange to improve stability and reliability of the dCache data management system at WLCG T1 centers.

Slides:



Advertisements
Similar presentations
1 User Analysis Workgroup Update  All four experiments gave input by mid December  ALICE by document and links  Very independent.
Advertisements

Client/Server Computing Ajay Kumar Shrivastava. Network Operating System (NOS) It manages the services of the server It exists at the session and presentation.
Storage Issues: the experiments’ perspective Flavia Donno CERN/IT WLCG Grid Deployment Board, CERN 9 September 2008.
Network Redesign and Palette 2.0. The Mission of GCIS* Provide all of our users optimal access to GCC’s technology resources. *(GCC Information Services:
NorthGrid status Alessandra Forti Gridpp13 Durham, 4 July 2005.
15/07/2010Swiss WLCG Operations Meeting Summary of the last GridKA Cloud Meeting (07 July 2010) Marc Goulette (University of Geneva)
HEPiX Catania 19 th April 2002 Alan Silverman HEPiX Large Cluster SIG Report Alan Silverman 19 th April 2002 HEPiX 2002, Catania.
Roles and Responsibilities
LHCC Comprehensive Review – September WLCG Commissioning Schedule Still an ambitious programme ahead Still an ambitious programme ahead Timely testing.
CS 111 – Nov. 19 Enterprise databases, continued –Review tables, joins, queries –Restricting one’s view of entire database –Design considerations –Concurrency.
OSG Operations and Interoperations Rob Quick Open Science Grid Operations Center - Indiana University EGEE Operations Meeting Stockholm, Sweden - 14 June.
Integration Program Update Rob Gardner US ATLAS Tier 3 Workshop OSG All LIGO.
SRM 2.2: status of the implementations and GSSD 6 th March 2007 Flavia Donno, Maarten Litmaath INFN and IT/GD, CERN.
Conducting an Information Systems Audit
1 24x7 support status and plans at PIC Gonzalo Merino WLCG MB
Texas Nodal Program ERCOT Readiness Update TPTF March 31, 2008.
CERN IT Department CH-1211 Geneva 23 Switzerland t Storageware Flavia Donno CERN WLCG Collaboration Workshop CERN, November 2008.
John Gordon STFC-RAL Tier1 Status 9 th July, 2008 Grid Deployment Board.
Evolution, by tackling new challenges| CHEP 2015, Japan | Patrick Fuhrmann | 16 April 2015 | 1 Patrick Fuhrmann On behave of the project team Evolution,
Introduction to IS & Fundamental Concepts Infsy 540 Dr. R. Ocker.
1 LCG-France sites contribution to the LHC activities in 2007 A.Tsaregorodtsev, CPPM, Marseille 14 January 2008, LCG-France Direction.
LCG Service Challenges: Planning for Tier2 Sites Update for HEPiX meeting Jamie Shiers IT-GD, CERN.
LCG Service Challenges: Planning for Tier2 Sites Update for HEPiX meeting Jamie Shiers IT-GD, CERN.
KIT – The cooperation of Forschungszentrum Karlsruhe GmbH und Universität Karlsruhe (TH) ITIL and Grid services at GridKa CHEP 2009,
1 LHCb on the Grid Raja Nandakumar (with contributions from Greig Cowan) ‏ GridPP21 3 rd September 2008.
1 User Analysis Workgroup Discussion  Understand and document analysis models  Best in a way that allows to compare them easily.
INFSO-RI Enabling Grids for E-sciencE Enabling Grids for E-sciencE Pre-GDB Storage Classes summary of discussions Flavia Donno Pre-GDB.
WLCG Grid Deployment Board, CERN 11 June 2008 Storage Update Flavia Donno CERN/IT.
Feedback from the Tier1s GDB, September CNAF 24x7 support On-call person for all critical infrastructural services (cooling, power etc..) Manager.
WLCG Service Report ~~~ WLCG Management Board, 16 th December 2008.
Report from the WLCG Operations and Tools TEG Maria Girone / CERN & Jeff Templon / NIKHEF WLCG Workshop, 19 th May 2012.
Jan 2010 OSG Update Grid Deployment Board, Feb 10 th 2010 Now having daily attendance at the WLCG daily operations meeting. Helping in ensuring tickets.
Busy Storage Services Flavia Donno CERN/IT-GS WLCG Management Board, CERN 10 March 2009.
Thomas Kern | The system documentation as binding agent for and in between internal and external customers April 24th, 2009 | Page 1 The system documentation.
EMI INFSO-RI European Middleware Initiative (EMI) Alberto Di Meglio (CERN)
WLCG Service Report ~~~ WLCG Management Board, 31 st March 2009.
Report from GSSD Storage Workshop Flavia Donno CERN WLCG GDB 4 July 2007.
LCG Accounting Update John Gordon, CCLRC-RAL WLCG Workshop, CERN 24/1/2007 LCG.
WLCG Grid Deployment Board CERN, 14 May 2008 Storage Update Flavia Donno CERN/IT.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES CVMFS deployment status Ian Collier – STFC Stefan Roiser – CERN.
U.S. ATLAS Facility Planning U.S. ATLAS Tier-2 & Tier-3 Meeting at SLAC 30 November 2007.
Enabling Grids for E-sciencE INFSO-RI Enabling Grids for E-sciencE Gavin McCance GDB – 6 June 2007 FTS 2.0 deployment and testing.
Site Services and Policies Summary Dirk Düllmann, CERN IT More details at
EGEE-II INFSO-RI Enabling Grids for E-sciencE Data Management cluster summary David Smith JRA1 All Hands meeting, Catania, 7 March.
WLCG Technical Evolution Group: Operations and Tools Maria Girone & Jeff Templon GDB 12 th October 2011, CERN.
CMS: T1 Disk/Tape separation Nicolò Magini, CERN IT/SDC Oliver Gutsche, FNAL November 11 th 2013.
BNL dCache Status and Plan CHEP07: September 2-7, 2007 Zhenping (Jane) Liu for the BNL RACF Storage Group.
Michael Miller Senior Director Real-Time Collaboration Products Oracle Collaboration Suite 10g Oracle Corporation.
WLCG Service Report ~~~ WLCG Management Board, 17 th February 2009.
DCache features overview | Academia Sinica, Taipei| Patrick Fuhrmann | 17 March 2013 | 1 dCache federation design Head Node Request Headnode SITE Plus.
WLCG Service Report ~~~ WLCG Management Board, 10 th November
1 September 2007WLCG Workshop, Victoria, Canada 1 WLCG Collaboration Workshop Victoria, Canada Site Readiness Panel Discussion Saturday 1 September 2007.
INFSO-RI Enabling Grids for E-sciencE File Transfer Software and Service SC3 Gavin McCance – JRA1 Data Management Cluster Service.
The HEPiX IPv6 Working Group David Kelsey (STFC-RAL) EGI OMB 19 Dec 2013.
DB Questions and Answers open session (comments during session) WLCG Collaboration Workshop, CERN Geneva, 24 of April 2008.
Pledged and delivered resources to ALICE Grid computing in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week.
Maria Alandes Pradillo, CERN Training on GLUE 2 information validation EGI Technical Forum September 2013.
HGF Mass Storage Support Group
Ruth Pordes, March 2010 OSG Update – GDB May 12 th 2010 Operations Services 1 Periodic reliability problems with end to end publishing to WLCG BDII – as.
Availability of ALICE Grid resources in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week.
Gene Oleynik, Head of Data Storage and Caching,
Understanding the New PTC System Monitor (PSM/Dynatrace) Application’s Capabilities and Advanced Usage Stephen Vaillancourt PTC Technical Support –Technical.
Status of the SRM 2.2 MoU extension
SRM Developers' Response to Enhancement Requests
Summary from last MB “The MB agreed that a detailed deployment plan and a realistic time scale are required for deploying glexec with setuid mode at WLCG.
GridKa School 2007 Christopher Jung, Forschungszentrum Karlsruhe.
ARB Schedule Locations
CHIPP - CSCS F2F meeting CSCS, Lugano January 25th , 2018.
Microsoft Virtual Academy
Summary of the dCache workshop
Presentation transcript:

Summary report dCache T1 WS Technical exchange to improve stability and reliability of the dCache data management system at WLCG T1 centers

Are we reliable The GridKa on-call engineers received 43 tickets because of storage related issues since December 20: ~3 per week dCache T1 sites have differences Number of supported experiments and size Type of HSM and HW And similarities Number of administrators (Lack of) integration with experiments

Goal(s) Discuss mutual issues, practices, procedures Improve communication Meet/Unite administrators No interference of experiments reps. at WS

Participants All dCache Tier 1 admins! Developers of dcache.org Experiments

Sources of storage (dCache) instability Complexity: not solved Increasing resource footprint: not solved Databases in the line of fire: not solved Asynchronous operations: that’s gridlife

And of course: Bugs Bugs: can be solved Can they? Remember the famous Ed Dijkstra: “Testing shows the presence, not the absence of bugs” GridKa installed ‘emergency’ patches after almost every update

Helpful hands and next steps SRM overloads Several design choices need revisit HSM: promises and expectations Workshop will have a follow-up

Sustainability dCache is developed by a small group Do T1s need/have a plan B? EOL will arrive for all software (except DOS)

The TØ1 dilemma No place to test SW at production scale If put in production the impact is highest ?

Slide input from dCache.org SRM There has been an external, independent review of the dCache SRM at FERMIlab. A lot of very good suggestions have been made. We will follow up on these. Please see Timur Perelmutov for details. We are an active participant in a WLCG storage management working group, aiming for improved overall efficiency of SRM usage. This including changes to allow clients to behave better in overload situations and the use asynchronous SRMls Chimera has been demonstrated ready to be deployed in production. The NDFG Tier I successfully did the conversion last Friday. Preliminary results : Chimera seems to scale much better than PNFS. We are in the process of conducting additional benchmarks. Please find Gerd Behrmann or Mattias Wadenstein for details. Monitoring Based on the new info service in dCache, there are multiple projects providing tighter integration between dCache and site monitoring systems (e.g. NDGF, nagios, ganglia). See Paul Miller and Gerd Behrmann ACL will be available with There will be a workshop on Chimera migration and the usage of ACL's in dCache organized by the German Storage Support group in Aachen (April 7/8). See Christopher Jung or Patrick Fuhrmann for details.

Details, minutes, names, presentations of the T1WS: Any ?s