SRM 2.2: experiment requirements, status and deployment plans. 6th March 2007. Flavia Donno, INFN and IT/GD, CERN.

2 SRM 2.2 experiment requirements
- In June 2005 the Baseline Services Working Group published a report:
  - A Storage Element service was considered mandatory and high priority.
  - The full set of recommended features was to be available by February 2006, but based on the already available SRM v2.1 features.
- In Mumbai (February 2006) the experiments revised their requirements, based on their experience with the available service.
- The LCG MoU for SRM 2.2 was defined in May 2006 at FNAL.
- The experiment requirements were defined as:
  - Support for permanent files and volatile copies
  - Space reservation (both static and dynamic), with the possibility of releasing space
  - Permission functions, only on directories, based on VOMS groups/roles
  - Directory functions
  - Data transfer functions: PrepareToGet/Put, Copy
  - File access protocol negotiation
  - VO-specific relative paths
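The data-transfer and space-reservation requirements above map onto SRM v2.2's asynchronous request pattern: reserve space, submit a prepare request, poll for the transfer URL, move the data with the negotiated protocol, then finalise. A minimal sketch of that flow is shown below; the `client` wrapper object and its Python-level signatures are assumptions made for illustration, and only the SRM method names and return codes come from the v2.2 specification.

```python
# Minimal sketch of the asynchronous SRM v2.2 upload flow implied by the
# requirements above (space reservation, PrepareToPut, protocol negotiation).
# `client` is a hypothetical SOAP wrapper object, not a real library; only the
# method names (srmReserveSpace, srmPrepareToPut, srmStatusOfPutRequest,
# srmPutDone) and the return codes come from the SRM v2.2 specification.
import time


def upload_with_space_token(client, surl, transfer, size_bytes):
    # Reserve space of the requested quality; the SE returns a space token.
    space_token = client.srmReserveSpace(desired_size=size_bytes,
                                         retention_policy="CUSTODIAL")

    # Submit the put request; the SE answers with a request token.
    request = client.srmPrepareToPut(surls=[surl],
                                     space_token=space_token,
                                     protocols=["gsiftp"])

    # Poll until the SE has negotiated a protocol and assigned a transfer URL.
    while True:
        status = client.srmStatusOfPutRequest(request)
        if status.code == "SRM_SPACE_AVAILABLE":
            turl = status.turls[surl]
            break
        if status.code not in ("SRM_REQUEST_QUEUED", "SRM_REQUEST_INPROGRESS"):
            raise RuntimeError(f"put request failed: {status.code}")
        time.sleep(5)

    transfer(turl)                       # move the data, e.g. via GridFTP
    client.srmPutDone(request, [surl])   # finalise: the copy becomes permanent
    return space_token
```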

3 SRM 2.2 activities
- Weekly meetings were held to follow the development of the implementations up to December 2006.
- The LBNL testing suite, written in Java, was the only one available until September 2006 and was run manually. Only in January 2007 did this test suite start running automatically every day.
- In September 2006 CERN took over from RAL the development of the S2 SRM 2.2 testing suite, enhancing it with a complete test set and with publishing and monitoring tools.
- Reports about the status of the SRM 2.2 test endpoints are given monthly to the LCG MB.

4 Study of the SRM 2.2 specification
- In September 2006 there were very different interpretations of the spec.
- Three releases of the specification: July, September, December.
- A study of the spec (state/activity diagrams) identified many behaviours not defined by the specification.
- A list of about 50 points was compiled in September.
- Many issues have been solved. The last 30 points were discussed and agreed during the WLCG Workshop; the implementation for those will be ready in June.
- The study of the specification, the discussions and the testing of the open issues have helped ensure consistency between SRM implementations.

5 Tests executed
- The S2 test suite tests availability of endpoints, basic functionality, use cases and boundary conditions, and interoperability, and includes exhaustive and stress tests:
  - Availability: ping and a full put cycle (putting and retrieving a file); see the sketch after this list.
  - Basic: basic functionality, checking only return codes and passing all basic input parameters.
  - Use cases: boundary conditions, exceptions, and real use cases extracted from the middleware clients and experiment applications.
  - Interoperability: servers acting as clients, cross-copy operations.
  - Exhaustive: long strings, strange characters in input arguments, missing mandatory or optional arguments; output is parsed.
  - Stress: parallel tests for stressing the systems, multiple requests, concurrent colliding requests, space exhaustion, etc.
- The S2 tests run as a cron job 5 times per day.
- In parallel, manual tests are run with GFAL/lcg-utils, FTS and the DPM test suite.
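As a rough illustration of the availability family (ping plus full put cycle), the sketch below shows what one probe run by the cron job could look like. The `client` object and its upload/download/delete helpers are hypothetical stand-ins, not the actual S2 scripts; only srmPing is an SRM v2.2 method name.

```python
# Sketch of the two availability probes run against each endpoint by the
# S2 cron job: a ping and a full put cycle (write, read back, delete).
# `client` and its upload/download/delete helpers are hypothetical stand-ins
# for the actual S2 test scripts.
import time


def availability_probe(client, base_dir):
    results = {}
    start = time.time()

    try:
        client.srmPing()                  # SRM v2.2 ping method
        results["ping"] = "OK"
    except Exception as exc:
        results["ping"] = f"FAIL: {exc}"

    surl = f"{base_dir}/avail-{int(start)}.dat"
    payload = b"availability test payload"
    try:
        client.upload(surl, payload)                 # put ...
        assert client.download(surl) == payload      # ... retrieve ...
        client.delete(surl)                          # ... and clean up
        results["put_cycle"] = "OK"
    except Exception as exc:
        results["put_cycle"] = f"FAIL: {exc}"

    results["elapsed_s"] = round(time.time() - start, 1)
    return results
```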

6 Tests executed
- For now only availability, basic, use-case and interoperability tests are executed on a regular basis.
- Results are published daily on a web page; the latest results and their history are available.
- Results of failed and successful tests are reported to the developers to signal issues, and a status page is compiled.
- Test results and issues are discussed on the srm-tester and srm-devel lists.

7 Tests executed
- Copy and ChangeSpaceForFiles are MoU SRM methods still to be delivered; they are expected by the end of summer.
- srmCopy is needed now only for dCache!

8 Tests executed
- Result plots for the Availability, UseCase and Interoperability/Cross Copy test families (figures not included in the transcript).

9 Testing Plan
- Plan for Q1 2007:
  - Phase 1: from 16 December 2006 until the end of January 2007:
    - Availability and basic tests.
    - Collect and analyse results; update the page with the status of endpoints.
    - Plot results per implementation: number of failures / number of tests executed, for all SRM MoU methods (see the sketch below).
    - Report results to the WLCG MB.
  - Phase 2: from the beginning to the end of February 2007:
    - Perform tests on use cases (GFAL/lcg-utils/FTS/experiment specific), boundary conditions and the open issues in the spec that have been agreed on.
    - Plot results as for phase 1 and report to the WLCG MB.
  - Phase 3: from 1 March until "satisfaction":
    - Add more SRM 2.2 endpoints (some Tier-1s?).
    - Stress testing.
    - Plot results as for phase 2 and report to the WLCG MB.
- This plan was discussed during the WLCG workshop. The developers have agreed to work on this as a matter of priority.
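As a small illustration of the phase-1 plotting step (failures over tests executed, per MoU method and per implementation), the sketch below aggregates hypothetical test records; the record format is an assumption, not the real S2 output.

```python
# Sketch of the phase-1 aggregation: failure rate per MoU method and per
# implementation. The (implementation, method, passed) tuple format is an
# assumed stand-in for whatever the S2 suite actually logs.
from collections import defaultdict


def failure_rates(records):
    executed = defaultdict(int)
    failed = defaultdict(int)
    for impl, method, passed in records:
        executed[(impl, method)] += 1
        if not passed:
            failed[(impl, method)] += 1
    return {key: failed[key] / executed[key] for key in executed}


if __name__ == "__main__":
    sample = [
        ("dCache", "srmPrepareToPut", True),
        ("dCache", "srmPrepareToPut", False),
        ("CASTOR", "srmCopy", False),
    ]
    for (impl, method), rate in sorted(failure_rates(sample).items()):
        print(f"{impl:8s} {method:20s} {rate:.0%} failed")
```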

10-12 Test results
- Plots of test results per implementation (figures not included in the transcript).

13 Tests executed: status of the implementations
- DPM: a version is available for production; the SRM 2.2 features are still not officially certified. The implementation is stable. All MoU methods are implemented. Use-case tests are OK. Copy is not available, but interoperability tests are OK. A few general issues remain to be solved.
- DRM and StoRM: stable implementations. All MoU methods are implemented. Copy in PULL mode is not available in StoRM, but interoperability tests are OK. Some use-case tests are still not passing and are under investigation.
- dCache: the implementation is rather stable. All MoU methods have been implemented (including Copy, which is absolutely needed for dCache). Interoperability tests are not yet working with CASTOR and DRM. Work is ongoing on some use-case tests. General issues remain to be resolved (the overwrite flag is not supported).
- CASTOR: the implementation is still rather unstable, but a lot of progress has been made during the last 3 weeks. The main causes of instability have been found (race conditions, unintended mixing of threads and forks, etc.). Various problems were identified and fixed by D. Smith, S. De Witt and G. Lo Presti, with the involvement of various people from different IT groups. Various use cases remain to be resolved.

14 Status of SRM clients
- FTS:
  - The SRM client code has been unit-tested and integrated into FTS.
  - Tested against DPM, dCache and StoRM; CASTOR and DRM tests have started.
  - Released to the development testbed.
  - Experiments can run tests on the dedicated UI set up for this purpose.
  - A new dCache endpoint has been set up at FNAL for stress tests.
- GFAL/lcg-utils:
  - New rpms are available on the test UI.
  - Still using the old schema (see next slide).

15 GLUE Schema
- GLUE 1.3 is available.
- Not everything originally proposed was included, only the important changes.
- The LDAP implementation was done by Sergio Andreozzi and is available on the test UI.
- Information providers were started by Laurence Field. Static information providers are available on the test UI for CASTOR, dCache, DPM and StoRM.
- Clients need to adapt to the new schema.
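Since clients pick the SRM version from what the information system publishes (see slide 18), a hedged sketch of an LDAP query for SRM services against a BDII is shown below. The base DN, filter and GLUE attribute names are written from memory and should be treated as illustrative rather than authoritative.

```python
# Hedged sketch: query a BDII for published SRM services and their versions
# using GLUE 1.x service attributes. The base DN, filter and attribute names
# are illustrative; check them against the deployed schema before relying
# on them.
import ldap  # python-ldap


def find_srm_services(bdii_host, bdii_port=2170):
    conn = ldap.initialize(f"ldap://{bdii_host}:{bdii_port}")
    entries = conn.search_s(
        "o=grid", ldap.SCOPE_SUBTREE,
        "(&(objectClass=GlueService)(GlueServiceType=srm*))",
        ["GlueServiceEndpoint", "GlueServiceVersion", "GlueServiceType"])

    services = []
    for _dn, attrs in entries:
        services.append({
            "endpoint": attrs["GlueServiceEndpoint"][0].decode(),
            "type": attrs["GlueServiceType"][0].decode(),
            "version": attrs.get("GlueServiceVersion", [b"unknown"])[0].decode(),
        })
    return services
```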

16 Grid Storage System Deployment (GSSD)
- Working group launched by the GDB to coordinate SRM 2.2 deployment for Tier-1s and Tier-2s.
- A mailing list has been set up.
- People involved: developers, site admins, experiments.
- Pre-GDB meetings are used for discussions: Tier-1s (and some Tier-2s) presenting their setup and reporting problems.
- Working groups have been set up to work on specific issues:
  - SRM v1 to v2 migration plan
  - Experiment input on Storage Classes, transfer rates and data flow patterns (input completed for LHCb and CMS, ATLAS coming)
  - Database entry conversion
  - Monitoring utilities

17 Deployment plan
- (A) Work with a few candidate Tier-1s: IN2P3, FZK, SARA.
- (B) Dedicated testing hardware.
- (C) DPM and dCache 1.8 as candidate implementations to test.
- (D) Have dCache and DPM tested at the candidate sites, even if interoperability will probably not be fully tested until the other implementations are deployed.
- (E) Enlarge the testbed to other sites, even in production, applying the migration plan foreseen for SRM v1 to v2.
- (F) Include other implementations as soon as they are ready.

18 Deployment plan: client perspective and backup plan
- FTS, lcg-utils, GFAL:
  - SRM v1.1 will be the default until SRM 2.2 is deployed in production and has proven stable. However, it is possible to configure a different default (v2.2).
  - Site admins have to run both SRM v1.1 and SRM v2.2 until the SRM v2.2 installation is stable.
  - The SRM type is retrieved from the information system.
  - If two versions are found for the same endpoint, SRM v2.2 is chosen only if a space token (storage quality) is specified; otherwise SRM v1.1 is the default (see the sketch below).
  - FTS can be configured per channel on the version to use; policies can also be specified ("always use SRM 2.2", "use SRM 2.2 if a space token is specified", ...).
  - Work is ongoing on the details to make sure SRM v1.1 and SRM v2.2 allow access to the same data (already possible for dCache and DPM).
- ===>>> It is possible and foreseen to run in mixed mode with SRM v1.1 and SRM v2.2, until SRM v2.2 is proven stable for all implementations.
- ===>>> Backup plan: run in mixed mode with SRM v1.1 and SRM v2.2 for the implementations that are ready.
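The selection rule above can be written down compactly. The sketch below is an illustration of the stated policy only, with a hypothetical function name and policy strings; it is not the actual GFAL or FTS implementation.

```python
# Sketch of the version-selection policy stated above: with both versions
# published for an endpoint, v2.2 is used only when a space token (storage
# quality) accompanies the request; an FTS channel policy can override this.
# This mirrors the stated policy; it is not the actual GFAL/FTS code.

def choose_srm_version(published_versions, space_token=None,
                       channel_policy="use-v2.2-if-space-token"):
    has_v1 = any(v.startswith("1.") for v in published_versions)
    has_v2 = any(v.startswith("2.2") for v in published_versions)

    if has_v2 and channel_policy == "always-use-v2.2":
        return "2.2"
    if has_v1 and has_v2:
        return "2.2" if space_token else "1.1"
    if has_v2:
        return "2.2"
    if has_v1:
        return "1.1"
    raise ValueError("endpoint publishes no usable SRM version")


# Example: an endpoint publishing both versions.
print(choose_srm_version(["1.1", "2.2.0"]))                          # -> 1.1
print(choose_srm_version(["1.1", "2.2.0"], space_token="LHCb_RAW"))  # -> 2.2
```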

19 Conclusions
- Much clearer description of the SRM specifications: all ambiguous behaviours have been made explicit. A few issues were left out for SRM v3 since they do not affect the SRM MoU.
- A well-established and agreed methodology to check the status of the implementations. Boundary conditions and use cases from the upper-layer middleware and experiment applications will be the focus of next month's work. Monthly reports and problem escalation to the WLCG MB.
- A clear plan has been put in place in order to converge.
- We are still not where we planned to be; however, we have a clearer view of how to proceed.
- Working with sites and experiments on the deployment of SRM 2.2 and the Storage Classes. Specific guidelines for Tier-1 and Tier-2 sites are being compiled.
- The plan foresees the use of a mixed environment with SRM v1 and v2, where the upper-layer middleware takes care of hiding the details from the users.