FTS workshop for experiment integrators: summary of issues and questions raised

Summary of use
- Generally positive response on the current state!
- Now the issues, questions and missing features…

CMS
- Which network model? Store-and-forward or direct transfer
- Single most noteworthy missing feature: checksum (or at least file size) check after transfer (a validation sketch follows this slide)
  - File size is already checked, but no check-summing yet
- Other desiderata:
  - FTS stays in "waiting" even if the file already exists at the destination and SRM.put fails
  - DISCUSS retry policy
  - Would like full timing information for each transfer: request, start, end
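A minimal sketch of the post-transfer validation CMS asks for, kept deliberately generic: the helpers get_size and get_checksum are hypothetical callables that would wrap whatever SRM/storage client is actually available, and are not part of FTS.

    def validate_transfer(src, dst, get_size, get_checksum=None):
        """Return True if dst looks like a faithful copy of src.

        get_size(url) -> int and get_checksum(url) -> str are caller-supplied
        helpers; the checksum comparison is skipped when no helper is given,
        mirroring today's FTS behaviour (size checked, no check-summing).
        """
        if get_size(src) != get_size(dst):
            return False                      # size mismatch: definitely bad
        if get_checksum is not None:
            return get_checksum(src) == get_checksum(dst)
        return True                           # size-only check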

ALICE
- Transparent use
  - ALICE requires automatic discovery of the FTS endpoints and the names of the FTS proxy servers through the information system (an illustrative lookup is sketched below)
  - An upper layer able to hide the transfers among the different SRMs
- Tool for traffic monitoring
  - Possibly R-GMA?
- Closer coordination of FTS and LCG2 releases
  - Updating the clients on the VO-Box by hand is a pain
- Things not needed soon (or at all?):
  - Catalog update (handled externally)
  - LFN resolution (handled externally)
  - Tx-Ty predefined channels not needed [transparent use?]
  - Deal in terms of site names or SRM names
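The sort of information-system lookup ALICE describes could look like the sketch below: query a BDII with ldapsearch for services of the FTS type and collect their endpoints. The BDII host, port and the GLUE service-type string "org.glite.FileTransfer" are assumptions to be checked against the deployed information system; only the query pattern itself is the point.

    import subprocess

    def discover_fts_endpoints(bdii="lcg-bdii.cern.ch:2170",            # assumed top-level BDII
                               service_type="org.glite.FileTransfer"):  # assumed GLUE type string
        out = subprocess.run(
            ["ldapsearch", "-x", "-LLL",
             "-H", "ldap://" + bdii,
             "-b", "o=grid",
             "(GlueServiceType=%s)" % service_type,
             "GlueServiceEndpoint"],
            capture_output=True, text=True, check=True).stdout
        return [line.split(":", 1)[1].strip()
                for line in out.splitlines()
                if line.startswith("GlueServiceEndpoint:")]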

LHCb
- Pre-staging needed: timeout period of the FTS Transfer Agent
  - 'Failed on SRM get: SRM getRequestStatus timed out on get'
  - Currently LHCb pre-stages externally
- Different behaviour of dCache/Castor on file overwriting
  - Can we do better? Overwrite flag? Wait for SRM v2?
- Related problem in SRM v1: no single method to perform physical removal
  - advisoryDelete problem
- Better retry
  - Don't retry 'lost-cause' files, e.g. when the source isn't there (a retry classifier is sketched below)
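A minimal sketch of the retry policy LHCb asks for: classify each failure before re-queueing, so that 'lost-cause' files (for example a missing source) fail immediately instead of being retried. The error markers below are illustrative only, not an official FTS or SRM error list.

    PERMANENT_MARKERS = (
        "No such file or directory",    # source simply is not there
        "SRM_INVALID_PATH",
        "Permission denied",
    )
    TRANSIENT_MARKERS = (
        "getRequestStatus timed out",   # the pre-staging timeout LHCb reports
        "Connection refused",
        "busy",
    )

    def should_retry(error_message, attempts, max_attempts=3):
        """Retry only transient failures, and only up to max_attempts."""
        if attempts >= max_attempts:
            return False
        if any(m in error_message for m in PERMANENT_MARKERS):
            return False                # lost cause: fail the file at once
        return any(m in error_message for m in TRANSIENT_MARKERS)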

LHCb (continued)
- Plug-ins
  - Interested in an LFC plug-in (can share some experience on this!)
- T1 to T1 channel use-case
  - Ideally a fully connected mesh over ~6 T1 sites (channel count worked out below)
  - T2 to T0 mentioned as well
  - DISCUSS model
- Transparency:
  - Central service to submit to
  - Road-map suggested: what can we do now?
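A quick worked number behind the "fully connected mesh" request, assuming FTS channels are defined per direction: N sites need N×(N-1) channels, so a mesh over 6 T1 sites already means 30 of them, before counting any T0 or T2 links. The site list below is purely illustrative.

    from itertools import permutations

    t1_sites = ["CNAF", "FZK", "IN2P3", "PIC", "RAL", "SARA"]   # illustrative ~6 T1s
    channels = ["%s-%s" % (src, dst) for src, dst in permutations(t1_sites, 2)]
    print(len(channels))   # 30 directed channels for 6 sites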

ATLAS
- ReplicaVerifier
  - Done already by FTS(?)
- Staging
  - Find out the stage status of a file
- Plug-ins:
  - Catalog interactions: not just the grid catalog, multiple catalog updates
  - Zipping file plug-ins
  - Call-backs to avoid polling
- Retry policy and Hold states
  - Can FTS retry more (except 'permanent-error' jobs)?
- Priorities
  - Already done: maybe allow high-priority submission for the VO manager as well
- TierX to TierY transfers handled by the network fabric, so channels between all sites should exist
  - Routing… data planner
- Bandwidth monitoring
- Error reporting issues (who do we call / mail?)

Other things
- All experiments poll
  - Can we use the statistics from FTS to help with this and reduce the polling time? (a back-off polling sketch follows this slide)
  - Call-backs?
- Retry logic and Hold state
  - We should discuss the model here
- VO manager role
  - Who gets to modify what?
- Interface stability
  - How often is it OK to make changes that may break things?
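Until call-backs or shared FTS statistics exist, client-side polling can at least back off, roughly as sketched below. This assumes the gLite-era command-line client glite-transfer-status is available and that the terminal state names listed match the deployed FTS version; both should be checked before use.

    import subprocess, time

    TERMINAL_STATES = {"Done", "Finished", "Failed", "Canceled"}   # assumed terminal states

    def wait_for_job(job_id, service, first_wait=30, max_wait=600):
        """Poll a transfer job, doubling the wait between polls up to max_wait seconds."""
        wait = first_wait
        while True:
            state = subprocess.run(
                ["glite-transfer-status", "-s", service, job_id],
                capture_output=True, text=True, check=True).stdout.strip()
            if state in TERMINAL_STATES:
                return state
            time.sleep(wait)
            wait = min(wait * 2, max_wait)   # exponential back-off caps the poll rate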

Summary of issues / features
- Work out retry logic and Hold states
- Pre-staging
  - Configurable to be on or off
- "Central service" or equivalent (data planner and scheduler)
  - LHCb-suggested road-map: client tool first
- VO customisation (plug-ins)
  - LFC interaction
  - Other catalog interactions
  - Pre/post steps
  - Agree on a convention for this (a strawman interface follows this slide)
- How to handle T1 to T1
  - Also T2 to T2 and T2 to T0?
- Bandwidth monitoring
  - Better channel monitoring
- What is the model?
  - Store and forward
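One possible shape for the pre/post plug-in convention, offered purely as a strawman for the discussion above: the VO supplies callables that FTS (or a wrapper around it) invokes around each file transfer, e.g. an LFC or other catalog update in the post-transfer hook. Class and method names are invented for illustration, not an existing FTS interface.

    class VOPlugin:
        """Base class a VO would subclass to customise transfers."""

        def pre_transfer(self, source_surl, dest_surl):
            """Called before the copy starts, e.g. to trigger pre-staging."""

        def post_transfer(self, source_surl, dest_surl, success):
            """Called after the copy, e.g. to register the replica in a catalog."""

    class LoggingPlugin(VOPlugin):
        def post_transfer(self, source_surl, dest_surl, success):
            status = "would register" if success else "skipped"
            print("%s: catalog update %s" % (dest_surl, status))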

Summary of issues / features (continued)
- Fix advisoryDelete problem
  - Deletion
- Full transfer history
  - Timing information + number-of-retries summary
- Check-summing and validation
  - Simple file size check; check-summing is harder via SRM
- Interface stability
  - Frequency of updates?
- Process:
  - Just who do we call when it goes wrong?