File Transfer Service
Patricia Mendez Lorenzo, CERN (IT-GD) / CNAF Tier 2 INFN
SC3 Meeting, Bari, 26th-27th May
Enabling Grids for E-sciencE (INFSO-RI)
Outline
- General aspects of the service
- Description of the service
- Components and architecture
- Job and file states
- Tier 0 / Tier 1 / Tier 2 configurations
- Models, test-beds and testing
- Summary
General Aspects of the Service
Principal challenge of this service: prototype the data movement service needed for the LHC.
Generalities (1): in terms of architecture
- The elements used in this service already existed, but had never been interconnected.
- LCG and gLite provided an architecture and design document for this service.
- It is based on the Radiant architecture already used in SC1 and SC2, which performed well during SC2.
- FTS uses the same architecture and design and is proposed as the prototype to be used in SC3.
- The software proposed for SC3 is interoperable with that used in SC2.
General Aspects of the Service
Generalities (2): in terms of setup and service
Setup phase
- Starts on the 1st of July 2005; preparations are ongoing.
- Includes a throughput test: maintain for one week an average throughput of 400 MB/s from disk (CERN) to tape (Tier-1s), with
  - 150 MB/s disk (CERN) -> disk (Tier-1)
  - 60 MB/s disk (CERN) -> tape (Tier-1)
  - CERN able to support 1 GB/s in total.
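As a sanity check on the scale of the throughput test, sustaining 400 MB/s for a week corresponds to roughly 240 TB moved (a back-of-the-envelope sketch, using decimal units):

```python
rate_mb_per_s = 400                  # disk (CERN) -> tape (Tier-1) target
seconds_per_week = 7 * 24 * 3600     # 604800 s
total_tb = rate_mb_per_s * seconds_per_week / 1_000_000  # MB -> TB (decimal)
print(f"{total_tb:.1f} TB per week")  # 241.9 TB per week
```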
General Aspects of the Service
Service phase
- Stable operation phase, starting in September and running until the end of 2005.
- Includes additional software components: WMS, grid catalogue, mass storage management services, VOMS.
- Includes real experiment data: CMS and ALICE at the beginning, with ATLAS and LHCb joining in October/November.
General Aspects of the Service
Sites participating
- Tier 0: CERN
- Tier 1: ASCC, BNL, CCIN2P3, CNAF, FNAL, GridKA, NIKHEF/SARA, TRIUMF; Nordic Grid and PIC take part in file transfers but are not yet committed to the throughput phase.
- Tier 2: decided in agreement with the Tier 1s:

    Tier 2       Tier 1     Experiment
    Legnaro      CNAF       CMS
    Milan        CNAF       ATLAS
    Turin        CNAF       ALICE
    DESY         FZK        ATLAS, CMS
    Lancaster    RAL        ATLAS
    Imperial     RAL        CMS
    Edinburgh    RAL        LHCb
    US Tier 2s   BNL/FNAL   ATLAS, CMS
Description of the Service
- FTS is a lowest-level data movement service: it is responsible for moving sets of files from one site to another in a reliable way, and that is all.
- It is designed for point-to-point movement of physical files (SURL to SURL).
- It does not deal with GUIDs, LFNs, datasets, etc.; these concepts are handled by higher-level services (catalogues, routing).
- It has a pluggable agent-based architecture to which higher-level tools can be attached:
  - the VOs can provide their own agents for specific operations;
  - gLite will provide the agents that integrate the service with the rest of the gLite middleware;
  - this is the entry point for the experiments' frameworks.
Components and Architecture
- User interface node (client) / web interface: creates requests and submits them to the system.
- Request store: stores all requests submitted to the system.
- Scheduler: takes requests off the queue and initialises the Movement Client.
- Movement Scheduler and agent nodes (pre- and post-transfer agents): the extension points of the system; the agents get their jobs from the request store.
- Movement Client: executes the transfer and monitors it until the end; it is responsible for the transfer.
- Statistics gatherer: provides a central store of the system's performance data.
- Data servers / transport servers at each end: send and receive the data.
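The flow above can be sketched as a minimal queue-based pipeline. This is an illustration only, not the FTS implementation: all class and method names here are assumptions, and the "transfer" is simulated.

```python
from collections import deque

class RequestStore:
    """Stores all requests submitted to the system (here: an in-memory queue)."""
    def __init__(self):
        self._queue = deque()
    def submit(self, request):
        self._queue.append(request)
    def next_request(self):
        return self._queue.popleft() if self._queue else None

class MovementClient:
    """Executes a transfer and monitors it until the end (simulated here)."""
    def transfer(self, request):
        # A real client would drive the data/transport servers at each end.
        return {"request": request, "state": "Done"}

class Scheduler:
    """Takes requests off the queue and initialises the Movement Client."""
    def __init__(self, store, client, stats):
        self.store, self.client, self.stats = store, client, stats
    def run_once(self):
        request = self.store.next_request()
        if request is None:
            return None
        result = self.client.transfer(request)
        self.stats.append(result)  # statistics gatherer: central performance store
        return result

store, stats = RequestStore(), []
store.submit(("srm://source/file1", "srm://dest/file1"))
scheduler = Scheduler(store, MovementClient(), stats)
result = scheduler.run_once()
print(result["state"])  # Done
```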
Architectural Considerations
- Channel: a movement path in the request store, i.e. a point-to-point network connection.
- Channels can be dedicated pipes (e.g. Tier 0 server to Tier 1 servers) or non-dedicated pipes (e.g. towards Tier 2 clients).
- A transfer is executed by jobs sent to the system just described; once a job has been submitted, a channel is assigned to it.
Job States
Overall progress of a job through the system:
- Pending: the job has been submitted and assigned to a channel.
- Active: files are currently being transferred (the transfer has started).
- Cancelled: the job has been cancelled.
- Failed: one or more files failed with a permanent transfer failure.
- Done: all files were transferred successfully.
On a recoverable transfer failure the job goes back for retry; if the job is cancelled, including during a recoverable failure, it ends up Cancelled.
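The transitions above can be encoded as a small table-driven state machine. This is a sketch for illustration; the state and event names follow the slide, not the FTS code:

```python
# Allowed job transitions, keyed by (state, event) pairs from the slide.
TRANSITIONS = {
    ("Pending", "start transfer"): "Active",
    ("Pending", "job cancelled"): "Cancelled",
    ("Active", "transfer succeeded"): "Done",
    ("Active", "permanent transfer failure"): "Failed",
    ("Active", "recoverable transfer failure"): "Pending",  # queued for retry
    ("Active", "job cancelled && recoverable transfer failure"): "Cancelled",
    ("Active", "job cancelled"): "Cancelled",
}

def next_state(state, event):
    """Look up the next state; anything not in the table is illegal."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}")

# A job that suffers one recoverable failure, is retried, and then succeeds:
state = "Pending"
for event in ("start transfer", "recoverable transfer failure",
              "start transfer", "transfer succeeded"):
    state = next_state(state, event)
print(state)  # Done
```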
File States
A file progresses through states analogous to the job states:
- the job containing the file has just been submitted;
- the job owning this file has not yet been assigned a channel;
- the file is currently being transferred;
- the file has failed once and is a candidate for retry;
- the job containing the file has been cancelled;
- the file has been transferred successfully.
Interactions with Other Services
- Suitable SRM clusters have to be deployed at the source and destination of the pipe.
- FTS needs MyProxy:
  - the client has to upload an appropriate credential into the same MyProxy server used by FTS;
  - the same credential is used for all of the client's transfers;
  - the maximum validity of the credential should be long enough to finish the last transfer, taking the queuing time into account.
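The credential-validity requirement is easy to check up front. A minimal sketch, where the helper name and all numbers are assumptions for illustration:

```python
def credential_long_enough(validity_h, queue_h, n_files, transfer_h_per_file):
    """The credential must outlive the queuing time plus the last transfer."""
    needed_h = queue_h + n_files * transfer_h_per_file
    return validity_h >= needed_h

# e.g. a 24 h credential, 2 h expected queuing, 10 files at 1 h each:
print(credential_long_enough(24, 2, 10, 1))  # True
# an 8 h credential would expire before the last transfer finishes:
print(credential_long_enough(8, 2, 10, 1))   # False
```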
Single and Multiple Channel
(diagrams of the single-channel and multiple-channel configurations)
If You Want to Run a Server
This is the case for Tier-0 and Tier-1 sites. You need:
- an Oracle database to hold the state;
- a transfer server to run the transfer agents (Movement Scheduler):
  - agents responsible for assigning jobs to the channels managed by the site,
  - agents responsible for running the transfers;
- an application server (tested with Tomcat 5): it runs the submission and monitoring portal, actually your open door to the whole system.
Required Resources at the Tier 1
- An Oracle database account.
- The existing deployment module deploys both portal and transfer server on the same machine:
  - portal + transfer server: about half a gigabyte of memory on worker-node-class machines;
  - no special disk resources are needed;
  - we still have to see how these machines scale.
- For better service, the deployment module can be split: transfer server and portal on different machines.
In Order to Run a Client
This is the case for Tier 0, Tier 1 and Tier 2 sites.
- Install the command-line tools.
- A configuration file has to be filled in, including the BDII (to publish the name of the service server).
Possible users:
- site administrators: status and control of the channels they participate in;
- production jobs: to move locally created files;
- experiment software frameworks, which can submit directly via the APIs to the relevant channel portal.
Machines Needed for the Client
- No extra nodes are needed: you can use the existing LCG-2 UI/WN configuration, just installing the corresponding client RPMs to get the client command lines.
- The user credential needs to be registered in MyProxy, so the MyProxy clients are also recommended.
Initial Use Models Considered
- Tier-0 to Tier-1 distribution: server at Tier-0 (proposal); model used in SC2.
- Tier-1 to Tier-2 distribution: server at Tier-1, push (proposal); model used in SC2.
- Tier-2 to Tier-1 upload: server at Tier-1, pull (proposal).
- Other models? Probably for the service phase and beyond.
Status: APIs
- User command API definitions: the most useful view from the user and application point of view.
- Java APIs are available to perform the connections with the service and the job submissions.
- A complete list of the APIs, examples and detailed explanations: "EGEE gLite User's Guide", gLite File Transfer Service, Java.
Status: CLIs
QuickStart (admin):
- The service starts with no channel defined.
- First thing to do: obtain a proxy with voms-proxy-init.
- Now a channel has to be added to the system:
    glite-transfer-channel-add -c -f 4 -T 1 -t b -S Active CERNRAL cern.ch rl.ac.uk
- You can list all available channels:
    glite-transfer-channel-list
  (output) CERNRAL
- The details of the channel can be fetched:
    glite-transfer-channel-list CERNRAL
Status: CLIs
Output of glite-transfer-channel-list CERNRAL:
    Channel: CERNRAL
    Between: cern.ch and rl.ac.uk
    State: Inactive
    Contact:
    Bandwidth: 1000
    Nominal throughput: 1000
    Number of files: 4, streams: 1
- Setting the channel to active:
    glite-transfer-channel-set -S Active CERNRAL
- Changing the state of all files belonging to a certain job:
    glite-transfer-channel-signal -c CERNRAL Pending
Status: CLIs
QuickStart (user):
- Again, you need a proxy: voms-proxy-init
- Upload your credential to the appropriate MyProxy server (just once):
    export MYPROXY_SERVER=lxb1010.cern.ch
    myproxy-init
- Now a job can be submitted:
    glite-transfer-submit -p myproxypassword \
      srm://radiantservice.cern.ch/castor/random/file1 \
      srm://store.rl.ac.uk/castor/random/file1
  It returns a UUID that identifies the job.
- In the case of complex jobs:
    glite-transfer-submit -p myproxypassword -f List-Of-Files
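A framework wrapping the CLI might compose the submit command like this. This is a hypothetical helper, not part of the gLite client; it uses only the command and flags shown on the slide, and the composed list would be handed to a process launcher whose stdout is the job UUID:

```python
def build_submit_cmd(password, source=None, dest=None, file_list=None):
    """Compose a glite-transfer-submit command line.

    Either a single source/destination SURL pair or a file holding a
    list of transfers (-f) can be given, mirroring the slide.
    """
    cmd = ["glite-transfer-submit", "-p", password]
    if file_list is not None:
        cmd += ["-f", file_list]
    else:
        cmd += [source, dest]
    return cmd

cmd = build_submit_cmd("myproxypassword",
                       "srm://radiantservice.cern.ch/castor/random/file1",
                       "srm://store.rl.ac.uk/castor/random/file1")
print(" ".join(cmd))
```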
Status: CLIs
- Listing the jobs assigned to a channel in the required state:
    glite-transfer-list -c CERNRAL Pending
  (output)
    Request ID: ff47cdd2-c6e0-11d9-877b-ba42ec
    Status: Pending
    Channel: CERNRAL
    Client DN: /C=CH/O=CERN/OU=GRID/CN=Gavin Mccance 1838
    Reason:
    Submit time: :36:
    Files: 1
- Basic information can be obtained from the job ID returned at submission:
    glite-transfer-status
  (output) Pending
Status: CLIs
- Finally, a job can be cancelled:
    glite-transfer-cancel
- A complete list of commands, examples and detailed explanations: "EGEE gLite User's Guide", gLite File Transfer Service, CLI.
Test-bed
- Initial small-scale test setups have been running at CERN during and since SC2 to determine reliability as new versions come out.
- This small test setup will continue to check for possible problems in new versions.
- The test setup is being expanded as we head towards SC3; this:
  - allows greater stress testing of the software;
  - allows us to gain further operational experience and develop operating procedures;
  - allows the experiments to get early access to the service, to understand how their frameworks can make use of it.
Experiment Involvement
- Schedule the experiments onto the evaluation setup.
- Some consulting on how to integrate the frameworks:
  - discussions with the service challenge/development team;
  - ideas already presented at the LCG storage management workshop;
  - now doing the actual work.
Summary
- Outlined the server and client installations:
  - proposed server at Tier-0 and Tier-1: Oracle DB, Tomcat application server, transfer node;
  - proposed client tools at T0, T1 and T2: UI/WN installation.
- Evaluation setup: initially at the CERN T0, interacting with T1s and T2s.
- Experiment interaction: technical discussions and work scheduled.
Some Documentation
- Radiant homepage and some interesting talks.
- File Transfer Service design: 09&document_id=490347&version=3
- Additional documentation: 09&document_id=503735&version=0.1
- gLite User's Guide.