1
Xrootd SE deployment at GridKa WLCG Tier 1 site
Artem Trunov, Karlsruhe Institute of Technology
Artem.trunov@kit.edu
KIT - University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association
www.kit.edu
2
LHC Data Flow Illustrated
(photo courtesy Daniel Wang)
Artem Trunov, WLCG Storage Jamboree, Amsterdam, 2010
3
WLCG Data Flow Illustrated
4
ALICE Xrootd SE at GridKa
- Xrootd has been deployed at GridKa since 2002 for BaBar.
- Xrootd was requested by ALICE; the proposal was approved by the GridKa Technical Advisory Board.
- The solution aims at implementing ALICE use cases for archiving custodial data at the GridKa T1 center using an Xrootd SE:
  - Transfer of RAW and ESD data from CERN
  - Serving RAW data to reprocessing jobs
  - Receiving custodial reprocessed ESD and AOD data from WNs
  - Archiving custodial RAW and ESD data on tape
  - Recalling RAW data from tape for reprocessing
- GridKa special focus:
  - Low-maintenance solution
  - SRM contingency (in case ALICE needs it)
- Deployment timeline:
  - Nov 2008: Xrootd disk-only SE, 320 TB
  - Sep 2009: Xrootd SE with tape backend + SRM, 480 TB
  - Nov 2009: ALICE uses xrootd exclusively
  - July to Oct 2010: expansion up to 2100 TB of total space
5
Storage setup at GridKa
- Clusters of 2 or more servers with direct- or SAN-attached storage
- GPFS local file systems
  - Not global: not shared across all GridKa servers and WNs
- All servers in a cluster see all the storage pools
- Redundancy
  - Data remains accessible if one server fails
- Most of the data is behind servers with 10G NICs
  - Plus some older servers with 2x1G
- [Diagram: file servers, disk SAN, FC storage]
8
Storage setup at GridKa + xrootd
- Xrootd maps well onto this setup
  - A client can access a file via any server in a cluster
- Redundant data path
  - On a server failure, the client is redirected to the other server
- Scalability
  - Automatic load balancing
  - All xrootd servers can serve the same ("hot") file to different clients
- [Diagram: Xrootd servers]
9
Admin nodes redundancy
- Xrootd allows two admin nodes ("redirectors", "managers") in a redundant configuration
  - Support is built into the ROOT client
- The user has one access address, which resolves to two IP addresses (DNS type A records)
- Clients choose one of the managers at random
  - Load balancing
- Failure of one redirector is recognized by the ROOT client, which then tries the second address
- Two admin nodes: twice the transactional throughput
- [Diagram: Xrootd servers, Xrootd managers, DNS alias (A record) for the two redirectors]
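The random choice and failover described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the ROOT client's actual logic: the function names, the timeout, and the injectable `connect` and `rng` parameters are assumptions made here so the behaviour can be exercised without a network.

```python
import random
import socket


def resolve_addresses(alias, port):
    """Return the distinct IP addresses published behind one DNS alias."""
    infos = socket.getaddrinfo(alias, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def connect_with_failover(addresses, port, connect=socket.create_connection,
                          rng=random, timeout=5.0):
    """Try the redirector addresses in random order.

    Returns (address, connection) for the first address that accepts the
    connection. The random order spreads clients over both redirectors;
    moving on to the next address after an error gives the failover.
    """
    candidates = list(addresses)
    rng.shuffle(candidates)
    last_error = None
    for address in candidates:
        try:
            return address, connect((address, port), timeout)
        except OSError as error:
            last_error = error  # this redirector is down, try the next one
    raise ConnectionError(f"no redirector reachable: {last_error}")
```

A caller would pass the result of `resolve_addresses(alias, port)` for the site's alias and the port the redirectors listen on; both values are site-specific and are not given in the slides.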
10
High-Availability
- Any component can fail at any time without reducing uptime or making data inaccessible
- Maintenance can be done without taking the whole SE offline and without announcing downtime
  - Rolling upgrades, one server at a time
- Real case
  - A server failed on Thursday evening; the failed component was replaced on Monday
  - The VO didn't notice anything
  - Site admins and engineers appreciated that such cases could be handled without an emergency
11
Adding tape backend
- The MSS backend has been part of vanilla Xrootd since day 1
  - Policy-based migration daemon
  - Policy-based purging daemon (aka garbage collector)
  - Prestaging daemon
  - Migration and purging queues, also with 2-level priority
  - On-demand stage-in, while the user's job waits on file open
  - Bulk "bring-online" requests
  - Asynchronous notification of completed requests via UDP
- The current "mps" scripts are being rewritten by Andy as the File Residency Manager, "frm"
- Adapting to a site's MSS requires writing your own glue scripts
  - Stat command
  - Transfer command
- GridKa uses TSM and in-house middleware, written by Jos van Wezel, to control migration and recall queues
  - The same mechanism is used by dCache
- ALICE implements its own backend to fetch missing files from any other ALICE site
  - Called vMSS ("virtual" MSS)
  - Files are located via a global redirector
12
Adding tape backend - details
- One of the nodes in a GPFS cluster is connected to tape drives via SAN
- This node migrates all files from the GPFS cluster
  - Reduces the number of tape mounts
- It recalls all files and distributes them evenly across all GPFS file systems in the cluster
- [Diagram: Xrootd servers, disk SAN, FC storage, tape SAN, FC tape drives; one server also runs the migration and staging daemons and TSS]
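The slides do not say how recalled files are spread "evenly" across the file systems. One simple way to get an even spread, shown here purely as an assumption, is to place each file on the file system that has received the fewest bytes so far. The function name and the `(name, size)` input format are hypothetical.

```python
import heapq


def distribute_recalls(files, file_systems):
    """Assign recalled files to file systems, balancing the bytes placed.

    files is a list of (name, size_in_bytes); file_systems is a list of
    mount points. Largest files are placed first, each on the file system
    with the smallest running total (a greedy balancing heuristic).
    """
    heap = [(0, fs) for fs in file_systems]
    heapq.heapify(heap)
    placement = {}
    for name, size in sorted(files, key=lambda item: item[1], reverse=True):
        used, fs = heapq.heappop(heap)  # least-loaded file system so far
        placement[name] = fs
        heapq.heappush(heap, (used + size, fs))
    return placement
```

A real recall daemon would more likely look at free space reported by the file systems than at a running total, but the balancing idea is the same.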
13
Tape backend and ALICE global federation
- One of the nodes in a GPFS cluster is connected to tape drives via SAN
- This node migrates all files from the GPFS cluster
  - Reduces the number of tape mounts
- It recalls all files and distributes them evenly across all GPFS file systems in the cluster
- When a file is not on disk, it is looked up in the ALICE cloud first and only then in the local tape archive
  - This can avoid a recall from tape by fetching the file over the network
  - Still needs to be tested
  - Also subject to VO policy
- [Diagram: two sites, each with Xrootd servers, FC tape drives, migration and staging daemons and TSS, connected through the ALICE global redirector]
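The lookup order on this slide (local disk, then the ALICE federation, then local tape) amounts to a short fallback chain. The sketch below only illustrates that ordering; the function name and the three predicate arguments are hypothetical stand-ins for whatever checks the real staging scripts perform.

```python
def locate_file(path, on_disk, in_federation, on_tape):
    """Return the first source that holds the file, in the slide's order.

    Each of on_disk, in_federation and on_tape is a callable taking a path
    and returning True if that source has the file. The result is "disk",
    "federation", "tape", or None if the file is found nowhere.
    """
    sources = (
        ("disk", on_disk),              # already staged locally
        ("federation", in_federation),  # another ALICE site, via the global redirector
        ("tape", on_tape),              # last resort: recall from the local archive
    )
    for name, has_file in sources:
        if has_file(path):
            return name
    return None
```

Putting the federation ahead of tape is what lets a network fetch replace a tape recall; a VO policy could reorder or drop entries in `sources`.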
14
Adding grid functionality
- An SRM interface is needed to meet the WLCG requirement
- Adapted the OSG solution
  - Could not use the OSG releases out of the box, but the components are all the same:
    - Xrootd
    - Globus gridftp + posix dsi backend + xrootd posix preload library
    - xrootdfs
    - BeStMan SRM in gateway mode
- [Diagram: clients (srm://, root://, gsiftp://), BeStMan SRM, gridftp, xrootdfs, posix preload library, Xrootd cluster (root://)]
15
Xrootd SE at GridKa - details of own development
- Xrootd distro
  - From CERN (Fabrizio): the official ALICE distro
  - Installed and configured according to the ALICE wiki page, with little or no deviation
- Gridftp
  - Used VDT rpms
  - Took the gridftp posix lib from the OSG distro and made an rpm
  - Installed host certificates, CAs and gridmap-file from gLite
  - Made sure that the DN used for transfers is mapped to a static account (not a group.alice)
  - Wrote own startup script
  - Runs as root
- SRM
  - Got the xrootdfs source from SLAC
  - Made an rpm and own startup script
  - Installed fuse libs with yum
  - Got the BeStMan tar file from their web site
  - Both run as root (you don't have to)
  - Published in BDII manually (static files)
17
High-Availability and Performance for SRM service
- Gridftp is run in split-process mode
  - The front-end node is co-located with BeStMan
    - Can run in user space
  - The data node is co-located with the xrootd server
    - Doesn't need a host certificate
  - Allows each host's network settings to be optimized for low latency vs. high throughput
- BeStMan and gridftp instances can run under a DNS alias
  - Scalable performance
- [Diagram: "SRM admin nodes" under a DNS alias, each with BeStMan SRM, gridftp front-end and xrootdfs; gridftp control links to gridftp backends with the posix preload library on the Xrootd servers; root:// access to the Xrootd cluster]
18
More High-Availability
- All admin nodes are virtual machines
  - xrootd managers
  - SRM admin node: BeStMan + gridftp control
    - 1 in production at GridKa, but nothing prevents having 2
- KVM on SL5.5
- Shared GPFS for VM images
- VMs can be restarted on the other node in case of failure
- [Diagram: Xrootd servers with gridftp-data; VM hypervisors hosting the Xrootd managers and the SRM and gridftp control nodes]
19
Performance
- No performance measurements on the deployed infrastructure, only observations
- ALICE production transfers from CERN using xrd3cp
  - ~450 MB/s into three servers, ~70 TB in two days
- ALICE analysis, root:// on LAN
  - Up to 600 MB/s from two older servers
- Migration to tape
  - 300 MB/s sustained over days from one server using at most 4 drives
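The observed figures can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), that ~450 MB/s is the aggregate rate over the three servers, and that all 4 tape drives were in use; none of these is stated explicitly in the slides.

```python
SECONDS_PER_DAY = 86_400


def average_rate_mb_s(terabytes, days):
    """Average transfer rate in MB/s for a volume moved over some days."""
    return terabytes * 1e12 / (days * SECONDS_PER_DAY) / 1e6


# ~70 TB in two days is about 405 MB/s on average, consistent with
# the quoted ~450 MB/s being a peak or typical observed rate.
import_average = average_rate_mb_s(70, 2)

# Per-component shares under the stated assumptions.
import_per_server = 450 / 3   # MB/s per server
tape_per_drive = 300 / 4      # MB/s per tape drive

print(f"import average:    {import_average:.0f} MB/s")
print(f"import per server: {import_per_server:.0f} MB/s")
print(f"tape per drive:    {tape_per_drive:.0f} MB/s")
```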
20
Import of RAW data from CERN with xrd3cp
21
Migration to tape
22
Problems
- Grid authorization
  - BeStMan works with GUMS or a plain gridmap-file
  - Only static account mapping, no group pool accounts
- Looking forward to future interoperability between authorization tools
  - BeStMan 2, ARGUS, SCAS, GUMS
23
Summary
- ALICE is happy
- Second-largest ALICE SE after CERN
  - In both allocated and used space
- 1.3 PB deployed, up to 2.1 PB in the queue (20% of GridKa 2010 storage)
- Stateless, scalable
- Low maintenance
  - But a good deal of integration effort: SRM frontend and tape backend
- No single point of failure