Xrootd SE deployment at GridKa WLCG Tier 1 site

Presentation transcript:

Xrootd SE deployment at GridKa WLCG Tier 1 site
Artem Trunov, Karlsruhe Institute of Technology
KIT - University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association
WLCG Storage Jamboree, Amsterdam

LHC Data Flow Illustrated
(photo courtesy Daniel Wang)

WLCG Data Flow Illustrated

ALICE Xrootd SE at GridKa
- Xrootd has been deployed at GridKa since 2002, originally for BaBar.
- Xrootd was requested by ALICE; the proposal was approved by the GridKa Technical Advisory Board.
- The solution aims at implementing the ALICE use cases for archiving custodial data at the GridKa T1 center using an Xrootd SE:
  - Transfer of RAW and ESD data from CERN
  - Serving RAW data to reprocessing jobs
  - Receiving custodial reprocessed ESD and AOD data from WNs
  - Archiving custodial RAW and ESD data on tape
  - Recalling RAW data from tape for reprocessing
- GridKa special focus:
  - Low-maintenance solution
  - SRM contingency (in case ALICE needs it)
- Deployment timeline:
  - Nov 2008: Xrootd disk-only SE, 320 TB
  - Sep 2009: Xrootd SE with tape backend + SRM, 480 TB
  - Nov 2009: ALICE uses xrootd exclusively
  - July-Oct 2010: expansion up to 2100 TB of total space

Storage setup at GridKa
- Clusters of 2 or more servers with direct- or SAN-attached storage
- GPFS local file systems
  - Not global, not across all GridKa servers and WNs
- All servers in a cluster see all the storage pools
- Redundancy: data remains accessible if one server fails
- Most of the data is behind servers with 10G NICs, plus some older 2x1G ones
(Diagram: file servers attached to FC storage over a disk SAN)

Storage setup at GridKa + xrootd
- Xrootd maps well onto this setup
- A client can access a file via any server in the cluster
  - Redundant data path
  - On a server failure the client is redirected to another server
- Scalability
  - Automatic load balancing
  - All xrootd servers can serve the same ("hot") file to different clients
(Diagram: xrootd servers running on the GPFS cluster nodes)
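To make the clustering concrete, here is a minimal sketch of an xrootd/cmsd configuration in the spirit of the setup above. It is not the GridKa production configuration: the host name, ports and the exported GPFS path are placeholder assumptions, and only standard directives (all.export, all.manager, all.role, xrd.port) are used.

```
# Minimal sketch only -- host names, ports and paths are placeholders.
# The same file is shared by the redirector and all data servers; the
# if/else/fi block selects the role by host name.
cat > /etc/xrootd/xrootd.cfg <<'EOF'
# Every data server exports its locally mounted GPFS file systems, so any
# file in the cluster is reachable through any of its servers.
all.export /gpfs/alice

# The redirector (admin node) that all data servers subscribe to.
all.manager xrootd-mgr.example.org 3121

if xrootd-mgr.example.org
  all.role manager
else
  all.role server
fi

xrd.port 1094
EOF

# Both daemons read the same file:
#   xrootd -c /etc/xrootd/xrootd.cfg    # data access
#   cmsd   -c /etc/xrootd/xrootd.cfg    # clustering / redirection
```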

Admin node redundancy
- Xrootd allows two admin nodes ("redirectors", "managers") in a redundant configuration
  - Support is built into the ROOT client
- Users have a single access address which resolves to two IP addresses (DNS type-A records)
  - Clients choose one of the managers at random: load balancing
  - Failure of one redirector is recognized by the ROOT client, and the second address is tried
- Two admin nodes give twice the transactional throughput
(Diagram: a DNS alias (A records) for the two redirectors in front of the xrootd servers)
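From the client side this can be checked with ordinary tools; a small sketch follows, where the alias name and the file path are invented placeholders rather than the real GridKa ones.

```
# Placeholder alias; the real GridKa alias differs.
ALIAS=alice-xrootd.example.org

# The alias should resolve to the A records of both redirectors.
host "$ALIAS"

# Clients always use the alias; xrdcp (via the ROOT/xrootd client library)
# picks one redirector at random and tries the other one on failure.
xrdcp "root://$ALIAS:1094//alice/some/file.root" /tmp/file.root
```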

High-Availability
- Any component can fail at any time without reducing uptime or making data inaccessible
- Maintenance can be done without taking the whole SE offline and without announcing downtime
  - Rolling upgrades, one server at a time
- Real case: a server failed on Thursday evening, and the failed component was replaced on Monday
  - The VO didn't notice anything
  - Site admins and engineers appreciated that such cases could be handled without an emergency

Adding a tape backend
- An MSS backend has been part of vanilla Xrootd since day 1:
  - Policy-based migration daemon
  - Policy-based purging daemon (aka garbage collector)
  - Prestaging daemon
  - Migration and purging queues, with 2-level priority
  - On-demand stage-in while the user's job waits on file open
  - Bulk "bring online" requests
  - Asynchronous notification of completed requests via UDP
- The current "mps" scripts are being rewritten by Andy as the File Residency Manager ("frm")
- Adapting to the site's MSS means writing your own glue scripts (sketched below):
  - a stat command
  - a transfer command
- GridKa uses TSM plus in-house middleware written by Jos van Wezel to control the migration and recall queues; the same mechanism is used by dCache
- ALICE implements its own backend, called vMSS ("virtual" MSS), to fetch missing files from any other ALICE site; files are located via a global redirector
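The glue scripts are entirely site-specific, so the following is only a sketch of their shape. The tss_query and tss_recall commands are hypothetical stand-ins for GridKa's in-house TSM/TSS middleware, and the exact arguments xrootd passes depend on the mps/frm configuration in use.

```
# Sketch of the two site-written glue scripts (not the real GridKa ones).

### file: mss_stat.sh -- the "stat command"
### Called with a path in the MSS name space; must exit 0 (and print the
### stat information) if the file exists in the tape archive, non-zero
### otherwise. tss_query is a hypothetical placeholder.
tss_query "$1"

### file: mss_get.sh -- the "transfer command"
### $1 = path in the tape archive, $2 = destination on a GPFS file system.
### tss_recall is again a hypothetical placeholder for the TSM/TSS recall.
tss_recall "$1" "$2"
```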

Adding a tape backend - details
- One of the nodes in a GPFS cluster is connected to tape drives via SAN
- This node migrates all files from the GPFS cluster
  - Reduces the number of tape mounts
- It recalls all files and distributes them evenly across all GPFS file systems in the cluster
(Diagram: an xrootd server attached to both the disk SAN and the tape SAN, running the migration and staging daemons and TSS)

Tape backend and the ALICE global federation
- Same disk/tape setup as on the previous slide, with the ALICE global redirector added
- When a file is not on disk, it is looked up in the ALICE cloud AND then in the local tape archive
  - A recall from tape can thus be avoided by fetching the file over the network
  - This still needs to be tested
  - It is subject to VO policy as well
(Diagram: as before, plus the ALICE global redirector)

Adding grid functionality
- An SRM interface is needed to meet the WLCG requirements
- Adapted the OSG solution
  - OSG releases could not be used out of the box, but the components are all the same:
  - Xrootd
  - Globus gridftp + posix DSI backend + xrootd posix preload library
  - xrootdfs
  - BeStMan SRM in gateway mode
(Diagram: clients speak srm:// to BeStMan, gsiftp:// to gridftp, and root:// to the xrootd cluster; BeStMan uses xrootdfs and gridftp uses the posix preload library, both reaching the xrootd cluster via root://)
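As a rough illustration of how gridftp is glued to the xrootd cluster, the sketch below preloads the standard xrootd posix library into the Globus gridftp server. The library name and the XROOTD_VMP variable are the usual xrootd ones; the host name, paths and the exact virtual-mount-point syntax are placeholder assumptions and should be checked against the installed release.

```
# Hedged sketch -- placeholder host names and paths, not the GridKa values.

# Virtual mount point for the preload library: maps the xrootd name space
# onto a local-looking path (check the exact VMP syntax for your release).
export XROOTD_VMP="xrootd-mgr.example.org:1094:/gpfs/alice=/alice"

# Preload the xrootd posix library so the gridftp server's open()/read()/
# write() calls become root:// requests against the cluster.
# Adjust the library path to your installation.
export LD_PRELOAD=/usr/lib64/libXrdPosixPreload.so

# Start the Globus gridftp server on the standard port.
globus-gridftp-server -p 2811
```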

Xrootd SE at GridKa – details of own development
- Xrootd distro
  - From CERN (Fabrizio), the official ALICE distro
  - Installed and configured according to the ALICE wiki page, with little or no deviation
- Gridftp
  - Used the VDT rpms
  - Took the gridftp posix library out of the OSG distro and made an rpm
  - Installed host certificates, CAs and the gridmap-file from gLite
  - Made sure that the DN used for transfers is mapped to a static account, not a group .alice account (an illustrative mapping is shown below)
  - Wrote own startup script
  - Runs as root
- SRM
  - Took the xrootdfs source from SLAC; made an rpm and own startup script
  - Installed the fuse libraries with yum
  - Took the BeStMan tar file from their web site
  - Both run as root (you don't have to)
  - Published in the BDII manually (static files)
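For illustration only, a static mapping in the grid-mapfile looks roughly like the entry below; both the DN and the local account name are invented, the point being that the transfer DN maps to one fixed account rather than a pool of group accounts.

```
# Illustrative entry in /etc/grid-security/grid-mapfile (the DN and the
# account name are invented placeholders, not the real GridKa mapping).
"/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=aliprod/CN=000000/CN=ALICE Production" aliprd
```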

High-Availability and Performance for the SRM service
- Gridftp is run in split-process mode (a startup sketch follows below)
  - The front-end node is co-located with BeStMan
    - Can run in user space
  - The data node is co-located with an xrootd server
    - Doesn't need a host certificate
    - Allows optimizing each host's network settings for low latency vs. high throughput
- BeStMan and gridftp front-end instances can run under a DNS alias
  - Scalable performance
(Diagram: "SRM admin nodes" under a DNS alias run BeStMan, the gridftp front end and xrootdfs; the gridftp backends, using the posix preload library, run on the xrootd servers)
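A sketch of the split-process startup is shown below. It assumes the data-node (-dn) and remote-nodes (-r) options of the Globus gridftp server's striped/split mode; the host names and ports are placeholders, and the flags should be verified against the installed Globus version.

```
# Hedged sketch of the split-process gridftp startup; placeholders throughout.

# On each xrootd data server: a gridftp data mover (no host certificate
# needed), with LD_PRELOAD / XROOTD_VMP set as in the earlier sketch.
globus-gridftp-server -dn -p 6002

# On the SRM admin node, next to BeStMan: the protocol front end, which
# delegates the actual data movement to the data movers listed with -r.
globus-gridftp-server -p 2811 \
    -r xrootd-ds1.example.org:6002,xrootd-ds2.example.org:6002
```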

More High-Availability
- All admin nodes are virtual machines
  - xrootd managers
  - SRM admin node: BeStMan + gridftp control
    - 1 in production at GridKa, but nothing prevents having 2
- KVM on SL5.5
- Shared GPFS for the VM images
  - VMs can be restarted on the other node in case of failure
(Diagram: two VM hypervisors hosting the xrootd manager and SRM/gridftp-control VMs, in front of the xrootd servers that also run the gridftp data movers)

Performance
- No performance measurements on the deployed infrastructure, only observations:
  - ALICE production transfers from CERN using xrd3cp: ~450 MB/s into three servers, ~70 TB in two days
  - ALICE analysis, root:// on the LAN: up to 600 MB/s from two older servers
  - Migration to tape: 300 MB/s sustained over days from one server, using at most 4 drives

Import of RAW data from CERN with xrd3cp

Migration to tape

Problems
- Grid authorization
  - BeStMan works with GUMS or a plain gridmap-file
  - Only static account mapping, no group pool accounts
- Looking forward to future interoperability between authorization tools: BeStMan 2, ARGUS, SCAS, GUMS

Summary
- ALICE is happy
- Second largest ALICE SE after CERN, in both allocated and used space
  - 1.3 PB deployed, up to 2.1 PB in the queue (20% of GridKa's 2010 storage)
- Stateless, scalable
- Low maintenance
  - But a good deal of integration effort for the SRM frontend and the tape backend
- No single point of failure