GridPP Storage perspectives


GridPP Storage perspectives - Brian Davies, pre-GDB Data Management, CERN, 13 September 2016

Areas of interest of GridPP Storage
ATLAS analysis for the new schema: volumes, network needs/slot capacities.
T2 evolution: what changes happen to those SEs which remain, as well as to those which do not? Structure? Testing particular areas of interest.
(Non-)LHC VO support: LFC, FTS (dev/ops), WMS, perfSONAR/TCP tuning (cloud/VO support).
T1 activity to put Grid middleware on Ceph.

UK ATLAS T2 evolution use case
Columns: Site, Maximal Slot Count, Normal Slot Count, Pledged Storage (TB), Actual (TB), Additional slots, Factor increase
Glasgow 4500 4000 1305 2583 2100 1.53
QMUL 4300 3000 1386 2585 1.70
Lancaster 3100 2500 1229 1700 1.68
Manchester 3900 1080 1750 1400 1.47
Σ Main T2s 15800 12500 5000 9018
RHUL 3400 2000 963 1418
ECDF 600 400 540 891
Oxford 1300 800 729 823
RALPP 1200 360 757
L'pool 1000 486 735
Sheffield 760 387 429
B'Ham 200 189 335
Cambridge 250 207 321
Sussex 100 45 50
Durham 2200 44
Brunel 1150 33
UCL 60 10 81 19
Imperial
Σ Small T2s 15050 7370 3987 5874
RAL T1 11400 8000 5300 1
Sites need to accommodate the additional capacities: a large increase in WAN traffic.
90% of LAN traffic is worker-node traffic; 6 of 15 GB/s of WN traffic at the T1 is for ATLAS jobs.
Disk servers/head nodes need to cope with the additional connections; CMS use ~2.5 MB/s per job (see the rough estimate below).
Storage pledge met - what to do with 5874 TB of "free" space?
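
To make the WAN point concrete, here is a rough back-of-envelope estimate (not from the slide itself) of the extra traffic implied by the additional slots, using the ~2.5 MB/s per-job figure quoted above. Treating every additional slot as a data-streaming job is an assumption; Glasgow is used because it is the only row with a complete "Additional slots" value.

```python
# Back-of-envelope estimate of extra WAN demand from additional job slots.
# Assumption: every additional slot streams data at the per-job rate quoted
# on the slide (~2.5 MB/s, the CMS figure); real job mixes will differ.

PER_JOB_MB_S = 2.5  # MB/s per job, from the slide

additional_slots = {  # "Additional slots" column where the source gives it
    "Glasgow": 2100,
}

for site, slots in additional_slots.items():
    extra_mb_s = slots * PER_JOB_MB_S
    print(f"{site}: {slots} extra slots -> ~{extra_mb_s / 1000:.1f} GB/s extra WAN traffic")
# Glasgow: 2100 extra slots -> ~5.2 GB/s extra WAN traffic
```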

T2 Evolution Context
VOs want to use fewer storage endpoints.
Fewer sites want to support storage.
Reduction in funded manpower: sites need to run with fewer people (and need to transition with fewer people, too, as the decline has already started).
Sites also need to work well with non-LHC assumptions, techniques.
https://twiki.cern.ch/twiki/bin/view/LCG/WLCGSiteSurveyT2s

T2 Storage Characteristics
Churn (cache-like?) but a low reuse rate.
Large size variation: 3 PB down to <500 TB.
Access methods: SRM, Xroot, GridFTP, HTTP… with X509+VOMS authentication (a multi-protocol access sketch follows below).
Total capacity: unlikely that total demand for T2 storage will fall. What do we do with existing storage? Allow it to "slowly degrade" at T2Cs? "Physically consolidate" at T2Ds?
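
As an illustration of the multi-door access pattern listed above, a minimal sketch using the gfal2 Python bindings to stat the same replica through its SRM, xrootd, HTTP and GridFTP doors. The hostname and path are hypothetical placeholders, and authentication is assumed to come from an X509/VOMS proxy already in the environment.

```python
# Sketch: one replica exposed through several doors (SRM, xrootd, HTTP,
# GridFTP), probed with the gfal2 Python bindings. Endpoint and path are
# hypothetical; credentials come from the VOMS proxy in the environment.
import gfal2

ctx = gfal2.creat_context()

replica_urls = [
    "srm://se.example.ac.uk/dpm/example.ac.uk/home/atlas/somefile",
    "root://se.example.ac.uk//dpm/example.ac.uk/home/atlas/somefile",
    "https://se.example.ac.uk/dpm/example.ac.uk/home/atlas/somefile",
    "gsiftp://se.example.ac.uk/dpm/example.ac.uk/home/atlas/somefile",
]

for url in replica_urls:
    scheme = url.split(":")[0]
    try:
        info = ctx.stat(url)                      # metadata via this door
        print(f"{scheme}: OK, {info.st_size} bytes")
    except gfal2.GError as err:                   # gfal2 signals errors this way
        print(f"{scheme}: failed ({err})")
```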

Reduce Expenditure by: Reliability, Requirements, Features, Maintenance, Manpower

Expenditure Method Types
Reduce: Reliability, Requirements, Features, Maintenance, Manpower
Types: Egalitarian, Barebones, Network, Caching/Read only

Some existing solutions…

Data CVMFS
Read-only data provision, optimised for small, slowly changing datasets: CVMFS + (Xrootd backhaul) + [future cache tiers].
Deployed for LIGO in the US by OSG - tested in production. Some exploration at RAL for non-LHC VOs.
Not suitable for rapidly changing, large datasets.
Easy transition (for all sites with CVMFS up to date…).
Caching / Read only
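
For orientation, a minimal sketch of what CVMFS-based data delivery looks like from the job side: the dataset appears as an ordinary read-only POSIX path served from the local CVMFS cache. The repository name and file path below are hypothetical.

```python
# Minimal sketch of "data over CVMFS" from a job's point of view.
# The repository and file path are hypothetical placeholders.
import os

DATASET = "/cvmfs/data.example.org/ligo/frames/some_frame.gwf"

if os.path.isfile(DATASET):          # autofs mounts the repo on first access
    with open(DATASET, "rb") as f:
        header = f.read(1024)        # reads are served from the local cache
    print(f"read {len(header)} bytes via the CVMFS cache")
else:
    print("repository not mounted or file not published")
```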

T2C Caching
WLCG Data Caching Group. Two solutions:
Xrootd - mature, not widely used(?), compatible with the xrootd federation.
"DPM" - in development, protocol agnostic, but DPM specific.
Easy transition (for sites with DPM or Xrootd services).
Caching / Read only

ARC Cache
ARC "prefetch" cache for jobs.
Working happily at Durham for some months now.
Easy transition (for sites with ARC CEs).
Caching / Read only

Cache questions
None of the mentioned solutions solves the job output issue. Centralised object store for log files?
Is caching efficient? Small-scale studies suggest 90% of data is read maybe twice, at best (see the hit-rate estimate below).
Caching as a proxy for acceptance of low reliability? How is this different from redirection, in practice?
Caching / Read only
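
The reuse numbers above put a hard ceiling on cache efficiency, because the first read of any file is necessarily a cold miss. A small sketch of that arithmetic, assuming the remaining 10% of data is read only once (an assumption; the slide does not say):

```python
# Upper bound on cache hit rate implied by the reuse numbers on the slide.
# From the slide: ~90% of data is read "maybe twice, at best"; the first
# read of any file is a cold miss. Treating the other 10% as read once
# is an assumption.

reads_twice, reads_once = 0.90, 0.10            # fractions of the data volume

accesses = reads_twice * 2 + reads_once * 1     # total reads per unit of data
hits = reads_twice * 1                          # only a second read can hit

print(f"best-case byte hit rate = {hits / accesses:.0%}")   # about 47%
```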

Pure network model
UCL CPU paired to a remote SE at QMUL - the original localSE model from 10+ years ago. Can this apply to all sites/regions?
Tests by Alastair Dewhurst against the Oxford T2 for ATLAS for more sophisticated (federated) solutions. VOs are already federating, so use their frameworks.
Currently resolving HammerCloud's assumptions re data locality. Obviously, this only tests ATLAS workloads…
Network
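
A minimal sketch of what the pure network model looks like from the worker node: the CPU sits at one site and reads directly from a remote SE over xrootd, here via the XRootD Python bindings. The host and path are hypothetical, and a valid grid proxy is assumed to be in place.

```python
# Sketch of the "pure network" model from the job side: CPU at one site,
# data read directly from a remote SE over the WAN via xrootd.
# Host and path are hypothetical; needs the XRootD Python bindings and a
# valid grid proxy.
from XRootD import client

url = "root://remote-se.example.ac.uk//atlas/some/file.root"

f = client.File()
status, _ = f.open(url)
if status.ok:
    status, data = f.read(offset=0, size=1024 * 1024)   # first MB over the WAN
    print(f"read {len(data)} bytes remotely")
    f.close()
else:
    print(f"open failed: {status.message}")
```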

Simpler storage
Current T2 storage is specialist - big RAID-6 arrays.
Work by M. Ebert on ZFS, software 'RAID': faster, more flexible, more extensible than HW RAID; small efficiency gains via compression (~4%).
Small step (server-local changes). (Also heads off the RAID-6 scaling limit at 8 TB disks.)
Egalitarian
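
For scale, a quick worked example of what the ~4% compression gain means for a single disk server. The 24 x 8 TB layout is a hypothetical configuration chosen only for illustration, not a GridPP recommendation.

```python
# Quick arithmetic behind the ZFS points on the slide: effective capacity of
# one disk server with and without the ~4% compression gain quoted.
# The 24 x 8 TB, double-parity layout is a hypothetical example.

disks, disk_tb = 24, 8.0
parity_disks = 2                       # RAID-6 / raidz2 equivalent
compression_gain = 0.04                # ~4% from the slide

raw_usable = (disks - parity_disks) * disk_tb
with_compression = raw_usable * (1 + compression_gain)

print(f"usable: {raw_usable:.0f} TB, effective with compression: {with_compression:.0f} TB")
# usable: 176 TB, effective with compression: 183 TB
```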

SRM retirement?
Slow movement on this, due to existing dependencies.
Agreement "in principle" that SRM is not necessary for T2 workflows, but spacetokens are still an issue: there is still no good, general-purpose alternative for protecting/guaranteeing VO space.
Barebones Egalitarian

Single protocol?
HTTP(S)/WebDAV? The HTTP Deployment taskforce signed off on the features (see the access sketch below).
XROOT [or xrootd]? "Unusual", but xrootd also supports HTTP(S).
GridFTP? Advantage(?) of "Globus Connect" compatibility.
Barebones Egalitarian
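
To show how little client-side machinery the HTTP(S)/WebDAV option needs, a minimal sketch using a generic HTTP client with the grid credential pair. The endpoint, path and proxy location are hypothetical, and whether a given SE accepts proxy certificates over plain HTTPS varies by implementation.

```python
# Sketch of "single protocol" access over HTTP(S)/WebDAV with an ordinary
# HTTP client and the grid credential pair. Endpoint and paths are
# hypothetical placeholders.
import requests

CERT = ("/tmp/x509up_u1000", "/tmp/x509up_u1000")   # proxy file as cert+key
CA_PATH = "/etc/grid-security/certificates"          # grid CA directory

url = "https://se.example.ac.uk/dpm/example.ac.uk/home/atlas/somefile"

# WebDAV PROPFIND for metadata, plain GET for the data itself
meta = requests.request("PROPFIND", url, cert=CERT, verify=CA_PATH,
                        headers={"Depth": "0"})
print(f"PROPFIND status: {meta.status_code}")

data = requests.get(url, cert=CERT, verify=CA_PATH)
print(f"fetched {len(data.content)} bytes")
```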

SE "retirement"
Without SRM, what is the need for a traditional SE? Revert to Classic SEs: Xrootd, HTTP, GridFTP endpoints in front of:
ceph/hdfs/other distributed storage [~posix]
object storage via [xrootd/S3/SWIFT/…] (see the S3 sketch below)
Transition is hard (unless we can dump all of the data at a site…), unless churn is used to our advantage: ATLAS deletes 2 PB/month from UK T2s (is this normal?).
Barebones Egalitarian
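
A minimal sketch of the "object storage via S3" variant: the SE becomes a bucket behind an S3-speaking gateway. The endpoint URL, bucket name, object key and credentials are all hypothetical placeholders.

```python
# Sketch of the "object storage via S3" flavour of a classic-SE-less site:
# the storage is just a bucket behind an S3 gateway. Endpoint, bucket and
# credentials are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-gateway.example.ac.uk",
    aws_access_key_id="VO_ACCESS_KEY",
    aws_secret_access_key="VO_SECRET_KEY",
)

# Write-once / read-many usage matches the "don't update files" model.
s3.put_object(Bucket="atlas", Key="mc16/NTUP/file.root", Body=b"...")
obj = s3.get_object(Bucket="atlas", Key="mc16/NTUP/file.root")
print(obj["ContentLength"])
```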

Support for non-LHC VOs
LFC, FTS, WMS and DIRAC usage by non-LHC VOs is non-negligible: support for these products…
Sharing technology/monitoring/hardware is useful.
New communities require alternative technologies; licensing issues.

Tier 1 storage evolution
CERN switched from Castor to EOS for disk-only storage several years ago. RAL have been working on a replacement storage service.
Why not choose DPM/dCache? Grid storage is not popular with non-LHC users: they want things like S3. Other groups at RAL are also using Ceph, so expertise is readily available.
Ceph is an object store: it fits the LHC use case well (external file catalogues, don't update files). We want to take advantage of this simplicity; we need to add support for XrootD and GridFTP.
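
A minimal sketch of the object-store model described above, using the python-rados bindings to write and read an immutable object in a Ceph pool. The pool name, object key and the assumption of a readable ceph.conf and keyring on the node are hypothetical, and this is not how Echo itself is accessed by VOs (that goes via the xrootd and GridFTP plugins on the next slide).

```python
# Sketch of the object-store model behind a Ceph-based SE: files become
# immutable objects in a pool, addressed by name, with the file catalogue
# kept externally (Rucio/PhEDEx). Pool name and object key are hypothetical;
# needs the python-rados bindings plus a readable ceph.conf and keyring.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

ioctx = cluster.open_ioctx("atlas")                       # hypothetical data pool
ioctx.write_full("datasets:mc16:file.root", b"...")       # write once, never update
data = ioctx.read("datasets:mc16:file.root")
print(len(data))

ioctx.close()
cluster.shutdown()
```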

GridFTP and XrootD for Echo
CERN have developed a Ceph plugin for xrootd, which is being used for the Castor tape buffers. Significant work is now focused on performance improvements.
RAL are developing a GridFTP plugin for Ceph, with help from CERN and Brian Bockelman. Most functionality is in place; starting to look at performance. Multiple-stream assembly is planned to improve GridFTP performance via FTS.

Echo Status
No SRM - is that a problem? The Castor SRM will deal with all the tape stuff. Currently we just support ATLAS and CMS, who claim not to need one! Other VOs to follow. Accounting will be provided via a .json file.
Echo is now registered as a 'T3D' for ATLAS: starting to add the SE and run functional tests.
Andrew Lahiff is validating CMS jobs; waiting for the GridFTP improvements before adding Echo to PhEDEx.
Aim to have a production-quality SE by April 2017.

Future areas of interest
AAI/AARC
EUDAT/SAGE/ESiWACE project collaboration/co-ordination
Further non-HEP community support

Summary
GridPP is supporting/following various storage topics, balancing LHC and non-LHC requirements.
T2 evolution is on track.
T1 (r)evolution is progressing, but tape is staying the same.
Collaboration between LHC and non-LHC communities is beneficial to both parties.

Questions
Thanks to A. Dewhurst and S. Skipsey, who provided much of the material for this talk.
Thanks to the gridpp-storage/GridPP community (particularly J. Jensen and M. Ebert).
gridpp-storage@jiscmail.ac.uk

Backups

DYNaFED

Network

Network