WLCG Storage planning
Andrea Sciabà
WLCG Collaboration Workshop, 11-13 July 2011, DESY

Outline
- Global overview of storage systems
- Storage system deployment at Tier-1 sites
- Storage upgrade plans at Tier-1 sites
- Storage development
- Short/medium term plans
- Conclusions

Introduction
- WLCG ensures a proper flow of information among the project, the sites and the experiments on storage and data management issues
- The Tier-1 Service Coordination Meeting serves this purpose
- Installed software versions, recent and scheduled interventions, new middleware releases and incidents are reported and discussed there

Storage systems in WLCG (Tier 0+1+2)
- ARC: 2
- BeStMan: 40
- CASTOR: 3
- StoRM: 40
- dCache: 63
- DPM: 215
- Xrootd: 6
- HDFS: 1
DPM is still by far the most common Storage Element

Storage systems at the Tier-0/1 sites
Versions deployed (systems in use: CASTOR, dCache, StoRM, DPM, EOS, xrootd, Lustre):
- CERN: CASTOR 2.1.11, EOS 0.1.0
- ASGC: CASTOR 2.1.10, DPM 1.8.0-1
- BNL: dCache 1.9.5-23
- CNAF: StoRM 1.7.0
- FNAL: 2.9.1, 1.8.3
- IN2P3-CC: dCache 1.9.5-26
- KIT
- NDGF: dCache 1.9.12
- NL-T1
- PIC
- RAL
- TRIUMF: dCache 1.9.5-21
All CASTOR sites run very recent releases; the dCache sites mostly run the old golden release, and only one runs the new one

Foreseen upgrades (I)
- ASGC: CASTOR ≥ 2.1.11; DPM 1.8.2 (allows defining the number of threads for the DPM and SRM servers, better filesystem selection algorithm); upgrade all disk server networking to 10 Gb before Q4
- BNL: migrate to Chimera on August 22; evaluate data access via NFS 4.1 on dCache and BlueArc
- CNAF: migrate all instances to StoRM 1.7.0-7 when testing is completed
- FNAL: new dCache golden release after the Heavy Ion run; will not move to Chimera; deploy 400 TB of EOS to move users off dCache resilient pools; add more disk before winter
- IN2P3: upgrade the non-WLCG dCache instance to the new golden release on September 20th and the WLCG instance in January

Foreseen upgrades (II)
- KIT: apply bug fixes as appropriate; new dCache golden release by the beginning of next year
- NDGF: closely follow new dCache patches; change authentication to per-DN file ownership and VOMS group ACLs in autumn
- PIC: migrate to Enstore2 (Jul 26); migrate to the new dCache golden release (Aug 9)
- RAL: SRM 2.11 upgrade (VOMS support) in August/September; CASTOR 2.1.11 (no LSF, new tape gateway) in November/December
- TRIUMF: new dCache golden release after (successful) testing; increase disk from 4 to 6 PB

dCache “golden releases”
- 1.9.12 was released in April
- It brings several improvements:
  - Easier, more flexible configuration
  - Better xrootd and NFS 4.1 support
  - SRM: asynchronous getTURL
  - GLUE 2.0 support
  - A powerful migration module

dCache plans (I)
- On-demand flush to tape, either by:
  - Moving the name space entry from an ONLINE directory to a CUSTODIAL directory
  - Creating an internal copy of a file in the CUSTODIAL space, resulting in two name space entries (only one custodial)
- WebDAV-based storage control
- Operational improvements:
  - Random distribution of incoming data and automatic reshuffling of data when new pools are added
  - Easier upgrades of components without service downtimes

dCache plans (II)
- Support for a 3-tiered industry storage model:
  - Layer I: optimized for random I/O; SSD space, NFS 4.1/pNFS
  - Layer II: optimized for streaming; high-density disk space on HDFS, WAN I/O
  - Layer III: custodial storage; controlled tape access
- Other plans:
  - Federations: NFS 4.1 (fedFS), HTTP(S) (+GridFTP, xrootd, …)
  - Fully secure HTTP read access to data with standard tools and browsers, using X.509 certificates (see the sketch below)
  - “Amazon”-like browser-based console for file browsing and storage control (e.g. disk ↔ tape)
  - IPv6
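As a minimal sketch of the HTTP access point above, the following Python snippet downloads a file from an HTTPS/WebDAV door using an X.509 client certificate via the requests library. The endpoint URL, credential paths and CA directory are placeholders chosen for illustration, not actual dCache settings.

```python
import requests

# Hypothetical dCache WebDAV/HTTPS endpoint and local credential paths (illustrative only)
URL = "https://dcache.example.org:2880/pnfs/example.org/data/user/test.root"
CERT = "/tmp/x509up_u1000"   # proxy or user certificate (PEM)
KEY = "/tmp/x509up_u1000"    # private key (PEM); for a proxy this is the same file
CA_PATH = "/etc/grid-security/certificates"  # directory with trusted CA certificates

# Stream the file to disk, authenticating with the X.509 client certificate
with requests.get(URL, cert=(CERT, KEY), verify=CA_PATH, stream=True) as r:
    r.raise_for_status()
    with open("test.root", "wb") as out:
        for chunk in r.iter_content(chunk_size=1 << 20):
            out.write(chunk)
```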

BeStMan plans
- Current version: 2.1.0
- Development plans depend on available funding; a joint proposal has been submitted to NSF
- BeStMan SRM server: additional transfer protocol support
- BeStMan SRM clients: improvements in data transfer efficiency
  - Data channel caching with GridFTP transfers for requests with many files
  - Determine the transfer parameters from the previous transfer performance, e.g. the number of concurrent transfer connections and of parallel streams (a generic adaptive sketch follows below)
  - E.g. FDT (Fast Data Transfer)
  - HTTPS support
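To make the feedback-driven tuning idea concrete, here is a generic, illustrative heuristic in Python that adjusts the number of parallel streams based on the throughput achieved with the previous settings. It is not the BeStMan algorithm; the thresholds and measured rates are made up.

```python
def next_stream_count(prev_streams: int, prev_rate: float, prev_prev_rate: float,
                      max_streams: int = 16) -> int:
    """Increase parallelism while it keeps paying off, back off otherwise (rates in MB/s)."""
    if prev_rate > 1.05 * prev_prev_rate:        # >5% gain: keep scaling up
        return min(prev_streams * 2, max_streams)
    if prev_rate < 0.95 * prev_prev_rate:        # regression: scale back down
        return max(prev_streams // 2, 1)
    return prev_streams                          # plateau: keep the current setting

# Example: start at 2 streams, feed back measured rates after each transfer
streams, last_rate = 2, 0.0
for measured_rate in (80.0, 150.0, 260.0, 270.0, 255.0):  # invented MB/s measurements
    new_streams = next_stream_count(streams, measured_rate, last_rate)
    print(f"{streams:2d} streams -> {measured_rate:5.1f} MB/s; next transfer uses {new_streams}")
    streams, last_rate = new_streams, measured_rate
```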

CASTOR disk cache evolution
- Future plans are driven by increasing requirements from the experiments (e.g. the ATLAS trigger at 400 Hz, moving to 600 Hz)
- Recent changes (CASTOR 2.1.11):
  - Transfer Manager, replacing LSF scheduling; in production for ATLAS, to be deployed everywhere
  - Basic VOMS support in SRM
- Future developments (CASTOR 2.1.12 and beyond):
  - Reduce file open latency
  - Garbage collector logic improvements

CASTOR tape interface
- Recent changes:
  - Support for 5 TB tapes, in production at CERN and at RAL
  - One tape mark per file as opposed to three (see the estimate below)
- Future developments (2.1.12 and beyond), towards improved support for the new tape drive families and their increased throughput:
  - Buffered tape marks and multi-file metadata handling
  - Repack re-engineering to address scalability
  - Media migration to start next year: 40 PB to be moved from 1T to 5T tapes
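Why tape marks matter for throughput can be seen with a back-of-the-envelope estimate: each tape mark forces a drive sync, so the per-file overhead dominates when files are small relative to the drive speed. The drive speed, sync cost and file size below are assumptions for illustration, not CASTOR measurements.

```python
# Illustrative estimate of the effective write rate including tape-mark overhead
DRIVE_SPEED_MBPS = 240        # assumed native drive throughput, MB/s
TAPE_MARK_SYNC_S = 2.0        # assumed cost of flushing one tape mark, seconds
FILE_SIZE_MB = 1000           # assumed average file size, MB

def effective_rate(tape_marks_per_file: int) -> float:
    """Effective write throughput (MB/s) for one file, including tape-mark syncs."""
    write_time = FILE_SIZE_MB / DRIVE_SPEED_MBPS
    total_time = write_time + tape_marks_per_file * TAPE_MARK_SYNC_S
    return FILE_SIZE_MB / total_time

for marks in (3, 1, 0):  # 3 = old scheme, 1 = current, 0 ~ buffered tape marks amortized away
    print(f"{marks} tape mark(s) per file: {effective_rate(marks):.0f} MB/s")
```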

EOS evolution
- EOS 0.1.0: release candidate deployed in EOSCMS/EOSATLAS
- EOS 0.2.0 (September/October):
  - Redundancy CODECs: dual parity and Reed-Solomon based, giving better file availability with less space usage (see the example below)
  - Client bundle (Linux + OSX): user FUSE EOS mounting via KRB5 & X509
  - Performance improvements in the low-level FUSE client
- EOS 0.x.0: some ideas to follow within the next year
  - HTTPS/HTTP gateway
  - Monitoring/web interface
  - Layout migration: convert replicas on the fly to CODEC files and vice versa
  - Extended FUSE mount daemon: move part of the server code into the client (parallel I/O + lower MD latency); policy-based persistent file page cache (for software, calibration, geometry files, etc.); policy-based file backup into CASTOR
  - LAN/WAN transfer interface
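The space-versus-availability trade-off behind the redundancy CODECs can be illustrated with a small Python comparison of plain replication against a Reed-Solomon layout. The per-stripe failure probability and the RS(10,2) layout are arbitrary example parameters, not the actual EOS CODEC settings.

```python
from math import comb

p = 0.001  # assumed independent probability that any one replica/stripe is unavailable

def availability_replica(n_replicas: int) -> float:
    """A replicated file is available unless every replica is lost."""
    return 1 - p ** n_replicas

def availability_rs(data: int, parity: int) -> float:
    """A Reed-Solomon file with `parity` parity stripes survives up to `parity` losses."""
    n = data + parity
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(parity + 1))

print(f"2 replicas : space overhead 100%, P(unavailable) = {1 - availability_replica(2):.2e}")
print(f"RS(10,2)   : space overhead  20%, P(unavailable) = {1 - availability_rs(10, 2):.2e}")
```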

StoRM plans
- Defined in the EMI 2 work plan:
  - WebDAV support
  - ARGUS integration

DPM
- NFS 4.1 / pNFS:
  - Now a functional beta, with monthly releases; can be tried
  - Short term: improve performance (heavy on metadata queries), integration with dCache
  - Medium term: X.509 support, investigate federations
- WebDAV:
  - Beta release available for testing (a directory-listing sketch follows below)
  - Short term: X.509 support in the browser, help to improve HTTP support in ROOT, evaluate HTTP as an alternative to GridFTP
  - Medium term: help to improve common clients (e.g. curl)
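As a taste of the WebDAV interface, the sketch below lists a directory on a hypothetical DPM endpoint with a PROPFIND request authenticated by a grid proxy. The URL, proxy path and CA directory are placeholders, not real DPM settings.

```python
import requests
import xml.etree.ElementTree as ET

# Hypothetical DPM WebDAV endpoint and grid proxy path (illustrative only)
URL = "https://dpm.example.org/dpm/example.org/home/myvo/"
PROXY = "/tmp/x509up_u1000"                      # proxy certificate + key in one PEM file
CA_PATH = "/etc/grid-security/certificates"      # trusted CA directory

# WebDAV PROPFIND with "Depth: 1" returns the entries directly inside the directory
resp = requests.request(
    "PROPFIND", URL,
    headers={"Depth": "1"},
    cert=(PROXY, PROXY),
    verify=CA_PATH,
)
resp.raise_for_status()

# Each <D:response> element carries one <D:href> with the entry's path
tree = ET.fromstring(resp.content)
for href in tree.iter("{DAV:}href"):
    print(href.text)
```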

DPM (II)
- Simpler system administration with Puppet:
  - Strong, widely supported solution; first release available to sites
  - Short term: extend the Nagios performance probes and add probes as needed (a minimal probe sketch follows below); add reporting per storage area / virtual organization
  - Medium term: promote sharing of Puppet manifests among developers and site admins
- DPM 1.8.2:
  - Better file system selection
  - Improvements in the client connection pools for all daemons
  - GLUE 2.0 support
  - Faster draining of disk servers
  - SEMsg support
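For orientation, here is a minimal Nagios-style probe sketch in Python following the standard plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL). It is not one of the official DPM probes; the filesystem path and thresholds are invented for illustration.

```python
#!/usr/bin/env python3
# Minimal Nagios-style probe: checks the free space on a local disk-server filesystem.
import os
import sys

PATH = "/dpm/fs01"                   # hypothetical mount point of a DPM filesystem
WARN_FREE, CRIT_FREE = 0.15, 0.05    # free-space fractions for WARNING / CRITICAL

stat = os.statvfs(PATH)
free_fraction = stat.f_bavail / stat.f_blocks

if free_fraction < CRIT_FREE:
    print(f"CRITICAL - only {free_fraction:.1%} free on {PATH}")
    sys.exit(2)
elif free_fraction < WARN_FREE:
    print(f"WARNING - {free_fraction:.1%} free on {PATH}")
    sys.exit(1)
print(f"OK - {free_fraction:.1%} free on {PATH}")
sys.exit(0)
```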

SEMsg: SE-catalogue consistency
- Support integration with experiment frameworks
- Support deployment of SEMsg-enabled LFC/DPM servers
- Short term: investigate more advanced security features (e.g. whitelisting)
- Medium term:
  - Investigate other possible usages of the framework, e.g. sync among LFCs
  - Support the developers of other SE products in using it
  - Support Grid-wide deployment

Checksum support (I)
- BeStMan:
  - MD5, ADLER32, CRC32 (see the ADLER32 sketch below)
  - Supports user-provided checksums
  - Internal checksum calculations on demand
  - srmLs is expensive if it involves checksum calculation
- CASTOR:
  - Checksum recalculated by RFIO, GFTP and XROOT at each full file read/write
  - Cannot be calculated on demand
  - Stored for both the disk and tape copies of files
  - The checksum is checked against the stored tape value when a file is recalled
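For reference, ADLER32 (the checksum most widely used across these systems) can be computed with Python's standard zlib module. This is an illustrative sketch: the file name is a placeholder, and the 8-character zero-padded hex form is the representation commonly used in grid storage.

```python
import zlib

def adler32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the ADLER32 checksum of a file in streaming fashion and
    return it as an 8-character zero-padded lowercase hex string."""
    value = 1  # ADLER32 starts from 1, not 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            value = zlib.adler32(chunk, value)
    return f"{value & 0xFFFFFFFF:08x}"

if __name__ == "__main__":
    print(adler32_of_file("test.root"))  # "test.root" is just a placeholder file name
```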

Checksum support (II)
- dCache:
  - ADLER32 (and MD5)
  - Can be user-provided
  - Calculated after writing a file
  - Lazy checks on all files (from 1.9.13)
- StoRM:
  - Checksum calculated either on the fly by GridFTP or on demand by the Checksummer service
  - Checksum returned, when present, as a file attribute (see the sketch below)
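A stored checksum exposed as a file attribute can be read from a mounted filesystem with a few lines of Python. The extended-attribute name used below is an assumption made for illustration; the actual name set by StoRM's Checksummer may differ, so check the StoRM documentation.

```python
import os

# Assumed attribute name (illustrative only); the real one used by StoRM may differ
CHECKSUM_ATTR = "user.storm.checksum.adler32"

def stored_checksum(path: str):
    """Return the ADLER32 checksum stored as an extended attribute, or None if absent."""
    try:
        return os.getxattr(path, CHECKSUM_ATTR).decode()
    except OSError:
        return None  # attribute not set (checksum not yet computed) or xattrs unsupported

print(stored_checksum("/storage/myvo/data/test.root"))  # placeholder path
```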

Conclusions
- All Tier-1 sites are well aligned with the most recent versions of the storage systems
- All dCache sites plan to move to the new golden release by the beginning of next year at the latest
- Disk-only solutions are being deployed in parallel to the traditional HSM systems
- The main WLCG storage systems are still very actively developed:
  - Stronger support for common standards: NFS 4.1, HTTP, WebDAV, etc.
  - Increasing performance and scalability for both the disk and tape layers

Acknowledgements
Thanks to the Tier-1 contacts and to the storage system developers for providing the necessary input