Data and Storage Evolution in Run 2
Wahid Bhimji
Contributions / conversations / emails with many, e.g.: Brian Bockelman, Simone Campana, Philippe Charpentier, Fabrizio Furano, Vincent Garonne, Andrew Hanushevsky, Oliver Keeble, Sam Skipsey …

Introduction
 Already discussed some themes at the Copenhagen WLCG workshop:
 Improve efficiency; flexibility; simplicity.
 Interoperation with the wider 'big-data' world.
 Try to cover slightly different ground here, under similar areas:
 WLCG technologies: activities since then.
 'Wider world' technologies.
 Caveats:
 Not discussing networking.
 Accepting some things as 'done' (on track), e.g. FTS3, commissioning of the xrootd federation, LFC migration.
 Told to 'stimulate discussion':
 This time discussion -> action: let's agree some things ;-).

Outline
 WLCG activities:
 Data federations / remote access – operating at scale.
 Storage interfaces – SRM, WebDAV and Xrootd.
 Benchmarking and I/O.
 Wider world:
 Storage hardware technology.
 Storage systems, databases.
 'Data science'.
 Discussion items

The LHC world

Storage Interfaces: SRM
 All WLCG experiments will allow non-SRM disk-only resources by or during Run 2.
 CMS already claim this (and ALICE don't use SRM).
 ATLAS validating in the coming months (after the Rucio migration): use of WebDAV for deletion (a proto-service exists); FTS3 non-SRM transfers; and alternative namespace-based space reporting.
 LHCb "testing the possibility to bypass SRM for most of the usages except tape-staging. … more work than anticipated... But for Run 2, hopefully this will be all solved and tested."
 Must offer as stable / reliable a service with the alternatives used.
 Also some sites have a desire for VO reservation / quota such as that provided by SRM space tokens, which should be covered by an alternative (but it doesn't need to be user-definable as with SRM).
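As a concrete illustration of the SRM-less deletion path mentioned above, a minimal sketch (Python with the requests library) that issues a plain HTTP DELETE against a WebDAV endpoint using a grid proxy. The endpoint URL, proxy path and CA directory are placeholder assumptions, not real site values.

import requests

proxy = "/tmp/x509up_u1000"   # assumed location of the user's grid proxy
url = ("https://se.example.org/dpm/example.org/home/atlas/"
       "rucio/scratch/file-to-delete.root")   # hypothetical WebDAV file path

# A grid proxy is a single PEM file holding both certificate and key,
# which requests accepts directly via 'cert'; 'verify' points at the
# (site-dependent) CA certificate directory.
resp = requests.delete(url, cert=proxy,
                       verify="/etc/grid-security/certificates")
resp.raise_for_status()   # a 204 'No Content' is the usual success reply
print("deleted, HTTP status", resp.status_code)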

Xrootd data federations
 Xrootd-based data federation in production.
 All LHC experiments using a fallback to remote access.
 Need to incorporate the last sites …
 Being tested at scale: ATLAS failover usage (12 weeks) example (R. Gardner).
 See the pre-GDB on data access and the SLAC federation workshop.
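On the client side, the failover mechanism amounts to opening the file through the federation redirector rather than a local path. A minimal PyROOT sketch, assuming a hypothetical redirector host, file path and tree name:

import ROOT

# The redirector locates a site in the federation that holds a replica.
url = "root://fax-redirector.example.org//atlas/rucio/data/some_file.root"

f = ROOT.TFile.Open(url)
if not f or f.IsZombie():
    raise RuntimeError("remote open failed")

tree = f.Get("events")          # hypothetical tree name
print("entries:", tree.GetEntries())
f.Close()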

Xrootd data federations
 Monitoring is highly developed, but not quite at 100% coverage and it could be more widely used … (A. Beche – pre-GDB)

Remote read and data federations at scale
 Not all network links are perfect. Storage servers require tuning. E.g. ALICE experiences from the pre-GDB.

Remote read at scale
 Sharing between hungry VOs could be a challenge. Analysis jobs vary: CMS quote that the WW HammerCloud benchmark needs 20 MB/s to reach 100% CPU efficiency.
 Sites can use their own network infrastructure to protect themselves. VOs shouldn't try to micro-manage, but there is a strong desire for storage plugins (e.g. the xrootd throttling plugin).
 E.g. ATLAS H->WW being throttled by a 1 Gig NAT – with a corresponding decrease in event rate.

HTTP / WebDAV
 DPM, dCache and StoRM also support it, so it will be universally available.
 Monitoring – much is available (e.g. in Apache) but not currently in WLCG. (Fabrizio Furano: pre-GDB)
 XrdHTTP is done (in Xrootd 4) – it offers the potential for xrootd sites to have an HTTP interface.

HTTP/WebDAV: Experiments
 CMS: no current plans. LHCb will use it if it is the best protocol at a site (Sylvain Blunier).
 ATLAS plan use of WebDAV for:
 User put/get.
 Deletion instead of SRM.
 FTS or job read, if best performing.
 Find the deployment (despite being used for the Rucio rename) not stably at 100%.
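For the user put/get case, a hedged sketch (again Python requests with a grid proxy; the endpoint, paths and filenames are illustrative only) of uploading a file over WebDAV and reading it back:

import requests

proxy = "/tmp/x509up_u1000"                                   # assumed proxy path
base = "https://se.example.org/dpm/example.org/home/atlas/user/someuser"
ca_dir = "/etc/grid-security/certificates"

# Put: stream a local file into the storage namespace.
with open("analysis_output.root", "rb") as src:
    r = requests.put(base + "/analysis_output.root", data=src,
                     cert=proxy, verify=ca_dir)
    r.raise_for_status()

# Get: many WebDAV doors redirect the client to the disk server that
# actually holds the data, so redirects are followed and the body streamed.
r = requests.get(base + "/analysis_output.root", cert=proxy, verify=ca_dir,
                 allow_redirects=True, stream=True)
r.raise_for_status()
with open("analysis_output_copy.root", "wb") as dst:
    for chunk in r.iter_content(chunk_size=1 << 20):
        dst.write(chunk)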

Benchmarking and I/O
 Continuing activity to understand (distributed) I/O – see the ROOT I/O Workshop.
 Important developments in ROOT I/O, e.g.:
 Thread-safety (or "thread-usability").
 TTreeCache configurable with an environment variable.
 Cross-protocol redirection.
 ROOT 6 (cling / C++11) increases the possibilities. E.g. M. Tadel – Federated Storage Workshop.
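To make the TTreeCache point concrete: the slide refers to enabling it via an environment variable, but the same effect can be obtained programmatically. A minimal PyROOT sketch, with a placeholder file URL and tree/branch names:

import ROOT

f = ROOT.TFile.Open("root://redirector.example.org//atlas/data/some_file.root")
tree = f.Get("events")                  # hypothetical tree name

tree.SetCacheSize(100 * 1024 * 1024)    # 100 MB read cache for this tree
tree.AddBranchToCache("*", True)        # cache all branches (and sub-branches)
tree.SetCacheLearnEntries(10)           # learn the access pattern on the first entries

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                    # reads are now grouped into large, vectored requests

f.Close()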

The rest of the world

Underlying Storage Technology
 Technologies in use for Run 2 are already here or in development.
 Magnetic disk: current increases in capacity (to 6 TB) use current technology; further potential for capacity (shingles, HAMR) but performance is not keeping pace.
 Existing flash SSDs and hybrids.
 NVRAM improvements (memristor, phase-change memory) – now really, really soon (?) …
 Would be expensive for WLCG use (though not compared to RAM).

Storage Systems
 'Cloud' (non-POSIX) scalable solutions:
 Algorithmic data placement.
 RAIN fault tolerance becoming common / standard.
 "Software-defined storage".
 E.g. Ceph, HDFS + RAIN, ViPR.
 WLCG sites are interested in using such technologies, and we should be flexible enough to use them.

Protocols, Databases
 HTTP -> SPDY -> HTTP/2:
 Session reuse.
 Smaller headers.
 NoSQL -> NewSQL:
 Horizontally scalable.
 Main memory.
 xrootd protocol; LSST qserv database (D. Boutigny, OSG Meeting, Apr 2014).

Data science
 Explosion in industry interest.
 Outside expertise in data science could help even the most confident science discipline (the ATLAS analysis is < 400th on the leaderboard now).

Discussion

Relaxing requirements …
 For example, having an appropriate level of protection for data readability:
 Removing technical read protection would not change practical protection, as currently non-VO site admins can read the data anyway, and no one else can interpret it.
 Storage developers should first demonstrate the gain (performance or simplification), and then we could push this.
 Similarly for other barriers towards, for example, object-store-like scaling and the integration of non-HEP resources …

Summary and discussion/action points
 Flexible / remote access: remaining sites need to deploy xrootd (and HTTP for ATLAS). Use at scale will need greater use of monitoring, tuning and tools for protecting resources.
 Protocol zoo: experiments must commit to reducing it in Run 2 (e.g. in 'return' for DAV / xrootd, remove rfio, SRM …).
 Wider world: 'data science', databases, storage technologies. Convene (and attend) more outside-WLCG workshops to share.
 Scalable resources: we should aim to be able to incorporate a disk site that has no WLCG-specific services / interfaces:
 BDII, accounting, X509, perfSONAR, SRM, 'package reporter'.