Data and Storage Evolution in Run 2. Wahid Bhimji. Contributions / conversations with many, e.g.: Brian Bockelman, Simone Campana, Philippe Charpentier, Fabrizio Furano, Vincent Garonne, Andrew Hanushevsky, Oliver Keeble, Sam Skipsey …
Introduction. Already discussed some themes at the Copenhagen WLCG workshop: improve efficiency, flexibility, simplicity; interoperate with the wider ‘big-data’ world. Try to cover slightly different ground here, under similar areas: WLCG technologies and activities since then; ‘wider-world’ technologies. Caveats: not discussing networking; accepting some things as ‘done’ or on track (e.g. FTS3, commissioning of the xrootd federations, LFC migration). Told to ‘stimulate discussion’: this time discussion -> action, so let’s agree some things ;-).
Outline. WLCG activities: data federations / remote access; operating at scale; storage interfaces (SRM, WebDav and xrootd); benchmarking and I/O. Wider world: storage hardware technology; storage systems; databases; ‘data science’. Discussion items.
The LHC world
Storage Interfaces: SRM. All WLCG experiments will allow non-SRM disk-only resources by or during Run 2. CMS already claim this (and ALICE don’t use SRM). ATLAS are validating in the coming months (after the Rucio migration): use of WebDav for deletion (a proto-service exists); FTS3 non-SRM transfers; and alternative namespace-based space reporting. LHCb are “testing the possibility to bypass SRM for most of the usages except tape-staging. … more work than anticipated... But for run2, hopefully this will be all solved and tested.” Whatever alternative is used must offer as stable and reliable a service. Some sites also want VO reservation / quota of the kind provided by SRM spacetokens; any alternative should cover this (though it need not be user-definable as in SRM).
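As an illustration of the SRM-bypass direction, a minimal sketch of namespace deletion over WebDav, assuming a hypothetical storage endpoint, file path and proxy location; real deployments go through the Rucio/FTS machinery rather than raw per-file HTTP calls.

```python
# Minimal sketch: deleting a replica over WebDav instead of SRM.
# The endpoint, file path and proxy location are hypothetical examples.
import requests

ENDPOINT = "https://se.example.ch:443/dpm/example.ch/home/atlas"  # hypothetical storage endpoint
PROXY = "/tmp/x509up_u1000"                                       # hypothetical VOMS proxy file

resp = requests.delete(
    ENDPOINT + "/rucio/user/someuser/file.root",   # hypothetical file path
    cert=PROXY,                                    # client auth with the grid proxy (cert + key in one file)
    verify="/etc/grid-security/certificates",      # CA directory commonly present on grid nodes
)
resp.raise_for_status()                            # 200/204 means the namespace entry is gone
print("deleted, HTTP status:", resp.status_code)
```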
Xrootd data federations. Xrootd-based data federations are in production: all LHC experiments use fallback to remote access. Need to incorporate the last sites… Being tested at scale: ATLAS failover usage over 12 weeks as an example (R. Gardner). See the pre-GDB on data access and the SLAC federation workshop.
Xrootd data federations. Monitoring is highly developed, but coverage is not quite 100% and it could be used more… (A. Beche, pre-GDB).
Remote read and data federations at scale. Not all network links are perfect, and storage servers require tuning; e.g. ALICE experiences from the pre-GDB.
Remote read at scale. Sharing between hungry VOs could be a challenge. Analysis jobs vary: CMS quote that the WW HammerCloud benchmark needs 20 MB/s to reach 100% CPU efficiency. Sites can use their own network infrastructure to protect themselves; VOs shouldn’t try to micro-manage, but there is a strong desire for storage plugins (e.g. the xrootd throttling plugin). E.g. ATLAS H->WW jobs being throttled by a 1 Gb NAT, with a corresponding decrease in event rate; see the sketch below.
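A rough back-of-the-envelope sketch of why such protection matters, using the 20 MB/s-per-job figure quoted above and an assumed number of concurrent jobs behind a 1 Gb/s NAT; the job count is an illustrative assumption, not a measured value.

```python
# Back-of-the-envelope: aggregate remote-read demand vs. a 1 Gb/s link.
# 20 MB/s per job is the CMS HammerCloud figure quoted above;
# the number of concurrent jobs is an assumed illustrative value.
MB_PER_JOB = 20      # MB/s needed per job for ~100% CPU efficiency
N_JOBS = 50          # assumed concurrent remote-reading jobs behind the NAT
LINK_GBPS = 1.0      # 1 Gb/s NAT, as in the ATLAS H->WW example

demand_gbps = MB_PER_JOB * N_JOBS * 8 / 1000.0          # MB/s -> Gb/s
share_mbs = LINK_GBPS * 1000 / 8 / N_JOBS               # per-job share of the link

print("demand: %.1f Gb/s, link: %.1f Gb/s" % (demand_gbps, LINK_GBPS))
print("per-job share: %.1f MB/s" % share_mbs)
# With these assumptions each job gets ~2.5 MB/s, far below the 20 MB/s target,
# so CPU efficiency (and hence event rate) drops accordingly.
```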
HTTP / WebDav. Fabrizio Furano (pre-GDB): XrdHTTP is done (in xrootd 4), offering xrootd sites an HTTP interface; DPM, dCache and StoRM already provide one, so HTTP/WebDav will be universally available. Monitoring: much is available (e.g. in Apache) but not currently integrated in WLCG.
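To show what a universal HTTP interface buys, a minimal sketch of a partial read over plain HTTPS from a storage endpoint, whichever implementation (XrdHTTP, DPM, dCache, StoRM) sits behind it; the URL and proxy path are hypothetical placeholders.

```python
# Minimal sketch: reading a byte range over HTTPS from a WLCG storage endpoint.
# Works against any implementation exposing HTTP/WebDav (XrdHTTP, DPM, dCache, StoRM).
# The URL and proxy path are hypothetical placeholders.
import requests

URL = "https://se.example.ch:443/atlas/rucio/data/file.root"   # hypothetical file URL
PROXY = "/tmp/x509up_u1000"                                    # hypothetical VOMS proxy

resp = requests.get(
    URL,
    headers={"Range": "bytes=0-1023"},            # partial read, as ROOT-style I/O would issue
    cert=PROXY,
    verify="/etc/grid-security/certificates",
)
print(resp.status_code, len(resp.content), "bytes")  # expect 206 Partial Content on success
```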
HTTP/WebDav: experiments. CMS: no current plans. LHCb: will use it if it is the best protocol at a site. ATLAS (Sylvain Blunier): plan to use WebDav for user put/get; deletion instead of SRM; and FTS or job reads where best performing. They find deployment (despite being used for the Rucio rename) not stably at 100%.
Benchmarking and I/O. Continuing activity to understand (distributed) I/O; see the ROOT I/O Workshop. Important developments in ROOT I/O, e.g.: thread-safety (or “thread-usability”); TTreeCache configurable with an environment variable; cross-protocol redirection. ROOT 6 (cling / C++11) increases the possibilities. E.g. M. Tadel, Federated Storage Workshop.
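A minimal PyROOT sketch of the TTreeCache point above: opening a file remotely over xrootd and configuring the cache explicitly in code (the environment-variable route mentioned on the slide is an alternative). The file URL, tree name and cache size are hypothetical illustrative choices.

```python
# Minimal sketch: explicit TTreeCache configuration for a remote xrootd read with PyROOT.
# File URL, tree name and cache size are hypothetical illustrative choices.
import ROOT

f = ROOT.TFile.Open("root://federation.example.ch//atlas/data/file.root")  # remote open via xrootd
tree = f.Get("physics")                    # hypothetical tree name

tree.SetCacheSize(100 * 1024 * 1024)       # 100 MB TTreeCache
tree.AddBranchToCache("*", True)           # cache all branches read in the loop

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                       # reads arrive in large cached blocks,
                                           # cutting round trips over the WAN

f.Close()
```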
The rest of the world
Underlying Storage Technology. Technologies in use for Run 2 are already here or in development. Magnetic disk: current increases in capacity (to 6 TB) use current technology; there is further potential for capacity (shingled recording, HAMR) but performance is not keeping pace. Flash SSDs and hybrids already exist. NVRAM improvements (really soon now…?): memristor, phase-change memory; would be expensive for WLCG use (though not compared to RAM).
Storage Systems. ‘Cloud’ (non-POSIX) scalable solutions: algorithmic data placement; RAIN fault tolerance becoming common / standard; “software-defined storage”. E.g. Ceph, HDFS + RAIN, ViPR. WLCG sites are interested in using such technologies and we should be flexible enough to use them; a flavour of this style of access is sketched below.
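As a flavour of the non-POSIX, object-store style of access such systems offer, a minimal sketch using the Ceph librados Python binding; the config path, pool and object names are hypothetical, and a real WLCG integration would sit behind a gateway or storage-system plugin rather than talking to librados directly.

```python
# Minimal sketch: object-store access via Ceph's librados Python binding.
# Config path, pool and object names are hypothetical; placement and replication
# are handled algorithmically by the cluster (CRUSH), not by the client.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")   # hypothetical cluster config
cluster.connect()

ioctx = cluster.open_ioctx("wlcg-test-pool")            # hypothetical pool name
ioctx.write_full("run2/dataset/object-0001", b"payload bytes")   # flat object namespace, no POSIX tree
print(ioctx.read("run2/dataset/object-0001"))

ioctx.close()
cluster.shutdown()
```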
Protocols, Databases. HTTP -> SPDY -> HTTP/2: session reuse, smaller headers. NoSQL -> NewSQL: horizontally scalable, main-memory databases. The LSST Qserv database uses the xrootd protocol (D. Boutigny, OSG Meeting, Apr 2014).
Data science. Explosion in industry interest. Outside expertise in data science could help even the most confident science discipline (the ATLAS analysis is < 400th on the leaderboard now).
Discussion
Relaxing requirements… For example, having an appropriate level of protection for data readability: removing technical read protection would not change practical protection, as currently non-VO site admins can read the data anyway, and no one outside can interpret it. Storage developers should first demonstrate the gain (performance or simplification) and then we could push this. Similarly for other barriers towards, for example, object-store-like scaling and the integration of non-HEP resources…
Summary and discussion / action points. Flexible / remote access: remaining sites need to deploy xrootd (and HTTP for ATLAS); use at scale will need greater use of monitoring, tuning and tools for protecting resources. Protocol zoo: experiments must commit to reducing it in Run 2 (e.g. in ‘return’ for dav / xrootd, remove rfio, srm…). Wider world: ‘data science’, databases, storage technologies; convene (and attend) more outside-WLCG workshops to share. Scalable resources: we should aim to be able to incorporate a disk site that has no WLCG-specific services / interfaces: BDII, accounting, X509, perfSONAR, SRM, ‘package reporter’.