
1 Data and Storage Evolution in Run 2
Wahid Bhimji
With contributions, conversations and emails from many, e.g.: Brian Bockelman, Simone Campana, Philippe Charpentier, Fabrizio Furano, Vincent Garonne, Andrew Hanushevsky, Oliver Keeble, Sam Skipsey …

2 Introduction
 Some themes were already discussed at the Copenhagen WLCG workshop:
   Improve efficiency, flexibility, simplicity.
   Interoperation with the wider 'big data' world.
 Here we cover slightly different ground, under similar areas:
   WLCG technologies: activities since then.
   'Wider world' technologies.
 Caveats:
   Not discussing networking.
   Accepting some things as 'done' or on track (e.g. FTS3, commissioning of the xrootd federation, the LFC migration).
 Told to 'stimulate discussion':
   This time, discussion -> action: let's agree some things ;-)

3 Outline
 WLCG activities
   Data federations / remote access: operating at scale.
   Storage interfaces: SRM, WebDAV and xrootd.
   Benchmarking and I/O.
 Wider world
   Storage hardware technology.
   Storage systems, databases.
   'Data science'.
 Discussion items

4 The LHC world

5 Storage Interfaces: SRM
 All WLCG experiments will allow non-SRM disk-only resources by or during Run 2.
   CMS already claim this (and ALICE don't use SRM).
   ATLAS are validating in the coming months (after the Rucio migration): WebDAV for deletion (a prototype service exists), FTS3 non-SRM transfers, and alternative namespace-based space reporting. A deletion sketch follows this slide.
   LHCb: "testing the possibility to bypass SRM for most of the usages except tape-staging … more work than anticipated … but for Run 2, hopefully this will be all solved and tested."
 Any alternative used must offer as stable and reliable a service as SRM.
 Some sites also want VO reservation / quotas, as provided by SRM space tokens; an alternative should cover this (though it need not be user-definable as in SRM).
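As an illustration of the SRM-free deletion workflow mentioned above, here is a minimal sketch using Python and the requests library. The endpoint URL, file path and credential locations are placeholders, not real ATLAS values:

```python
# Minimal sketch of SRM-free deletion over WebDAV (hypothetical endpoint/paths).
import requests

# Grid credentials: a proxy used as both cert and key; paths are illustrative.
cert = ("/tmp/x509up_u1000", "/tmp/x509up_u1000")
url = "https://se.example.org:443/dpm/example.org/home/atlas/some/file.root"

# WebDAV deletion is just a plain HTTP DELETE on the file's URL.
resp = requests.delete(url, cert=cert, verify="/etc/grid-security/certificates")
resp.raise_for_status()  # 200/204 on success; 404 if the file is already gone
print("deleted:", url)
```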

6 Xrootd data federations
 Xrootd-based data federations are in production.
   All LHC experiments use them as a fallback to remote access.
   The last sites still need to be incorporated …
 Being tested at scale: for example, ATLAS failover usage over 12 weeks (R. Gardner). A sketch of the fallback pattern follows this slide.
 See the pre-GDB on data access and the SLAC federation workshop.
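A minimal sketch of the failover pattern: try the site-local replica first, and on failure redirect the read to a federation redirector. TFile::Open is the standard ROOT call; the hostnames and file path are placeholders:

```python
# Sketch of local-then-federation failover reads (hypothetical host/path names).
import ROOT

LOCAL = "root://local-se.example.org//atlas/data/file.root"
FEDERATION = "root://global-redirector.example.org//atlas/data/file.root"

def open_with_failover(local_url, federation_url):
    """Try the site-local replica first; fall back to remote access via the federation."""
    f = ROOT.TFile.Open(local_url)
    if not f or f.IsZombie():
        print("local open failed, falling back to federation")
        f = ROOT.TFile.Open(federation_url)
    return f

f = open_with_failover(LOCAL, FEDERATION)
```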

7 Xrootd data federations
 Monitoring is highly developed, but coverage is not quite 100% and it could be used more. (A. Beche, pre-GDB)

8 Remote read and data federations at scale
 Not all network links are perfect, and storage servers require tuning. E.g. ALICE experiences from the pre-GDB.

9 Remote read at scale
 Sharing between hungry VOs could be a challenge. Analysis jobs vary: CMS quote their WW HammerCloud benchmark as needing 20 MB/s to reach 100% CPU efficiency.
 Sites can use their own network infrastructure to protect themselves. VOs shouldn't try to micro-manage, but there is a strong desire for storage plugins (e.g. the xrootd throttling plugin). The back-of-the-envelope sketch below shows why a thin link saturates quickly.
 Example: ATLAS H->WW jobs being throttled by a 1 Gb NAT, with a corresponding decrease in event rate.
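To make the scale concrete, a back-of-the-envelope calculation using the 20 MB/s-per-job CMS figure quoted above (link sizes are nominal):

```python
# How many fully CPU-efficient remote-read jobs fit through a given link?
# Uses the 20 MB/s-per-job figure quoted above; link capacities are nominal.
JOB_RATE_MB_S = 20.0

for gbit in (1, 10, 100):
    link_mb_s = gbit * 1000 / 8          # 1 Gbit/s ~= 125 MB/s
    jobs = link_mb_s / JOB_RATE_MB_S
    print(f"{gbit:>3} Gbit/s link: ~{jobs:.0f} jobs at full CPU efficiency")

# A 1 Gbit/s NAT supports only ~6 such jobs, consistent with the observed
# throttling of ATLAS H->WW jobs behind a 1 Gb NAT.
```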

10 HTTP / WebDAV
 XrdHTTP is done (in xrootd 4), offering xrootd sites the potential to expose an HTTP interface, as DPM, dCache and StoRM already do; so HTTP/WebDAV will be universally available. (Fabrizio Furano, pre-GDB)
 Monitoring: much is available (e.g. in Apache) but it is not currently integrated in WLCG.

11 HTTP/WebDAV: Experiments
 CMS: no current plans. LHCb will use it if it is the best protocol at a site. (Sylvain Blunier)
 ATLAS plan to use WebDAV for:
   User put/get (sketched after this slide).
   Deletion, instead of SRM.
   FTS or job reads, where best performing.
 ATLAS find deployment (despite its use for the Rucio rename) not stable at 100%.
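A minimal sketch of the user put/get case over WebDAV, again with Python requests; the endpoint, user path and credential locations are illustrative only:

```python
# Sketch of user put/get over WebDAV (hypothetical endpoint and paths).
import requests

cert = ("/tmp/x509up_u1000", "/tmp/x509up_u1000")   # proxy as cert and key
base = "https://se.example.org:443/webdav/atlas/user/someuser"
ca = "/etc/grid-security/certificates"

# put: upload a local file with HTTP PUT
with open("histos.root", "rb") as payload:
    requests.put(f"{base}/histos.root", data=payload,
                 cert=cert, verify=ca).raise_for_status()

# get: download it back with HTTP GET
resp = requests.get(f"{base}/histos.root", cert=cert, verify=ca)
resp.raise_for_status()
open("histos_copy.root", "wb").write(resp.content)
```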

12 Benchmarking and I/O
 Continuing activity to understand (distributed) I/O: see the ROOT I/O workshop.
 Important developments in ROOT I/O, e.g.:
   Thread safety (or "thread usability").
   TTreeCache configurable with an environment variable (a usage sketch follows this slide).
   Cross-protocol redirection.
 ROOT 6 (cling / C++11) increases the possibilities. E.g. M. Tadel, Federated Storage workshop.
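For illustration, a minimal PyROOT sketch enabling a TTreeCache for a remote read. TTree::SetCacheSize and TTree::AddBranchToCache are standard ROOT calls; the file URL and tree name are placeholders, and the environment-variable route mentioned above is a separate, version-dependent mechanism:

```python
# Sketch: enable TTreeCache for a remote (xrootd) read to cut round trips.
import ROOT

f = ROOT.TFile.Open("root://redirector.example.org//atlas/data/file.root")
tree = f.Get("events")                    # hypothetical tree name

tree.SetCacheSize(30 * 1024 * 1024)       # 30 MB read-ahead cache
tree.AddBranchToCache("*", True)          # cache all branches

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                      # reads are served from the cache in bulk
```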

13 The rest of the world

14 Underlying Storage Technology
 Technologies in use for Run 2 are already here or in development.
 Magnetic disk: capacity still increasing (to 6 TB) with current technology, with further capacity potential from shingled recording and HAMR, but performance is not keeping pace.
 Flash SSDs and hybrids already exist.
 NVRAM improvements (memristor, phase-change memory) are coming (now really, really soon … (?)).
 These would be expensive for WLCG use (though not compared to RAM).

15 Storage Systems
 'Cloud' (non-POSIX) scalable solutions:
   Algorithmic data placement (a toy sketch follows this slide).
   RAIN fault tolerance becoming common / standard.
   "Software-defined storage".
 E.g. Ceph, HDFS + RAIN, ViPR.
 WLCG sites are interested in using such technologies, and we should be flexible enough to use them.
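To illustrate what 'algorithmic data placement' means (as opposed to a central location database), here is a toy hash-ranking sketch in Python. This only illustrates the idea; it is not Ceph's CRUSH algorithm or any production scheme, and the server names are invented:

```python
# Toy algorithmic data placement: any client can compute an object's servers
# from its name alone -- no central catalogue lookup needed.
import hashlib

SERVERS = ["disk01", "disk02", "disk03", "disk04"]  # hypothetical pool
REPLICAS = 2

def place(object_name, servers=SERVERS, replicas=REPLICAS):
    """Rank servers by hash(object, server) and keep the top `replicas`."""
    ranked = sorted(
        servers,
        key=lambda s: hashlib.sha256(f"{object_name}:{s}".encode()).hexdigest())
    return ranked[:replicas]

print(place("atlas/data/file.root"))   # deterministic, catalogue-free placement
```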

16 Protocols, Databases
 HTTP -> SPDY -> HTTP/2:
   Session reuse (sketched after this slide).
   Smaller headers.
 NoSQL -> NewSQL:
   Horizontally scalable.
   Main-memory.
 Example: the LSST qserv database, built over the xrootd protocol (D. Boutigny, OSG Meeting, Apr 2014).
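Session reuse is already visible at the HTTP/1.1 level through keep-alive connections; HTTP/2 goes further by multiplexing requests over one connection. A small sketch of the reuse pattern with Python requests (the URLs are placeholders):

```python
# Sketch: reuse one HTTP session for many requests instead of paying
# connection (and TLS) setup cost per request.
import requests

urls = [f"https://se.example.org/webdav/atlas/file{i}.root" for i in range(100)]

# Without reuse, each requests.head() opens and tears down a connection.
# With a Session, the underlying TCP/TLS connection is kept alive and reused:
with requests.Session() as session:
    for url in urls:
        resp = session.head(url)          # cheap per-file metadata probe
        print(url, resp.status_code)
```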

17 Data science
 Explosion in industry interest.
 Outside expertise in data science could help even the most confident science discipline (the ATLAS analysis now sits below 400th on the challenge leaderboard).

18 Discussion

19 Relaxing requirements …
 For example, having an appropriate level of protection for data readability:
   Removing technical read protection would not change practical protection: currently non-VO site admins can already read the data, and no one else can interpret it.
   Storage developers should first demonstrate the gain (performance or simplification), and then we could push this.
 Similarly for other barriers towards, for example, object-store-like scaling and the integration of non-HEP resources …

20 Summary and discussion/action points
 Flexible/remote access: remaining sites need to deploy xrootd (and HTTP for ATLAS). Use at scale will need greater use of monitoring, tuning and tools for protecting resources.
 Protocol zoo: experiments must commit to reducing it in Run 2 (e.g. in 'return' for WebDAV / xrootd, remove rfio, SRM …).
 Wider world: 'data science', databases, storage technologies. Convene (and attend) more outside-WLCG workshops to share.
 Scalable resources: we should aim to be able to incorporate a disk site that has no WLCG-specific services / interfaces:
   BDII, accounting, X.509, perfSONAR, SRM, 'package reporter'.

