An Analysis of Data Access Methods within WLCG Shaun de Witt, Andrew Lahiff (STFC)


1 An Analysis of Data Access Methods within WLCG Shaun de Witt, Andrew Lahiff (STFC)

2 Outline Motivation and Background Experimental Setup Non-Optimised Results Effects of Experiment Optimisation Conclusion

3 Background Changes to Experiment Model Standardised Protocols Federated Access –ATLAS FAX –CMS AAA Relevance to Other Fields

4 Model Change [Diagram: The Old Model and The New Model, each showing Tier 0, Tier 1 and Tier 2 sites]

5 … Sort of the opposite to UK Railways!

6 Protocol Standardisation: GridFTP, rfio, dcap, file, rfiov2, http, WebDAV, S3, XRootD

7 Protocol Standardisation Old Model – The Protocol Zoo –GridFTP was the ONLY common protocol (used for WAN transfers between sites) –Locally a variety of protocols were used: DPM – rfio; Lustre/GPFS – file; dCache – dcap; CASTOR – rfio (incompatible with the DPM version) –EXCEPT ALICE, which only used XRootD

8 The New Protocol Model GridFTP still used for most planned wide area transfers –Many sites also support WebDAV for this All sites also now support XRootD In most cases legacy protocols are no longer used

9 XRootD – an aside What is it? –A protocol and an infrastructure –Infrastructure looks a bit like DNS Why use it? –Fault tolerant allowing fallback –Scalable (just look at DNS) –High Performance

10 Federated Access Two ‘flavours’ currently used in WLCG –AAA (Any Data, Anytime, Anywhere), used by CMS –FAX (Federated ATLAS storage systems using XRootD), used by ATLAS What does it mean? –Jobs can be run anywhere –XRootD allows for ‘discovery’ of file location –If a file does not exist locally it can be directly accessed from a remote site

11 And this is what it looks like… [Diagram: a client contacts a redirector, which directs it to a site holding the file]
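The discovery and fallback behaviour described on the two slides above can be sketched as a toy model. This is not the XRootD protocol or its client API; the `Site`, `Redirector` and `open_file` names and the file paths are hypothetical, chosen only to illustrate the DNS-like hierarchy: a job prefers a local copy, and otherwise asks a redirector, which may defer to a higher-level redirector.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    """A storage site holding a set of logical file names (hypothetical model)."""
    name: str
    files: set = field(default_factory=set)
    online: bool = True


@dataclass
class Redirector:
    """Toy redirector: knows its subscribed sites and an optional parent."""
    sites: list
    parent: Optional["Redirector"] = None

    def locate(self, lfn):
        # Ask each online site we know about whether it holds the file.
        for site in self.sites:
            if site.online and lfn in site.files:
                return site.name
        # Not found at this level: defer to the next level up, much as a
        # DNS resolver defers to a higher-level server.
        if self.parent is not None:
            return self.parent.locate(lfn)
        return None


def open_file(lfn, local_site, redirector):
    """Prefer the local copy; otherwise use whatever the federation finds."""
    if local_site.online and lfn in local_site.files:
        return local_site.name
    return redirector.locate(lfn)
```

For example, with a regional redirector covering two sites and a global redirector above it, a job at one site that asks for a file held only elsewhere is sent to the remote site, and if a site goes offline the lookup simply falls through to the next candidate.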

12 Uses in Other Scientific Fields Potential to significantly reduce network usage –Climatology/Meteorology: data stored in large 3-D objects, but analysis often done on a subset –Seismology: waveform data stored in daily files for every station, but most analysis has a smaller temporal and geographic range

13 Experimental Setup

14 What’s The Question Is it better to directly read from archive storage, or copy to local disk and read from there? –Performance –Effect of read sizes and seeks –Load (no graphs)

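15 Really Simple… Client machine (VM): 4GB RAM, 2 cores, SCSI-attached storage. Test sites: STFC (CASTOR), Lancaster (DPM), CERN (EOS), ASGC (DPM). Test data: 50 x 1GB test files. Access modes: direct read from remote storage, or copy to attached storage and read. [Diagram: client connected to remote storage and to attached storage]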

16 The Client Simple Client –NO PROCESSING/WRITING –Either open a file remotely, or copy a file to local disk and open it –Read the file: sequentially, or in blocks with seeks between
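The two read patterns on the slide above can be sketched as follows. This is a minimal illustration using ordinary Python file I/O on a local path, not the client used in the tests (which would have opened remote files through a protocol client). The function names, and the choice to spread the reads evenly across the file, are assumptions.

```python
import os
import tempfile
import time


def read_sequential(path, block_size):
    """Read the whole file front to back in block_size chunks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_sparse(path, block_size, fraction):
    """Read roughly `fraction` of the file in block_size chunks,
    seeking forward between reads (assumed: evenly spaced reads)."""
    size = os.path.getsize(path)
    n_reads = max(1, int(size * fraction) // block_size)
    stride = size // n_reads  # distance between the starts of consecutive reads
    total = 0
    with open(path, "rb") as f:
        for i in range(n_reads):
            f.seek(i * stride)
            total += len(f.read(block_size))
    return total


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Small scratch file standing in for one of the 1GB test files.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1024 * 1024))
    try:
        print(timed(read_sequential, tmp.name, 64 * 1024))
        print(timed(read_sparse, tmp.name, 1024, 0.10))
    finally:
        os.remove(tmp.name)
```

Because the client does no processing or writing, the elapsed time of each call reflects only the cost of opening, seeking and reading.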

17

18 Reading From Local Storage Legacy Protocol

19 Reading From Local Storage

20 Read Size Effects

21 Reading From Close Storage Data obtained from Tier 2 Site at Lancaster University –Distance: 350km –Mean RTT: ~26ms –With thanks to Matt Doidge for allowing use of storage for testing

22 Reading From Lancaster Reading 10% of the file in 1kB chunks took 1952 seconds!
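A back-of-envelope check of the 1952 second figure, using only numbers from the slides (1GB files, 10% read, 1kB chunks, ~26ms mean RTT to Lancaster). It assumes 1GB = 10^9 bytes and 1kB = 1000 bytes, and the "one round trip per read" bound is a simplifying assumption, not a statement about how the protocol actually behaved.

```python
def read_count(file_bytes, fraction, chunk_bytes):
    """Number of chunk-sized reads needed to cover `fraction` of the file."""
    return round(file_bytes * fraction) // chunk_bytes


def latency_bound_seconds(n_reads, rtt_seconds):
    """Time spent waiting if every read costs one full round trip (assumption)."""
    return n_reads * rtt_seconds


FILE_BYTES = 10**9      # assumption: "1GB" means 10^9 bytes
CHUNK_BYTES = 1000      # assumption: "1kB" means 1000 bytes
RTT_LANCASTER = 0.026   # mean RTT from the slides, in seconds
MEASURED = 1952.0       # measured wall time from the slides, in seconds

n_reads = read_count(FILE_BYTES, 0.10, CHUNK_BYTES)
per_read = MEASURED / n_reads
bound = latency_bound_seconds(n_reads, RTT_LANCASTER)

print(f"reads: {n_reads}")
print(f"measured time per read: {per_read * 1000:.1f} ms")
print(f"one-RTT-per-read estimate: {bound:.0f} s")
```

That is about 100,000 reads at roughly 19.5 ms each, the same order as the mean RTT, which is consistent with the run being dominated by per-request latency rather than by bandwidth.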

23 Reading From Remote Storage Data obtained from Tier 1 Site at ASGC –Distance: 10,300 km –Mean RTT: ~300ms –With thanks to Felix Lee for allowing use of storage for testing

24 Reading from ASGC

25 Optimisations Comparative tests run accessing files using ROOT data analysis framework –Heavily optimised for use with xrootd –https://root.cern.ch/drupal/ –Run on production worker node Results are impressive –Reading 2GB data files containing 5.5k events took < 2 seconds –Lots of caching helps!

26 Finally – STFC's New System Ceph (Giant) –NOT optimised –XRootD server written by S. Ponce (CERN) –Uses RADOS-Striper to allow arbitrary files –Configured to use erasure coding (16+2) with a cache tier
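For readers unfamiliar with the "16+2" notation, the arithmetic can be sketched as below. This assumes 16+2 means 16 data chunks plus 2 coding chunks per object (the usual k+m reading); the function name and the comparison with 3-way replication are illustrative additions, not figures from the talk.

```python
def erasure_profile(k, m):
    """Summarise a k+m erasure-coded layout (k data chunks, m coding chunks)."""
    return {
        "chunks_per_object": k + m,
        "raw_bytes_per_stored_byte": (k + m) / k,
        "chunk_losses_tolerated": m,
    }


profile = erasure_profile(16, 2)
replicated = 3.0  # raw bytes per stored byte under 3-way replication

print(profile)
print(f"raw capacity saved vs 3 replicas: "
      f"{(1 - profile['raw_bytes_per_stored_byte'] / replicated) * 100:.1f}%")
```

Under that reading, each object is spread over 18 chunks, costs 1.125 bytes of raw capacity per byte stored, and survives the loss of any two chunks.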

27 Reading from CEPH at RAL

28 Summary The question – Is it better to read a file directly from a storage system, or should you copy it to local disk? Answer – It depends! –Distance (RTT) –Protocol –How you want to read it –The format of the file There is no ‘right’ answer
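The "it depends" answer can be made concrete with a deliberately crude cost model. Everything here is an assumption for illustration: the function names, the 100 MB/s bandwidth in the example, and the model itself, which charges one round trip per read and ignores caching, read-ahead, protocol differences, file format and the cost of reading the local copy afterwards.

```python
import math


def copy_then_read_seconds(file_bytes, bandwidth_bytes_per_s, rtt_s):
    """One request, then stream the whole file to local disk."""
    return rtt_s + file_bytes / bandwidth_bytes_per_s


def direct_read_seconds(file_bytes, fraction, chunk_bytes,
                        bandwidth_bytes_per_s, rtt_s):
    """Read only `fraction` of the file remotely, one round trip per chunk."""
    bytes_read = file_bytes * fraction
    n_reads = math.ceil(bytes_read / chunk_bytes)
    return n_reads * rtt_s + bytes_read / bandwidth_bytes_per_s


def better_choice(file_bytes, fraction, chunk_bytes,
                  bandwidth_bytes_per_s, rtt_s):
    """Return 'direct' or 'copy', whichever the model estimates as faster."""
    direct = direct_read_seconds(file_bytes, fraction, chunk_bytes,
                                 bandwidth_bytes_per_s, rtt_s)
    copy = copy_then_read_seconds(file_bytes, bandwidth_bytes_per_s, rtt_s)
    return "direct" if direct < copy else "copy"


if __name__ == "__main__":
    GB = 10**9
    BANDWIDTH = 100 * 10**6  # hypothetical 100 MB/s link
    for chunk in (1000, 10**7):
        for rtt in (0.026, 0.300):
            choice = better_choice(GB, 0.10, chunk, BANDWIDTH, rtt)
            print(f"chunk={chunk} B, rtt={rtt * 1000:.0f} ms -> {choice}")
```

Even this simple model reproduces the qualitative conclusion: with many small reads the round-trip time dominates and copying first wins, while with a few large reads of a small fraction of the file, direct access wins.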

