An Analysis of Data Access Methods within WLCG Shaun de Witt, Andrew Lahiff (STFC)
Outline Motivation and Background Experimental Setup Non-Optimised Results Effects of Experiment Optimisation Conclusion
Background Changes to Experiment Model Standardised Protocols Federated Access –ATLAS FAX –CMS AAA Relevance to Other Fields
Model Change (diagram: Tier 0 / Tier 1 / Tier 2 hierarchy – the old model vs the new model)
… Sort of the opposite to UK Railways!
Protocol Standardisation GridFTP rfio dcap file rfiov2 http WebDAV S3 XRootD
Protocol Standardisation Old Model – The Protocol Zoo –GridFTP was the ONLY common protocol, used for WAN transfers between sites –Locally a variety of protocols were used: DPM – rfio; Lustre/GPFS – file; dCache – dcap; CASTOR – rfio (incompatible with the DPM version) –The exception was ALICE, which only ever used XRootD
The New Protocol Model GridFTP still used for most planned wide area transfers –Many sites also support WebDAV for this All sites also now support XRootD In most cases legacy protocols are no longer used
XRootD – an aside What is it? –A protocol and an infrastructure –Infrastructure looks a bit like DNS Why use it? –Fault tolerant allowing fallback –Scalable (just look at DNS) –High Performance
Federated Access Two ‘flavours’ currently used in WLCG –AAA (Any data, Anytime, Anywhere) –FAX (Federated ATLAS XRootD) What does it mean? –Jobs can be run anywhere –XRootD allows for ‘discovery’ of file location –If a file does not exist locally it can be directly accessed from a remote site
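The local-first, fall-back-to-federation access pattern described above can be sketched as follows. This is a toy illustration: the dictionaries stand in for a site's local catalogue and the federation's namespace, and the redirector's discovery step is mimicked by a lookup (none of these names are real XRootD API).

```python
# Toy catalogues standing in for local site storage and the wider
# federation; in reality the XRootD redirector performs this discovery.
LOCAL_FILES = {"/data/run1.root": b"local copy"}
FEDERATION = {"/data/run1.root": b"local copy",
              "/data/run2.root": b"copy held at a remote site"}

def open_with_fallback(path):
    """Prefer the local replica; fall back to direct remote access,
    as FAX/AAA allow when a file is not present at the local site."""
    if path in LOCAL_FILES:
        return ("local", LOCAL_FILES[path])
    if path in FEDERATION:
        return ("remote", FEDERATION[path])
    raise FileNotFoundError(path)
```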
And this is what it looks like… (diagram: the client contacts a redirector, which redirects it to the site holding the file)
Uses in Other Scientific Fields Potential to significantly reduce network usage –Climatology/Meteorology: data stored in large 3-D objects, but analysis often done on a subset –Seismology: waveform data stored in daily files for every station, but most analysis has a smaller temporal and geographic range
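The saving in these fields comes from reading only the byte range that the analysis needs. A minimal sketch, using a local scratch file to stand in for remote storage (over a protocol such as XRootD the same seek+read is issued to the remote server, so only the requested bytes cross the network):

```python
import os
import tempfile

def read_subset(path, offset, length):
    """Read only the requested byte range instead of the whole file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Scratch file standing in for a large daily waveform file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4096)      # 1 MiB of patterned data
subset = read_subset(tmp.name, offset=512, length=16)
os.unlink(tmp.name)
```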
Experimental Setup
What’s The Question Is it better to directly read from archive storage, or copy to local disk and read from there? –Performance –Effect of read sizes and seeks –Load (no graphs)
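The two strategies being compared can be sketched as below. The local file copy stands in for a real staging transfer (e.g. via GridFTP or xrdcp); the function names are illustrative.

```python
import os
import shutil
import tempfile

def direct_read(remote_path, chunk=1024 * 1024):
    """Strategy 1: read straight from (remote) storage."""
    total = 0
    with open(remote_path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
    return total

def copy_then_read(remote_path, chunk=1024 * 1024):
    """Strategy 2: stage the whole file to local disk, then read it."""
    with tempfile.NamedTemporaryFile(delete=False) as local:
        local_path = local.name
    shutil.copyfile(remote_path, local_path)   # stands in for xrdcp/GridFTP
    try:
        return direct_read(local_path, chunk)
    finally:
        os.unlink(local_path)

# Demo on a small scratch file.
with tempfile.NamedTemporaryFile(delete=False) as t:
    t.write(b"x" * 4096)
direct_bytes = direct_read(t.name)
staged_bytes = copy_then_read(t.name)
os.unlink(t.name)
```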
Really Simple… (diagram: client with both remote and locally attached storage) Client machine (VM): 4 GB RAM, 2 cores, SCSI-attached storage Test sites: STFC (CASTOR), Lancaster (DPM), CERN (EOS), ASGC (DPM) 50x1 GB test files, accessed by direct read and by copy-then-read
The Client Simple Client –NO PROCESSING/WRITING –Either Open a file remotely or Copy a file to local disk and open it –Read the file Sequentially In blocks with seeks between
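The client's two read patterns can be sketched as follows; a local file stands in for the storage system, and the block/stride sizes are illustrative.

```python
import os
import tempfile

def read_sequential(path, block=1024 * 1024):
    """Read the whole file front to back in fixed-size blocks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(block)
            if not buf:
                break
            total += len(buf)
    return total

def read_with_seeks(path, chunk, stride):
    """Read `chunk` bytes, seek forward `stride` bytes, repeat.

    Mimics sparse access, where only part of the file is needed.
    """
    total = 0
    size = os.path.getsize(path)
    pos = 0
    with open(path, "rb") as f:
        while pos < size:
            f.seek(pos)
            total += len(f.read(min(chunk, size - pos)))
            pos += chunk + stride
    return total

# Demo on a 10 kB scratch file standing in for a 1 GB test file.
with tempfile.NamedTemporaryFile(delete=False) as t:
    t.write(b"x" * 10_000)
seq_bytes = read_sequential(t.name)
sparse_bytes = read_with_seeks(t.name, chunk=100, stride=900)
os.unlink(t.name)
```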
Reading From Local Storage Legacy Protocol
Reading From Local Storage
Read Size Effects
Reading From Close Storage Data obtained from Tier 2 Site at Lancaster University –Distance: 350 km –Mean RTT: ~26 ms –With thanks to Matt Doidge for allowing use of storage for testing
Reading From Lancaster Reading 10% of the file in 1kB chunks took 1952 seconds!
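The 1952 s figure is consistent with latency dominating the transfer: if each 1 kB read costs roughly one synchronous 26 ms round trip (no pipelining or read-ahead, which is an assumption of this estimate), reading 10% of a 1 GB file needs on the order of 10^5 requests.

```python
file_size = 1_000_000_000      # 1 GB test file
fraction = 0.10                # 10% of the file read
chunk = 1024                   # 1 kB reads
rtt = 0.026                    # ~26 ms mean RTT to Lancaster

requests = fraction * file_size / chunk   # number of round trips
latency_bound = requests * rtt            # seconds spent on latency alone
```

This gives roughly 98,000 requests and about 2,500 s, the same order of magnitude as the observed 1952 s; bandwidth limits and client overheads are ignored.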
Reading From Remote Storage Data obtained from Tier 1 Site at ASGC –Distance: 10,300 km –Mean RTT: ~300ms –With thanks to Felix Lee for allowing use of storage for testing
Reading from ASGC
Optimisations Comparative tests run accessing files using the ROOT data analysis framework –Heavily optimised for use with XRootD –Run on a production worker node Results are impressive –Reading 2 GB data files containing 5.5k events took < 2 seconds –Lots of caching helps!
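Much of the gain from a caching framework comes from turning many small remote reads into a few large ones. A minimal read-ahead cache sketch (the class name and sizes are illustrative, not ROOT's actual implementation):

```python
import os
import tempfile

class ReadAheadFile:
    """Serve small reads from a large read-ahead buffer.

    Each cache miss fetches `cache_size` bytes in one request, so many
    small reads collapse into a handful of (remote) round trips.
    """
    def __init__(self, path, cache_size=1024 * 1024):
        self._f = open(path, "rb")
        self._cache_size = cache_size
        self._start = 0
        self._buf = b""
        self.fetches = 0          # count of underlying reads

    def read(self, offset, length):
        in_cache = (self._start <= offset and
                    offset + length <= self._start + len(self._buf))
        if not in_cache:
            self._f.seek(offset)
            self._buf = self._f.read(self._cache_size)
            self._start = offset
            self.fetches += 1
        rel = offset - self._start
        return self._buf[rel:rel + length]

    def close(self):
        self._f.close()

# Demo: 64 small reads served by a single 64 kB fetch.
with tempfile.NamedTemporaryFile(delete=False) as t:
    t.write(b"a" * (2 * 1024 * 1024))
f = ReadAheadFile(t.name, cache_size=64 * 1024)
for i in range(64):
    f.read(i * 1024, 1024)
fetch_count = f.fetches
f.close()
os.unlink(t.name)
```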
Finally – STFC’s New System CEPH (Giant release) –NOT optimised –XRootD server written by S. Ponce (CERN) –Uses the RADOS striper to allow arbitrary files –Configured to use 16+2 erasure coding with a cache tier
Reading from CEPH at RAL
Summary The question – Is it better to read a file directly from a storage system, or should you copy it to local disk first? Answer – It depends! –Distance (RTT) –Protocol –How you want to read it –The format of the file There is no ‘right’ answer