An Analysis of Data Access Methods within WLCG Shaun de Witt, Andrew Lahiff (STFC)

Outline
Motivation and Background
Experimental Setup
Non-Optimised Results
Effects of Experiment Optimisation
Conclusion

Background
Changes to the Experiment Model
Standardised Protocols
Federated Access
–ATLAS FAX
–CMS AAA
Relevance to Other Fields

Model Change (diagram contrasting The Old Model and The New Model across Tier 0, Tier 1 and Tier 2)

… Sort of the opposite to UK Railways!

Protocol Standardisation
GridFTP, rfio, dcap, file, rfiov2, http, WebDAV, S3, XRootD

Protocol Standardisation
Old Model – The Protocol Zoo
–GridFTP was the ONLY common protocol
  Used for WAN transfers between sites
–Locally a variety of protocols were used
  DPM – rfio
  Lustre/GPFS – file
  dCache – dcap
  CASTOR – rfio (incompatible with the DPM version)
–EXCEPT ALICE – only used XRootD

The New Protocol Model
GridFTP is still used for most planned wide-area transfers
–Many sites also support WebDAV for this
All sites now also support XRootD
In most cases legacy protocols are no longer used
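As a concrete illustration (hypothetical hosts and paths, not from the talk), the same replica can be fetched with the standard command-line client for each protocol family; a minimal sketch:

```python
import subprocess

# Hypothetical endpoints; the commands are the usual clients for each protocol family.
commands = [
    # Planned WAN transfer over GridFTP
    ["globus-url-copy", "gsiftp://se.site-a.example//data/file.dat", "file:///tmp/file.dat"],
    # The same transfer over HTTP/WebDAV
    ["davix-get", "https://se.site-a.example/data/file.dat", "/tmp/file.dat"],
    # Copy (or stream) via XRootD
    ["xrdcp", "root://se.site-a.example//data/file.dat", "/tmp/file.dat"],
]

for cmd in commands:
    print("would run:", " ".join(cmd))
    # subprocess.run(cmd, check=True)   # uncomment on a machine with the clients installed
```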

XRootD – an aside
What is it?
–A protocol and an infrastructure
–The infrastructure looks a bit like DNS
Why use it?
–Fault tolerant, allowing fallback
–Scalable (just look at DNS)
–High performance
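A minimal client-side sketch, assuming the XRootD Python bindings (pyxrootd) are installed; the redirector hostname and file path are hypothetical. The client opens against the redirector and is sent on, transparently, to a data server that actually holds the file:

```python
from XRootD import client
from XRootD.client.flags import OpenFlags

# Hypothetical redirector and path -- not from the talk.
url = "root://redirector.example.org//store/data/file.dat"

f = client.File()
status, _ = f.open(url, OpenFlags.READ)      # redirector sends us on to a data server
if not status.ok:
    raise IOError(status.message)

status, data = f.read(offset=0, size=1024 * 1024)   # read the first 1 MB
print(len(data), "bytes read")
f.close()
```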

Federated Access
Two ‘flavours’ currently used in WLCG
–AAA (Any Time, Any Place, Anywhere)
–FAX (Federated Access through XrootD)
What does it mean?
–Jobs can be run anywhere
–XRootD allows for ‘discovery’ of file location
–If a file does not exist locally it can be accessed directly from a remote site

And this is what it looks like… (diagram: client, redirector, sites)
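A minimal sketch (hypothetical path and redirector URL) of the fallback pattern federation enables: prefer a local replica, otherwise hand the job a federation URL and let the redirector locate a site that holds the file:

```python
import os

# Hypothetical names: a local site path and the federation redirector URL.
LOCAL_PATH = "/data/atlas/mc12/myfile.root"
FEDERATED_URL = "root://fax-redirector.example.org//atlas/mc12/myfile.root"

def pick_source(local_path, federated_url):
    """Use the local replica if the site has one, otherwise fall back to the
    federation redirector, which locates a site that does hold the file."""
    if os.path.exists(local_path):
        return local_path
    return federated_url

# Anything that understands root:// URLs (ROOT's TFile::Open, xrdcp, the
# XRootD client library) can be pointed at whichever source this returns.
source = pick_source(LOCAL_PATH, FEDERATED_URL)
print("reading from", source)
```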

Uses in Other Scientific Fields
Potential to significantly reduce network usage
–Climatology/Meteorology: data stored in large 3-D objects, but analysis is often done on a subset
–Seismology: waveform data stored in daily files for every station, but most analysis has a smaller temporal and geographic range

Experimental Setup

What’s The Question?
Is it better to read directly from archive storage, or to copy to local disk and read from there?
–Performance
–Effect of read sizes and seeks
–Load (no graphs)

Really Simple…
Client machine (VM): 4 GB RAM, 2 cores, SCSI-attached storage
Test sites: STFC (CASTOR), Lancaster (DPM), CERN (EOS), ASGC (DPM)
50 x 1 GB test files
Two access modes: Direct Read (read straight from the remote storage) and Copy Read (copy to the attached storage, then read)

The Client
Simple client – NO PROCESSING/WRITING
Either:
–Open the file remotely, or
–Copy the file to local disk and open it
Then read the file:
–Sequentially, or
–In blocks with seeks between
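A minimal sketch of the two access modes under test, assuming the XRootD Python bindings (pyxrootd) and the xrdcp client are available; the URL, block size and stride are hypothetical and this is not the authors' actual test harness:

```python
import os
import subprocess
import tempfile

from XRootD import client
from XRootD.client.flags import OpenFlags

URL = "root://se.example.org//test/1GB-file-01.dat"   # hypothetical test file
BLOCK = 1024 * 1024                                    # read size under test
STRIDE = 10 * BLOCK                                    # seek forward between reads

def direct_read(url, block, stride):
    """Open the file on the storage system and read blocks with seeks in between."""
    f = client.File()
    status, _ = f.open(url, OpenFlags.READ)
    assert status.ok, status.message
    status, info = f.stat()
    offset = 0
    while offset < info.size:
        _, data = f.read(offset=offset, size=block)
        offset += stride
    f.close()

def copy_then_read(url, block, stride):
    """Copy the whole file to attached storage first, then read it locally."""
    local = os.path.join(tempfile.mkdtemp(), "copy.dat")
    subprocess.run(["xrdcp", url, local], check=True)
    with open(local, "rb") as f:
        while True:
            data = f.read(block)
            if not data:
                break
            f.seek(stride - block, os.SEEK_CUR)   # skip ahead to the next block
```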

Reading From Local Storage (Legacy Protocol)

Reading From Local Storage

Read Size Effects

Reading From Close Storage
Data obtained from the Tier 2 site at Lancaster University
–Distance: 350 km
–Mean RTT: ~26 ms
–With thanks to Matt Doidge for allowing use of storage for testing

Reading From Lancaster
Reading 10% of the file in 1 kB chunks took 1952 seconds!
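A rough back-of-envelope check (not in the slides) of why this is so slow, assuming one synchronous round trip per 1 kB read and the file size and RTT quoted above:

```python
# Rough check that the 1952 s figure is latency-dominated.
file_size = 1024**3          # 1 GB test file
fraction  = 0.10             # 10% of the file was read
read_size = 1024             # 1 kB chunks
rtt       = 0.026            # ~26 ms mean RTT to Lancaster

n_reads = int(file_size * fraction / read_size)
print(n_reads, "reads;", round(n_reads * rtt), "s spent on round trips alone")
# ~105,000 reads and ~2,700 s of pure latency -- the same order as the
# observed 1952 s, so small remote reads are bound by RTT, not bandwidth.
```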

Reading From Remote Storage
Data obtained from the Tier 1 site at ASGC
–Distance: 10,300 km
–Mean RTT: ~300 ms
–With thanks to Felix Lee for allowing use of storage for testing

Reading from ASGC

Optimisations
Comparative tests run accessing files using the ROOT data analysis framework
–Heavily optimised for use with XRootD
–Run on a production worker node
Results are impressive
–Reading 2 GB data files containing 5.5k events took < 2 seconds
–Lots of caching helps!
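The slides do not show the ROOT configuration used; the sketch below illustrates the kind of read-ahead caching (TTreeCache) that makes remote access this fast, with a hypothetical file URL and tree name rather than the experiments' actual setup:

```python
import ROOT

# Hypothetical URL and tree name -- not the files used in the talk.
f = ROOT.TFile.Open("root://eos.example.org//store/sample.root")
tree = f.Get("events")

tree.SetCacheSize(30 * 1024 * 1024)   # 30 MB TTreeCache
tree.AddBranchToCache("*", True)      # prefetch all branches that are read

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                  # baskets arrive in bulk reads,
                                      # so few WAN round trips per event
f.Close()
```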

Finally – STFC’s New System
Ceph (Giant release)
–NOT optimised
–XRootD server plugin for Ceph written by S. Ponce (CERN)
–Uses libradosstriper (RADOS striper) to allow arbitrary files
–Configured to use (16+2) erasure coding with a cache tier

Reading from Ceph at RAL

Summary
The question: is it better to read a file directly from a storage system, or should you copy it to local disk first?
Answer: it depends!
–Distance (RTT)
–Protocol
–How you want to read it
–The format of the file
There is no ‘right’ answer