Storage – Wahid Bhimji. DPM Collaboration: Tasks. Xrootd: Status; Using for Tier 2 reading from "Tier 3"; Server data mining.


DPM
The DPM collaboration is alive – we had our first joint dev meeting.
GridPP have some tasks on a task list. This is mainly for me and Sam, but others are welcome to contribute (and we may need it…):
Support (beyond local) – e.g. a rota for dpm-user-forum
Admin tools; DMLite/python; DMLite/shell (mainly Sam)
DMLite profiler / logging ("interest" – again mainly Sam)
Documentation and web presence
Configuration / puppet: possibly best to wait for the release – real soon now – but then input from sites who actually use puppet would be valuable (only a test setup at Edinburgh)
Permanent pilot testing (other sites welcome if they have test machines)
HTTP performance / S3 validation (I have run some tests – need to do more)

Xrootd – ATLAS Federation (FAX)
Lots of UK sites are in – I encourage the last few to join.
Still in a testing phase for ATLAS.
We have just enabled fallback to FAX (if no file is found on the 3rd attempt) at ECDF – still no way to test it properly.
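As a rough illustration only: the fallback amounts to "try the local copy, then ask the federation redirector for a replica". In production this retry is done by the ATLAS pilot, not by user code, and both the local door URL and the redirector hostname/path below are placeholders; a minimal ROOT sketch:

  // fax_fallback.C -- sketch of "open locally, fall back to the federation".
  // Both URLs are placeholders; the real fallback is handled by the ATLAS pilot.
  #include "TFile.h"
  #include "TError.h"

  TFile* OpenWithFallback(const char* localUrl, const char* federatedUrl)
  {
     TFile* f = TFile::Open(localUrl);
     if (f && !f->IsZombie()) return f;        // local replica found
     delete f;
     ::Warning("OpenWithFallback", "local open failed, trying FAX");
     return TFile::Open(federatedUrl);         // global redirector locates a replica elsewhere
  }

  void fax_fallback()
  {
     TFile* f = OpenWithFallback(
        "root://srm.glite.ecdf.ed.ac.uk//dpm/ecdf.ed.ac.uk/home/atlas/SOME/FILE.root",
        "root://some-fax-redirector.example.org//atlas/SOME/FILE.root");
     if (f) f->ls();
  }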

Xrootd for local grid job access on DPM sites
Glasgow, Oxford and Edinburgh have been using xrdcp instead of rfcp for ATLAS jobs for some time now. No problems – I encourage other sites to switch.
Glasgow and Oxford have been using direct xrootd access (not copy) for ATLAS analysis jobs for a few weeks. No problems – I think we should try it at more sites.
For ECDF, changing that exposed an issue in the tarball. As it is a python26 one, I'm hoping it goes away in SL6.
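For illustration, the two modes differ only in whether the file is staged to local scratch first; a minimal ROOT sketch (the URL is a placeholder, and real jobs get their turls from the pilot/DDM layer rather than hard-coding them):

  // access_modes.C -- copy-to-scratch versus direct xrootd access, as a sketch.
  #include "TFile.h"

  void access_modes()
  {
     const char* url =
        "root://srm.glite.ecdf.ed.ac.uk//dpm/ecdf.ed.ac.uk/home/atlas/SOME/FILE.root";

     // 1) Copy mode (what rfcp/xrdcp effectively do): stage the whole file locally first.
     TFile::Cp(url, "input_copy.root");
     TFile* fcopy = TFile::Open("input_copy.root");

     // 2) Direct-access mode: open over xrootd and read only the bytes actually needed.
     TFile* fdirect = TFile::Open(url);

     if (fcopy)   fcopy->ls();
     if (fdirect) fdirect->ls();
  }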

Xrootd for local access from "Tier 3"
Being used at (at least) Edinburgh and Liverpool. Works well – I recommend it: with just ROOT installed users can do
TFile::Open("root://srm.glite.ecdf.ed.ac.uk//dpm…")
They must use TTreeCache in their code for decent performance (see the sketch below).
To get a list of files for ATLAS, for example (maybe there is an easier way but this works…):
dq2-ls -f -p -L UKI-SCOTGRID-ECDF_LOCALGROUPDISK mc12_8TeV PowhegPythia8_AU2CT10_ggH125_ZZ4lep_noTau.merge.NTUP_HSG2.e1622_s1581_s1586_r3658_r3549_p1344/ | grep srm | sed 's/srm:\/\/srm.glite.ecdf.ed.ac.uk/root:\/\/srm.glite.ecdf.ed.ac.uk\//g'
Can also get global names (which would use FAX and so fetch the files remotely if they are not available locally – not sure I would expose users to that yet…):
export STORAGEPREFIX=root://YOURLOCALSRM://; dq2-list-files -p DATASET
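A minimal sketch of the recommended pattern, assuming a tree called "physics" (an invented name) in a file listed by the command above; the only point is that the TTreeCache is switched on before the event loop:

  // tier3_read.C -- reading a file on the Tier-2 DPM from a Tier-3 node with TTreeCache.
  // The path and the tree name "physics" are invented examples.
  #include "TFile.h"
  #include "TTree.h"
  #include <cstdio>

  void tier3_read()
  {
     TFile* f = TFile::Open(
        "root://srm.glite.ecdf.ed.ac.uk//dpm/ecdf.ed.ac.uk/home/atlas/SOME/FILE.root");
     if (!f || f->IsZombie()) return;

     TTree* t = (TTree*) f->Get("physics");
     t->SetCacheSize(30 * 1024 * 1024);   // 30 MB TTreeCache
     t->AddBranchToCache("*", kTRUE);     // cache all branches (or let it learn the used ones)

     Long64_t n = t->GetEntries();
     for (Long64_t i = 0; i < n; ++i)
        t->GetEntry(i);                   // reads arrive as a few large vector requests

     printf("read calls: %d\n", f->GetReadCalls());
     f->Close();
  }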

Xrootd access of T2 from T3 continued…
Can provide a "POSIX" interface with gfal2 / gfalFS:
yum install gfal2-all gfalFS gfal2-plugin-xrootd
(the xrootd plugin is in epel-testing currently, though maybe in epel since the release yesterday…)
usermod -a -G fuse
gfalFS ~/mount_tmp root://srm.glite.ecdf.ed.ac.uk/dpm/ecdf.ed.ac.uk/home/atlas
Can then ls etc.
Test of reading files for single jobs (100% of 7929 events from a 533 MB file). John and I have fed back a few bugs, but it basically works:
Fuse / Xrootd directly
TTreeCache on (ReadCalls = 17): 39.47, ,
TTreeCache off (ReadCalls = ): 16.11, , 35.57
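The on/off numbers above can be reproduced with something like the following sketch, which reads the same file once through the gfalFS mount and once directly over xrootd and prints the bytes read and read calls; the mount point, path and tree name are placeholders:

  // fuse_vs_direct.C -- compare reading via the gfalFS mount with direct xrootd access.
  // Paths and the tree name "physics" are placeholders.
  #include "TFile.h"
  #include "TTree.h"
  #include <cstdio>

  static void readOne(const char* path)
  {
     TFile* f = TFile::Open(path);
     if (!f || f->IsZombie()) return;
     TTree* t = (TTree*) f->Get("physics");
     t->SetCacheSize(30 * 1024 * 1024);          // TTreeCache on, as recommended above
     t->AddBranchToCache("*", kTRUE);
     Long64_t n = t->GetEntries();
     for (Long64_t i = 0; i < n; ++i) t->GetEntry(i);
     printf("%s : bytes read = %lld, read calls = %d\n",
            path, f->GetBytesRead(), f->GetReadCalls());
     f->Close();
  }

  void fuse_vs_direct()
  {
     readOne("/home/USER/mount_tmp/SOME/FILE.root");                                         // via gfalFS
     readOne("root://srm.glite.ecdf.ed.ac.uk//dpm/ecdf.ed.ac.uk/home/atlas/SOME/FILE.root"); // direct
  }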

Xrootd server data mining…
Source: see, for example, this GDB talk. The data comes from the UDP stream sent by every xrootd server at every site.
Monitors ATLAS FAX traffic, and also local traffic if xrootd is used (e.g. ECDF, Glasgow and CERN). Lots of plots are obtainable from the dashboard; here we do some analysis on the raw data.
Tests are a significant part of current FAX data, so "spikes" can be these rather than user activity. This can provide an extra angle on the tests.
Non-EOS, non-FAX traffic has a lot of xrdcp, though some DPM sites have recently moved to direct access for analysis queues.
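As an indication of the kind of analysis meant here, a sketch that histograms the read ratio and counts "copy-like" accesses; it assumes the monitoring records have already been exported to a plain CSV with one line per file access, <bytes read>,<file size in bytes>, which is an assumed format and not the raw xrootd UDP stream:

  // read_ratio.C -- sketch of mining exported transfer records.
  // Assumes a CSV "transfers.csv" with lines of the form <bytes_read>,<file_size_bytes>;
  // this export format is an assumption, not the native xrootd monitoring format.
  #include "TH1F.h"
  #include "TCanvas.h"
  #include <cstdio>
  #include <fstream>
  #include <sstream>
  #include <string>

  void read_ratio(const char* csv = "transfers.csv")
  {
     TH1F h("h", "Read ratio (bytes read / file size);ratio;accesses", 60, 0., 3.);
     long nCopies = 0;                           // accesses that look like xrdcp

     std::ifstream in(csv);
     std::string line;
     while (std::getline(in, line)) {
        std::stringstream ss(line);
        double bytesRead = 0, fileSize = 0;
        char comma = 0;
        if (!(ss >> bytesRead >> comma >> fileSize) || fileSize <= 0) continue;
        if (bytesRead == fileSize) ++nCopies;    // read == filesize: probably a copy (xrdcp)
        h.Fill(bytesRead / fileSize);
     }

     printf("likely copies (read == filesize): %ld\n", nCopies);
     TCanvas c;
     h.DrawCopy();
     c.SaveAs("read_ratio.png");
  }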

Throughput: Edinburgh (DPM), direct and copy traffic. "xrdcp" entries are those where the bytes read exactly equal the file size – probably copies.

Throughput: QMUL (Lustre and StoRM): local and remote (xrdcp)

All sites except CERN

ReadRatio < 1
Reading small fractions is not a bad thing. Reading 10% of events with TTreeCache (e.g. Ilija's tests) gives a read ratio near 100%.
(Plot: read ratio vs file size / MB)
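For concreteness, read ratio here is (as used in these plots) the bytes read divided by the file size: a job reading 150 MB out of a 500 MB file scores 0.3, an xrdcp scores exactly 1.0, and a job that re-reads the same data scores above 1.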

ECDF: jobs with up to a million read operations

ECDF
Most jobs have 0 vector read operations (so are not using TTreeCache).
Some jobs do use it (but these are mostly tests, shaded blue).

Conclusions
Performance data from xrootd is useful – we can use it to probe sites or applications.
Many sites are currently configured for local access rather than remote. Throughput numbers appear low (which may not be a problem); current jobs are doing OK.
The large number of jobs with low read ratios suggests that ATLAS doing more direct access would conserve bandwidth.
Jobs which read files multiple times, or have a high number of read operations (and low bandwidth), are concerning. If ATLAS can force the bulk of the traffic through well-behaved applications (e.g. a central service, app or wrapper) then we will be better off.
I encourage DPM sites to switch to xrootd for local access – copy for production and direct access for analysis – and to join ATLAS FAX.
Volunteers for DPM task list items are welcome.