DPM in FAX (ATLAS Federation)
Wahid Bhimji, University of Edinburgh
As well as others in the UK, IT and elsewhere

Outline
- Introduction: what FAX is and its goals (as stated by the project), plus some personal perspectives
- DPM FAX deployment status
- Testing / monitoring / use cases
- Concerns and benefits

What is FAX?
Description (from the FAX Twiki): The Federated ATLAS Xrootd (FAX) system is a storage federation that aims to bring Tier1, Tier2 and Tier3 storage together as if it were a single giant storage system, so that users do not have to think about where the data is or how to access it. Client software such as ROOT or xrdcp interacts with FAX behind the scenes and reaches the data wherever it is in the federation.
Goals (from Rob Gardner's talk at the Lyon meeting, 2012):
- Common ATLAS namespace across all storage sites, accessible from anywhere
- Easy to use, homogeneous access to data
- Use as failover for existing systems
- Gain access to more CPUs using WAN direct read access
- Use as a caching mechanism at sites to reduce local data management tasks
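To make the "homogeneous access" goal concrete, the sketch below shows what federated access looks like from the client side; the redirector hostname and file path are placeholders rather than real endpoints.

    # Copy a file out of the federation with xrdcp: the redirector
    # locates a replica at whichever federated site holds it.
    # (Hostname and path below are illustrative only.)
    xrdcp -f root://fax-redirector.example.org//atlas/dq2/mc12_8TeV/SOME_DATASET/file.root /tmp/file.root

    # The same global URL can be opened directly from ROOT for WAN reads, e.g.
    # TFile::Open("root://fax-redirector.example.org//atlas/dq2/mc12_8TeV/SOME_DATASET/file.root")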

Other details / oddities of FAX (some of this is my perspective)
- Started in the US with pure xrootd and xrootd-dCache
- Now worldwide, including the UK, IT, DE and CERN (EOS)
- Uses a topology of "regional" redirectors (see next slide)
- The ATLAS federation uses a "Name2Name" LFC lookup
- Now moving from R&D to production, but not (quite) there yet IMHO
- There is interest in an http(s) federation instead / as well, but this is nowhere near as far along

Regional redirectors

DPM FAX Deployment Status
- At the last workshop the DPM developers presented a new dpm-xrootd (details)
- Initially deployed manually in ScotGrid. A few teething configuration issues; all tweaks are now in YAIM (since 1.8.4) and documented. Thanks to David Smith.
- Regional redirectors set up for UK, IT, DE (and EU)
- Sites working now (see next page): UK: Edinburgh (ECDF), Glasgow, Oxford; DE: Prague; IT: Roma (at least)
- The EMI push means many other sites will be able to install soon
- IT plan to expand to many sites (Frascati, Napoli++); ASGC (T1) plan to deploy
- Some issues: see later...
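As a rough illustration of the YAIM-based setup, a site-info.def fragment might look like the sketch below; the variable names and value format are assumptions, so check the DPM twiki for the exact syntax supported by your DPM/YAIM version.

    # Illustrative only: DPM_XROOTD_FEDREDIRS / DPM_XROOTD_SHAREDKEY are
    # assumed variable names and formats; verify against the DPM documentation.
    DPM_XROOTD_FEDREDIRS="atlas-xrd-uk.example.org:1094,atlas,/atlas"
    DPM_XROOTD_SHAREDKEY="<shared federation key>"

    # Then re-run YAIM on the head node and disk servers, e.g.:
    /opt/glite/yaim/bin/yaim -c -s site-info.def -n se_dpm_mysql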

Traffic monitoring
The relevant xrootd configuration directives:
  xrootd.monitor all rbuff 32k auth flush 30s window 5s dest files info user io redir atl-prod05.slac.stanford.edu:9930
  xrd.report atl-prod05.slac.stanford.edu:9931 every 60s all -buff -poll sync
There was an initial bug; OK since updating to the xrootd version in the EMI external repo.
See: prod07.slac.stanford.edu:8080/display

Functional Testing
- Regular WAN testing in HammerCloud and on the map (see next page)
- Traffic monitoring (previous page)
- Dashboard monitoring (being developed): transfers.cern.ch/ui/#m.content=(efficiency,errors,successes)&tab=matrix
- Basic service test (next page)
- There will be a SAM test
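A basic manual check of a newly federated site can be done from any machine with an ATLAS VOMS proxy and the xrootd client tools; the hostnames and test file path below are illustrative.

    # 1. Read a file through the site's own xrootd door using the global name
    #    (the N2N translates it to the local DPM path):
    xrdcp -f root://dpm-head.example.ac.uk//atlas/dq2/some/test/file.root /dev/null

    # 2. Read the same file via the regional redirector, which should send the
    #    client back to the site if that is where the replica lives:
    xrdcp -f root://regional-redirector.example.org//atlas/dq2/some/test/file.root /dev/null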

Use Cases: revisiting the goals
- Common ATLAS namespace across all storage sites, accessible from anywhere; easy to use, homogeneous access to data:
  Done: implicit in the setup. Keen users are being encouraged to try it (tutorials etc.).
- Use as failover for existing systems:
  Production jobs can now retry from the federation if all local tries fail. Works, but not tried on DPM in anger.
- Gain access to more CPUs using WAN direct read access:
  WAN access works; no reason not to use it in principle. Timing info from WAN tests is ready for brokering, but not yet used (AFAIK).
- Use as a caching mechanism at sites to reduce local data management tasks:
  Nothing has been done on this with DPM yet (AFAIK).
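The failover goal amounts to "if the local replica cannot be read, retry the same global name through the federation". A hypothetical shell sketch of that logic is below; real production jobs do this inside the pilot, and the hostnames and paths are placeholders.

    # Hypothetical illustration of the failover idea only.
    LOCAL_URL="root://local-se.example.ac.uk//atlas/dq2/some/dataset/file.root"
    FAX_URL="root://regional-redirector.example.org//atlas/dq2/some/dataset/file.root"

    if ! xrdcp -f "$LOCAL_URL" /tmp/input.root; then
        echo "Local read failed, falling back to the federation"
        xrdcp -f "$FAX_URL" /tmp/input.root
    fi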

Stress Testing DPM Xrootd
- ANALY_GLASGOW_XROOTD queue: stress-tested "local" xrootd access. For direct access we saw some server load (the same as we do for rfio). David did offer to help; we didn't follow up much.
- Trying panda failover tricks: not done yet. Requires the new dq2 tools in cvmfs, which require a newer python.
- ASGC have done extensive HammerCloud tests on (non-FAX) dpm-xrootd: promising results; now using it in production (?)

FAX-DPM Issues Encountered
PAST:
- xrootd packaging: would ideally be current in EPEL, but there have been some problems achieving that. Now in EMI externals, which is OK-ish.
- Without "rbuff 32k" in the monitoring directive, xrootd crashed with the initial version. Fixed in later xrootd versions.
- Getting stuck in the LFC lookup: the LFC host is an alias and the single-threaded N2N sometimes tried the "wrong" host. Fixed by setting LFC_CONRETRY=0.
PRESENT:
- SL6: xrootd segfaults on startup, so those sites can't be used until a fixed version is available.
NONE OF THESE ARE DPM PROBLEMS; IN EACH CASE DAVID FOUND THE FIX. HOWEVER, THEY MAY INDICATE A (PRACTICAL) ISSUE WITH USING XROOTD ON DIFFERENT STORAGE SYSTEMS.
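For the LFC lookup fix, LFC_CONRETRY=0 simply has to be in the environment of the xrootd/cmsd daemons; one possible way to set it (a sketch, assuming the stock init scripts source /etc/sysconfig/xrootd) is:

    # Sketch only: assumes the xrootd service reads /etc/sysconfig/xrootd at
    # startup; adjust to however your site passes environment to the daemons.
    echo 'export LFC_CONRETRY=0' >> /etc/sysconfig/xrootd
    service xrootd restart
    service cmsd restart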

My (general) concerns
- Even now, if users started using this it could result in a lot of unexpected traffic of files served from DPM sites:
  - The service is not in production: no SAM test, no clear expectations of service, etc.
  - Communication with sites is currently direct to the site admin (not via the cloud or GGUS).
  - Some network paths are slow.
- Ideally we should be able to configure the server (or redirector?) to limit connections / bandwidth (and to monitor the monitoring).
- Multiple VO support: currently separate server instances. Sensible?
- (xrootd) software documentation.
- http(s) might be preferable as a standard.
- But many site failures are storage related, so if FAX solves those then it's worth it.

Conclusion
- Significant development in DPM/FAX integration since the last meeting: basically from nothing to something...
- At least 5 T2 sites are federated and seemingly working...
- But we need to stress test under real use, and to have some concerns assuaged.