efi.uchicago.edu ci.uchicago.edu FAX: status, developments, performance, future. Rob Gardner, Yang Wei, Andrew Hanushevsky, Ilija Vukotic.

Current Focus: RUCIO, RUCIO, RUCIO
Tucson, December 2013

Why RUCIO is important to FAX
LFC-based gLFNs (/atlas/dq2/…) do not scale well
RUCIO-based gLFNs: /atlas/rucio/scope:file
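The scalability difference comes from name translation: an LFC-based gLFN requires a central catalog lookup, while a RUCIO gLFN can be mapped to a physical path deterministically at the site. A minimal sketch of the standard Rucio hashed-path convention (simplified: real Rucio also splits dotted scopes into subdirectories, and the storage prefix is site-specific):

```python
import hashlib

def rucio_physical_path(scope: str, name: str, prefix: str = "/atlas/rucio") -> str:
    """Derive the deterministic Rucio storage path for scope:name.

    Two byte-pairs of md5("scope:name") spread files across
    directories, so no central catalog (LFC) lookup is needed.
    """
    h = hashlib.md5(f"{scope}:{name}".encode()).hexdigest()
    return f"{prefix}/{scope}/{h[0:2]}/{h[2:4]}/{name}"

# A gLFN like /atlas/rucio/mc12_8TeV:NTUP.root resolves locally, with no LFC.
print(rucio_physical_path("mc12_8TeV", "NTUP.root"))
```

Because every server can compute this path on its own, the N2N plugin no longer has to consult a shared catalog on each open.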

RUCIO gLFN improves performance

Status
We are working on a cmsd stability issue
Need to update xrootd and the N2N plugin: 17 of 25 sites done
There will always be some "red" sites; <5% would be satisfactory
FAX is designed to be resilient to site failure

Status
Stability is slowly improving
New issues arose due to the move to SL6
Some issues giving false "red" have been addressed
Moving away from LFC will make it even more stable
Additional tools to help understand issues are on the way

Other Activities
Document improvement
– Simplify the site storage integration instructions
– Update the end-user instructions for RUCIO
Tools to enhance the end-user experience
Cost matrix
Monitoring and access info collection
PanDA job failover to FAX
HC tests
– Jumpstart HC tests for sites with the RUCIO N2N

Simplify and Enhance the End-User Experience
localSetupFAX – very important in improving the end-user experience. Big thanks to Asoka De Silva.
– Sets up dq2-clients and grid middleware (emi-UI for 64-bit OS, gLite for 32-bit OS)
– Sets up xrootd (you can specify a version different from the default; see --help)
– Optionally sets up ROOT (see --help)
– Sets up FAXTools; the tools are in the $PATH: isDSinFAX.py, fax_ls.py, setRedirector.sh, more to come
– Determines and sets STORAGEPREFIX:
to the local redirector, if it can be resolved in the local domain
if not, to the geographically closest working endpoint
if not, to glrd.usatlas.org
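The STORAGEPREFIX fallback chain above can be sketched as follows; the function and parameter names are illustrative, not the actual localSetupFAX implementation:

```python
import socket

# Fallback order: local redirector -> closest working endpoint -> global redirector.
GLOBAL_REDIRECTOR = "glrd.usatlas.org"

def resolves(host: str) -> bool:
    """True if the hostname resolves in the current (local) domain."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

def pick_storage_prefix(local_redirector, closest_endpoint, resolver=resolves):
    """Return the redirector URL a job should use, trying the most local option first."""
    for host in (local_redirector, closest_endpoint):
        if host and resolver(host):
            return f"root://{host}/"
    return f"root://{GLOBAL_REDIRECTOR}/"
```

Injecting the resolver keeps the selection logic testable without touching DNS.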

Monitoring and Access Info Collection
Most of the monitoring system is in place and functioning well
New collector deployed at CERN; now collecting only from INFN-T1, will collect from all EU endpoints
[Diagram: site servers (xRootD, xroot4j/dCache, xRootD posix) send UDP summary and detailed streams to collectors (xRootD Castor collector at SLAC, xRootD DPM collector at CERN); data flows via MonaLisa and ActiveMQ to consumers: Popularity DB, SSB, Dashboard]


Failover to FAX
PanDA sends a job only to a site that has all the input data.
If an input file cannot be obtained after 2 tries, it is obtained through FAX, provided it exists anywhere else.
Job failure rates are small, but failures cause significantly longer turn-around times for users.
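The failover amounts to a retry-then-fallback loop; a hypothetical sketch (function names are illustrative, not the actual pilot code):

```python
# Try local storage twice, then fall back to reading via the FAX federation.
LOCAL_TRIES = 2

def fetch_input(lfn, copy_local, copy_fax, in_federation):
    """Return file contents, falling back to FAX after LOCAL_TRIES local failures."""
    for _ in range(LOCAL_TRIES):
        try:
            return copy_local(lfn)
        except IOError:
            continue
    if in_federation(lfn):
        return copy_fax(lfn)  # failover: read over the WAN via FAX
    raise IOError(f"{lfn}: unavailable locally and not in FAX")
```

The callables stand in for the local stage-in and the FAX remote read, so the policy (two local tries, then the federation) is isolated from any transfer tool.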

Failover to FAX
For some queues FAX "saves" a significant number of jobs that would otherwise fail.
For most sites the percentage of jobs failing due to I/O problems is small.
Failed-over jobs transferred files from FAX (5 TB) and from local storage (29 TB).

Cost matrix
Measures transfer rates (memory-to-memory) between 42 ANALY queues and each FAX endpoint
Jobs submitted by HammerCloud
Jobs send results to ActiveMQ, consumed by a dedicated machine, and stored in SSB with other network performance measurements (perfSONAR and FileTransferService)
Ilija Vukotic, CHEP October 2013, Amsterdam
[Diagram: HammerCloud → FAX → REST → SSB (FAX cost matrix, Sonar view); FTS also feeds SSB; SSB feeds PanDA job brokerage]
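Each probe reduces to a rate for one (queue, endpoint) pair; a minimal sketch of how such a measurement could populate the cost matrix (names are illustrative; the endpoint shown is simply the global redirector from the previous slide):

```python
def rate_mb_per_s(nbytes: int, seconds: float) -> float:
    """Transfer rate in MB/s for a probe that moved nbytes in `seconds`."""
    return nbytes / (1024 * 1024) / seconds

def update_cost_matrix(cost, queue, endpoint, nbytes, seconds):
    """Record the measured rate for one (ANALY queue, FAX endpoint) pair."""
    cost[(queue, endpoint)] = rate_mb_per_s(nbytes, seconds)
    return cost

# 100 MB moved in 10 s -> 10 MB/s for this pair
cost = update_cost_matrix({}, "ANALY_MWT2", "glrd.usatlas.org",
                          100 * 1024 * 1024, 10.0)
```

A broker consulting `cost` could then prefer pairs with higher measured rates, which is the "cost into decision making" step the next slide is waiting on.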

Cost matrix
The whole chain is ready. Waiting for Tadashi to add the cost into decision making.

Compared to FTS
An effort to analyze and understand measurements of FAX, FTS, and perfSONAR (Jorge, Artem, Ilija)
[Plot: measured rates of 10.9, 1.1, 7.5, and 23.4 MB/s; FAX test files are 100 MB, FTS small files 1 GB]

Next Focus: Stress Test (RUCIO only)

Site        old gLFNs (LFC)          new gLFNs (RUCIO)
            time [mm:ss]  failures   time [mm:ss]  failures
AGLT2       35:18         3          13:14         0
WT2         40:00         3          11:06         3
CERN-PROD   56:19         0          not yet in RUCIO
MWT2        30:52         0          10:56         0

Already doing a small-scale stress test: a sequential list of 744 files
Feedback from power users indicates improved stability
Next: HC tests of site stability and performance, or an FDR again (especially for cross-site tests)
Time line: Jan-Feb 2014? Prefer to do this before DC14?

Prerequisites for the stress test:
RUCIO renaming completed
RUCIO N2N deployed and functioning
Redirector network fully functioning
– the DE cloud redirector isn't working well

Beyond the Next Focus
Future site integration:
– Develop a redundant setup architecture at the site level
– Further optimize the site architecture (e.g. a proxy)
Drop LFC support in the N2N
Continue improving on the "Other Activities"