Future of WAN Access in ATLAS

Presentation transcript:

Future of WAN Access in ATLAS
David Cameron (thanks to Mario Lassnig for most of the input)
US facilities & ADC workshop, BNL, 5 December 2016

Experience: WAN
- WAN access: a process reading/writing data from a non-local source, i.e. outside the "site", let's say with traffic travelling through a different administrative area
- Main use cases so far:
  - Job failover if the local input is not available
  - Overflow of jobs to empty sites or imbalanced close sites (BNL + MWT2)
  - Object stores for the event service
  - Distributed sites (MWT2, NDGF-T1) – but in general the network is managed within the site

Experience: 3rd party transfer
- Managed replication to fulfil the computing model, pre-placement of job input data and consolidation of output data
- FTS3 is used for almost everything: 3 servers, at CERN, BNL and RAL (see the sketch below)
- Globus Online at US HPCs – not much experience yet
- In general we transfer too much data:
  - Most of it is short-lived job input/output
  - We are using the network as if it were "free"
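To make the FTS3 bullet concrete, the sketch below shows how a single third-party copy could be submitted through the FTS3 REST Python bindings (fts3-rest). The endpoint and both file URLs are placeholders, and keyword names may vary between fts3-rest releases; treat it as an illustration of the workflow, not an ATLAS production recipe.

# Minimal sketch of a third-party copy submitted through the FTS3 REST
# Python bindings (fts3-rest).  Endpoint and file URLs are placeholders.
import fts3.rest.client.easy as fts3

# One of the production servers (CERN/BNL/RAL); placeholder endpoint URL.
context = fts3.Context('https://fts3.example.cern.ch:8446')

transfer = fts3.new_transfer(
    'gsiftp://source-se.example.org/dpm/path/file.root',   # source replica
    'gsiftp://dest-se.example.org/dpm/path/file.root')     # destination

# Bundle one or more transfers into a job and hand it to the server.
job = fts3.new_job([transfer], verify_checksum=True, retry=3)
job_id = fts3.submit(context, job)
print('Submitted FTS3 job', job_id)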

What is important for ADC
- Smooth, efficient and controlled use of networks
- Not hiding failures
- Reducing site-specific modifications

Short-term roadmap
- New pilot movers consolidation: use rucio-clients everywhere, except ultra-specific cases (symlinks-only & LSM)
  - Rucio knows the location of all replicas of the experiment (= failover)
  - It provides all locations with all available protocols to the job/user, either via the Rucio native replica API or via metalink (see the sketch below)
  - The list of replicas can be sorted (now: uniform-random or geoip; in the future: ddm-network-metrics, as already used by PanDA & c3p0)
  - AGIS settings for the site influence replica sorting ({read/write/delete_wan/lan: [protocol1, protocol2, ...]})
- All client tools used in the backend of rucio-clients are available via cvmfs & ALRB (gfal, lcg-utils, xrdclient, aria2)
- If the source replicas are limited to https in the metalink response, aria2 can be used for chunked parallel downloads
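As an illustration of the replica API mentioned above, the sketch below asks Rucio for all replicas of one file, restricted to a few protocols, so a mover could try them in order (= failover). It assumes the Rucio ReplicaClient roughly as documented; keyword names and the response layout may differ between Rucio versions, and the DID is a placeholder built from the example used on the next slide.

# Minimal sketch, assuming the Rucio replica API (ReplicaClient.list_replicas).
# Keyword names and the exact response layout may differ between Rucio
# versions; the DID below is a placeholder.
from rucio.client.replicaclient import ReplicaClient

client = ReplicaClient()

# Ask Rucio for every replica of a file, limited to a few protocols.
# Each entry lists all PFNs a job or user could try in turn (= failover).
replicas = client.list_replicas(dids=[{'scope': 'mc15', 'name': 'hits.0001'}],
                                schemes=['root', 'https', 'gsiftp'])
for rep in replicas:
    for pfn in rep['pfns']:
        print(pfn)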

Remote access
- For now, primarily useful for analysis jobs only
- Reduce the usage of TURLs in TFile::Open(): allow the ATLAS namespace (scope:name) at the ROOT level, e.g. TFile::Open('https://rucio/redirect/mc15/hits.0001')
  - Patch contributed to ROOT I/O by the davix developers
  - It will return a metalink with all protocols that TNetFile can open
  - ROOT will try them in order to open the remote file, or continue to the next one (= failover)
- Branch caching: according to Axel Naumann, the usage of TTreeCache can be greatly improved
  - Provide the list of potential branches upfront
  - Use the branches accessed by the first job in a task as a TTreeCache hint for the other jobs (see the sketch below)
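A minimal PyROOT sketch of the pattern described above: open the file through the Rucio redirector URL quoted on the slide (illustrative only) and prime the TTreeCache with an explicit branch list instead of relying on its learning phase. The tree and branch names are placeholders.

# Minimal PyROOT sketch: open the file via the Rucio redirector and prime
# the TTreeCache with a known branch list.  Tree and branch names are
# placeholders; the URL is the illustrative one from the slide.
import ROOT

f = ROOT.TFile.Open("https://rucio/redirect/mc15/hits.0001")
tree = f.Get("CollectionTree")            # placeholder tree name

tree.SetCacheSize(30 * 1024 * 1024)       # 30 MB TTreeCache
# Register only the branches this task is known to read, e.g. the branches
# observed in the first job of the task.
for branch in ("EventInfo", "TrackParticles"):   # placeholder branch names
    tree.AddBranchToCache(branch, True)
tree.StopCacheLearningPhase()

for event in tree:
    pass  # analysis code would go here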

3rd party transfer
- Continued evolution of FTS
- Globus Online integration with Rucio?
- On the ATLAS side, review and reduce unnecessary transfers
- GridFTP will be the standard protocol; others can be evaluated (see the sketch below):
  - EOS -> Castor was tested with xrootd
  - S3 for object stores
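One way such protocol evaluations could be scripted is with the gfal2 Python bindings (gfal is already among the client tools shipped via cvmfs): the same copy call works for gsiftp://, root://, https:// or s3:// URLs. Both URLs below are placeholders; treat this as a sketch, not an ATLAS-endorsed procedure.

# Illustrative sketch using the gfal2 Python bindings: the same filecopy()
# call works for gsiftp://, root://, https:// or s3:// URLs, so alternative
# protocols can be compared without changing the workflow.
# Both URLs are placeholders.
import gfal2

ctx = gfal2.creat_context()
params = ctx.transfer_parameters()
params.overwrite = True        # replace the destination if it already exists
params.timeout = 3600          # seconds

ctx.filecopy(params,
             'root://eos.example.cern.ch//eos/atlas/somefile.root',
             's3://objectstore.example.org/bucket/somefile.root')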

Longer-term
- A lot depends on storage evolution:
  - Fewer, larger sites?
  - Greater use of (a few) object stores?
  - Will every computing resource have local storage?
- And on network evolution:
  - Will it grow as fast as before?
  - Will it always be "free"?
- It is clear that storage classes will become more important to exploit: cache vs hot disk (e.g. Ceph) vs cold disk vs tape

Longer-term: SDNs
- In their current state, SDNs are primarily useful for third-party transfers: automatically set up virtual circuits based on file properties (size, link parallelism)
- Early talks with ESnet people point to many-minute setup times for virtual circuits, so this cannot be done for ad-hoc transfers
- To be discussed whether Rucio or FTS should be the one requesting this