PhysX CoE: LHC Data-intensive Workflows and Data Management
Wahid Bhimji, Pete Clarke, Andrew Washbrook – Edinburgh
And other CoE WP4 people…

Outline
- Current 'state-of-the-art' technology and a brief summary of the goals of LHC 'task 4.3' of the proposed CoE: "Exploitation of HPC for [LHC] data-intensive workflows"
  - HPC workflows for LHC
  - Data processing (I/O)
  - Data management (transfer, metadata, storage …)
- Focus on the data management aspects: evolution of WLCG storage / data tools
- (Possible) relevance for LQCD… but here I'm guessing until the discussion…

LHC Workflows on HPC
- See existing work – some recent successes
- Mostly focused on simulation (event generation and detector simulation)
- HPC centre constraints (e.g. network, compute-node storage…): they vary between centres and are evolving – ad-hoc solutions so far
- CoE aims: production services; extend to data-intensive workflows

Data in LHC – Context
- WLCG: 200 PB of data stored on disk, doubling during Run 2
- EB-scale volumes of data read by ATLAS jobs in 2013
- WLCG and experiment tools are evolving: scaling; simplification; flexibility

Data Processing (I/O)
- Nearly all LHC I/O is ROOT I/O
- A lot of work by ROOT and the LHC experiments on reading only the parts that are needed (see the sketch after this list)
- CoE goals:
  - ROOT I/O, including parallelism
  - Use of caching (see details of http / xrootd later)
  - Data delivery solutions extending existing data structures, cf. the ATLAS 'event server', but also beyond…
  - New hardware technologies, e.g. NVRAM, memristor, phase-change memory
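A minimal sketch of what "reading only the parts that are needed" looks like with ROOT's Python bindings (PyROOT). The file URL, tree name and branch names below are placeholders for illustration, not taken from the talk.

```python
import ROOT

# Placeholder URL and object names, for illustration only.
f = ROOT.TFile.Open("root://some-server.example.org//atlas/blah/higgs.root")
tree = f.Get("events")

# Switch every branch off, then re-enable just the ones the analysis needs,
# so ROOT only reads (and, over the WAN, only transfers) those baskets.
tree.SetBranchStatus("*", 0)
tree.SetBranchStatus("el_pt", 1)
tree.SetBranchStatus("el_eta", 1)

for event in tree:
    # Analysis code would use event.el_pt and event.el_eta here.
    pass

f.Close()
```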

Data Management – LHC tools
- Placed data – still optimal for known data requirements; new tools: FTS3, WebFTS
- Dynamic data 'federations' – instead of transferring ahead of time (or in case of a broken local replica), exploit improved WAN performance by reading from the application over the WAN. Present a common namespace across remote sites – without the need for a central catalogue. New tools:
  - xrootd-based – now in production; high-performance protocol
  - http-based – 'dynamic federations', ATLAS Rucio redirector
- Improved (client) tools (see the sketch after this list):
  - lcg-utils -> gfal2 – hides multi-protocol complexity
  - Davix – client for performant http(s) I/O
  - Xrootd4
- Decreasing LHC use of SRM, LFC (though these will be available for a while)
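As an illustration of how gfal2 hides multi-protocol complexity, here is a minimal sketch using the gfal2 Python bindings. The endpoints are hypothetical; only the URL scheme changes if the storage is exposed over root://, https:// or srm://.

```python
import gfal2

# Hypothetical source and destination URLs; the scheme selects the protocol plugin.
src = "https://webdav.some-site.example.ac.uk/atlas/blah/higgs.root"
dst = "root://disk.some-hpc.example.org//scratch/blah/higgs.root"

ctx = gfal2.creat_context()

# Protocol-independent metadata lookup.
info = ctx.stat(src)
print("source size:", info.st_size)

# Copy between the two storage systems with a single call.
params = ctx.transfer_parameters()
params.overwrite = True
params.timeout = 600
ctx.filecopy(params, src, dst)
```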

Data Management – CoE goals and LQCD overlaps
- CoE goals: 'transparent' interoperation of diverse data stores, based on federation and caching
- LQCD needs (according to the draft CoE text…):
  - Output 4.4.1: Updated LDG services and tools enabling users to upload, search and download gauge configurations in a unified, transparent and safe way.
  - Output 4.4.2: Enabling integration of data-sharing capabilities into HPC infrastructure
- Possible use of WebFTS to transfer data from HPC centres (see the sketch after this list), with http-based federation and gfal2 client tools for access?
- Catalogue options (if needed): Rucio (ATLAS catalogue), DIRAC (LHCb etc.)
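WebFTS is a web front-end to FTS3, so a programmatic equivalent of "use WebFTS to transfer data from an HPC centre" might look like the following sketch using the FTS3 REST Python ('easy') bindings. The FTS endpoint and storage URLs are placeholders, and a valid X.509 proxy is assumed to be in place.

```python
import fts3.rest.client.easy as fts3

# Placeholder FTS endpoint and storage URLs.
endpoint = "https://fts3.example.org:8446"
src = "root://disk.some-hpc.example.org//scratch/blah/higgs.root"
dst = "https://webdav.some-site.example.ac.uk/atlas/blah/higgs.root"

context = fts3.Context(endpoint)

# One transfer, wrapped in one job, submitted asynchronously to the FTS service.
transfer = fts3.new_transfer(src, dst)
job = fts3.new_job([transfer], verify_checksum=True, retry=3)

job_id = fts3.submit(context, job)
print("submitted FTS job:", job_id)

# Poll until FTS reports a terminal state (FINISHED, FAILED, ...).
status = fts3.get_job_status(context, job_id)
print("current state:", status["job_state"])
```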

Summary
- The proposed Centre of Excellence would take the LHC to routine running on HPC, including data-intensive workflows not currently considered
- This requires improvements in I/O, caching and data management
- The evolved WLCG / LHC tools can probably work for LQCD aims: there is overlap with actual CoE activities in the export of data from HPC centres, as well as expertise in these tools.

EXTRAS…

[Diagram: federated xrootd read. An Edinburgh user asks for /atlas/blah/higgs.root; the local replica is broken ("not here"), so the request is redirected to open root://diskX.cern.ch/ where/blah/higgs.root and the data are read over the WAN. No need for a central catalogue; in production use for all LHC experiments.]
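The same redirection flow can be exercised directly with the XRootD Python client; a minimal sketch, with a hypothetical redirector hostname (the federation resolves the global name to whichever site holds a healthy replica):

```python
from XRootD import client

# Hypothetical federation redirector; the path is the global (federated) name.
url = "root://federation-redirector.example.org//atlas/blah/higgs.root"

f = client.File()
status, _ = f.open(url)
if not status.ok:
    raise RuntimeError("open failed: " + status.message)

# Read the first kilobyte over the WAN from whichever replica we were redirected to.
status, data = f.read(offset=0, size=1024)
print("read", len(data), "bytes")

f.close()
```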

[Plot: comparing CERN -> CERN and CERN -> Edinburgh (ECDF) reading, for the most I/O-intensive application, with xrootd and http.]
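For context, a comparison of this kind might be scripted by timing the same ROOT read pattern over the two protocols. A rough sketch with placeholder URLs and tree name, not the actual benchmark behind the plot (it assumes ROOT is built with Davix support for the https:// path):

```python
import time
import ROOT

# Placeholder URLs for the same file exposed over xrootd and over http(s)/WebDAV.
urls = {
    "xrootd": "root://disk.some-site.example.org//atlas/blah/higgs.root",
    "http":   "https://disk.some-site.example.org/atlas/blah/higgs.root",
}

for protocol, url in urls.items():
    start = time.time()
    f = ROOT.TFile.Open(url)
    tree = f.Get("events")          # hypothetical tree name
    for event in tree:              # touch every event to stress the I/O path
        pass
    f.Close()
    print(protocol, "read took", round(time.time() - start, 1), "s")
```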