PhEDEx: a novel approach to robust Grid data management
Tim Barrass, Dave Newbold and Lassi Tuura
All Hands Meeting, Nottingham, UK, 22 September 2005
What is PhEDEx?
- A data distribution management system
  - Used by the Compact Muon Solenoid (CMS) High Energy Physics (HEP) experiment at CERN, Geneva
- Blends traditional HEP data distribution practice with more recent technologies
  - Grid and peer-to-peer filesharing
- Scalable infrastructure for managing dataset replication
  - Automates low-level activity
  - Allows managers to work with high-level dataset concepts rather than low-level file operations
- Technology agnostic
  - Overlies Grid components
  - Currently couples LCG, OSG, NorduGrid and standalone sites
The HEP environment
- HEP collaborations are quite large
  - Order of 1000 collaborators, globally distributed
  - CMS is only one of four Large Hadron Collider (LHC) experiments being built at CERN
- Typically resources are globally distributed
  - Resources organised in tiers of decreasing capacity
  - Tier 0: the detector facility
  - Tier 1: large regional centres
  - Tier 2+: smaller sites (universities, groups, individuals, …)
  - Raw data partitioned between sites; highly processed, ready-for-analysis data available everywhere
- LHC computing demands are large
  - Order 10 petabytes per year created for CMS alone
  - Similar order simulated
  - Also analysis and user data
CMS distribution use cases
- Two principal use cases: push and pull of data
  - Raw data is pushed onto the regional centres
  - Simulated and analysis data is pulled to a subscribing site
- Actual transfers are third-party: the handshake between active components is what matters, not push or pull
- Maintain end-to-end, multi-hop transfer state (sketched below)
  - Online buffers at the detector can only be cleaned when data is safe at a Tier 1
- Policy must be used to resolve these two use cases
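A minimal sketch of the multi-hop bookkeeping described above. This is an assumed data structure for illustration, not the actual PhEDEx schema; the site names, field names and the `FileTransfer` class are hypothetical.

```python
# Hypothetical sketch: track per-hop state for one file so that the online
# buffer at the Tier 0 is only cleaned once a copy is safe at a Tier 1.
from dataclasses import dataclass, field

PENDING, IN_TRANSFER, DONE = "pending", "in_transfer", "done"

@dataclass
class FileTransfer:
    filename: str
    route: list                     # ordered hops, e.g. ["T0_CERN", "T1_RAL"]
    hop_state: dict = field(default_factory=dict)

    def mark_done(self, site: str) -> None:
        """Record that the file has been safely stored at `site`."""
        self.hop_state[site] = DONE

    def safe_at_tier1(self) -> bool:
        """True once any Tier 1 on the route holds a completed copy."""
        return any(site.startswith("T1_") and self.hop_state.get(site) == DONE
                   for site in self.route)

    def may_clean_online_buffer(self) -> bool:
        # The detector-side buffer is only cleaned when the raw data is
        # known to be safe at a regional centre.
        return self.safe_at_tier1()

# Example: raw data pushed from the Tier 0 towards a regional centre.
t = FileTransfer("run1234.root", ["T0_CERN", "T1_RAL"])
t.mark_done("T0_CERN")
assert not t.may_clean_online_buffer()   # not yet safe downstream
t.mark_done("T1_RAL")
assert t.may_clean_online_buffer()       # Tier 1 copy confirmed
```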
PhEDEx design
- Assume every operation is going to fail!
- Keep complex functionality in discrete agents
  - Handover between agents is minimal
  - Agents are persistent, autonomous, stateless and distributed (see the sketch below)
- System state maintained using a modified blackboard architecture
- Layered abstractions make the system robust
- Keep local information local where possible
  - Enables site administrators to maintain local infrastructure
  - Robust in the face of most local changes; deletion and accidental loss require attention
- Draws inspiration from agent systems, autonomic computing and peer-to-peer computing
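The agent pattern above can be illustrated with a short sketch. The blackboard interface (`claim_next_task`, `mark_done`, `mark_failed`) is assumed for illustration and is not the real PhEDEx code; the point is that all state lives on the blackboard, so a stateless agent can crash and restart freely, and every failed operation is simply retried later.

```python
# Minimal sketch of a persistent, stateless agent polling a shared blackboard.
import time

def run_agent(blackboard, do_work, poll_interval=30):
    while True:                                  # persistent: never exits
        task = blackboard.claim_next_task()      # handover happens via the blackboard
        if task is None:
            time.sleep(poll_interval)            # nothing to do; back off
            continue
        try:
            do_work(task)                        # e.g. one file transfer or one staging request
            blackboard.mark_done(task)
        except Exception as err:
            # Assume every operation is going to fail: record the error and
            # leave the task on the blackboard to be retried later.
            blackboard.mark_failed(task, str(err))
```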
Transfer workflow overview
Production performance
Service challenge performance
Future directions
- Contractual file routing
  - Cost-based offers for a given transfer (see the sketch below)
- Peer-to-peer data location
  - Using Kademlia to partition replica location information
- Semi-autonomy
  - Agents are governed by many small tuning parameters: self-modify, or use more intelligent protocols?
- Advanced policies for priority conflict resolution
  - Need to ensure that raw data is always flowing
  - A difficult real-time scheduling problem
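A rough sketch of what cost-based offers for a transfer could look like. The cost terms here (recent link rate, queue length) and the data layout are purely illustrative assumptions, not the planned PhEDEx cost model: each candidate link offers a cost for carrying the file, and the router accepts the cheapest offer.

```python
# Illustrative cost-based routing: pick the cheapest offer among candidate links.
def link_cost(link, file_size_bytes):
    # Lower recent throughput and a longer queue both make an offer dearer.
    expected_seconds = file_size_bytes / max(link["recent_bytes_per_sec"], 1.0)
    return expected_seconds * (1 + link["queued_files"])

def choose_route(candidate_links, file_size_bytes):
    offers = [(link_cost(link, file_size_bytes), link) for link in candidate_links]
    return min(offers, key=lambda pair: pair[0])[1]   # accept the cheapest offer

links = [
    {"from": "T1_RAL",  "to": "T2_Bristol", "recent_bytes_per_sec": 20e6, "queued_files": 3},
    {"from": "T1_FNAL", "to": "T2_Bristol", "recent_bytes_per_sec": 50e6, "queued_files": 40},
]
best = choose_route(links, 2e9)   # route a 2 GB file over the cheapest link
```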
Summary
- PhEDEx enables dataset-level replication for the CMS HEP experiment
  - Currently manages 200 TB+ of data, globally distributed
  - Real-life performance of 1 TB per day sustained per site
  - Challenge performance of over 10 TB per day
- Not CMS-specific, or indeed HEP-specific
- Well placed to meet future challenges
  - Ramping up to O(10) PB per year, i.e. tens of TB per day
  - Data starts flowing for real in the next two years
Extra information
- PhEDEx and CMS: feel free to subscribe!
- CMS Computing model
- Agent frameworks: JADE, DiaMONDs, FIPA
- Peer-to-peer: Kademlia, Kenosis
- Autonomic computing
- General agents and blackboards: "Where should complexity go?", "Agents and blackboards"
Issues
- Most issues are fabric-related
  - Most low-level components are experimental or not production-hardened
  - Tools are typically unreliable under load
- MSS access is a serious handicap
  - PhEDEx plays very fair, keeping within request limits and ordering requests by tape when possible (sketched below)
- Main problem is keeping in touch with the O(3) people at each site involved in deploying fabric, administration etc.
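The "play fair" behaviour above can be sketched as follows. The request metadata (a `tape` field per file) and the helper are assumed for illustration and do not reflect a real MSS interface: pending stage-in requests are grouped by tape volume and capped at the site's request limit, so each tape is mounted as few times as possible.

```python
# Illustrative sketch: stay within a request limit and order requests by tape.
from itertools import groupby

def order_stage_requests(pending, request_limit):
    """Return at most `request_limit` requests, grouped by tape volume."""
    by_tape = sorted(pending, key=lambda req: req["tape"])
    batch = []
    for tape, reqs in groupby(by_tape, key=lambda req: req["tape"]):
        for req in reqs:
            if len(batch) >= request_limit:
                return batch
            batch.append(req)
    return batch

pending = [
    {"file": "a.root", "tape": "VOL0042"},
    {"file": "b.root", "tape": "VOL0007"},
    {"file": "c.root", "tape": "VOL0042"},
]
# Files on the same tape are submitted back to back, so VOL0042 is mounted once.
print(order_stage_requests(pending, request_limit=3))
```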
Deployment
- 8 regional centres, 16 smaller sites
- 110 TB, replicated ~twice
- 1 TB per day sustained
- On standard Internet
Testing and scalability
PhEDEx architecture