Challenging data and workload management in CMS Computing with network-aware systems
D. Bonacorsi, T. Wildish
CHEP 2013, T. Wildish / Princeton

How can we make better use of the network to improve analysis throughput?
– A bit of history
– Present-day reality
– Consequences of our reality
– Challenges (a.k.a. problems)
– Solutions

PhEDEx is the CMS data-placement management tool [1]; it controls bulk data-flows.
– Basic architecture is ten years old
– Routing and retry algorithms are rather TCP-like (see the sketch below)
  Keep the network full, prefer to send stuff that flows easily
  Rapid back-off, gentle retries
  'Guaranteed' delivery, but no concept of deadlines
– Prioritisation based on simple queue-ordering
– Slow to switch sources; assumes good choices will remain, on average, good choices

[1] The CMS Data Management System, N. Magini
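
To make the TCP analogy concrete, here is a minimal sketch (illustrative only, not PhEDEx code; the names and constants are invented) of link-level back-off combined with a preference for sources that have been flowing well:

    # Toy model of the retry behaviour: rapid (exponential) back-off on failure,
    # slow recovery, and source selection that favours links flowing easily.
    # All names and constants are invented for illustration.

    class LinkState:
        def __init__(self):
            self.retry_after = 60.0     # seconds to wait before the next attempt
            self.flow_score = 1.0       # crude "flows easily" measure, 0..1

        def on_success(self):
            self.retry_after = 60.0                      # reset quickly on success
            self.flow_score = 0.9 * self.flow_score + 0.1

        def on_failure(self):
            self.retry_after = min(self.retry_after * 2, 6 * 3600.0)  # rapid back-off
            self.flow_score = 0.9 * self.flow_score                   # gentle decay

    def pick_source(links):
        """links: dict of source-site name -> LinkState; keep the network full
        by preferring whatever currently flows best."""
        return max(links, key=lambda site: links[site].flow_score)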

Original computing-model design decisions:
– Did not trust the network
  Assumed it would be the greatest source of failure
– Expected slow links, ~100 Mb/sec
  Expected bandwidth to be a scarce resource
  Optimised for efficient transfer of files instead of datasets
– Move jobs to data, not data to jobs
– MONARC model
  Strictly hierarchical, no T2->T2 or T2->T0
  Minimal number of links to manage (~100)

Today:
– Network is most reliable component
  Most errors at SE level or within a site, not between sites
– 10 Gb/sec or more on many links
  Bandwidth is abundant
  We are far from saturating most links
– Strict hierarchy relaxed, to say the least
  Almost fully connected mesh
  T2->T2 transfers significant

(Figure: T0 -> T1, T2 -> T1, T1 -> T2 and T2 -> T2 transfers)

Average rate last year:

                Production    Debug         Total
  T0 -> T1      230 MB/sec    100 MB/sec    330 MB/sec
  T2 -> T1      …             …             …
  T1 -> T2      …             …             …
  T2 -> T2      …             …             …
  Total         …             …             …

– Production instance is real data
– Debug instance is for commissioning and link-tests
  1/3 of total traffic is for knowledge of network state
– Average rate ~2 GB/sec CMS-wide
  Sustained over the last 3 years
  Not bandwidth-limited

PhEDEx routing and retry algorithms do not make optimal use of the network:
– Back-off and retry the same link in case of failure; could fail-over to another source node or to AAA (see the sketch below)

MONARC model largely displaced [1]:
– Much traffic between T2's and T3's, without T1 intermediaries
– We don't have a model of this
  Don't know why all these movements are requested
  Don't know which movements are correlated

[1] CMS Computing Model Evolution, C. Grandi
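
A hypothetical sketch of the fail-over idea from the first bullet (function names are placeholders, not the PhEDEx or AAA APIs): instead of backing off on the same link, try the next source replica and fall back to a WAN read via AAA only as a last resort.

    def transfer_with_failover(block, sources, transfer, aaa_read):
        """sources: candidate source sites, best first.
        transfer(src, block) and aaa_read(block) stand in for the real
        transfer machinery and for an xrootd/WAN read through AAA."""
        for src in sources:
            if transfer(src, block):       # managed transfer from this replica
                return ("copied", src)
        if aaa_read(block):                # last resort: read over the WAN
            return ("aaa", None)
        return ("failed", None)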

Current metrics:
– Time to transfer a dataset
  PhEDEx-internal statistics, based on past performance
– Time to complete a set of batch jobs
– Amount of data on a site
  Group quotas, number of replicas

These are all dataflow- or workflow-centric:
– Related to use of hardware, not analysis throughput
– Not easy to model analysis throughput!
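
For illustration, a transfer-time estimate from past performance might look like the following sketch (assumed logic, not the actual PhEDEx-internal statistics):

    def estimate_transfer_time(dataset_bytes, recent_rates_bps):
        """Estimate seconds to transfer a dataset from recent link throughputs
        (bytes/sec, newest last), weighting recent measurements more heavily."""
        if not recent_rates_bps:
            return None
        rate = recent_rates_bps[0]
        for r in recent_rates_bps[1:]:
            rate = 0.7 * rate + 0.3 * r     # exponentially weighted moving average
        return dataset_bytes / rate if rate > 0 else None

    # Example: ~10 TB over a link recently sustaining 200-300 MB/sec
    eta_seconds = estimate_transfer_time(10e12, [200e6, 250e6, 300e6])

An estimate like this is link- and dataset-centric; it says nothing about how fast the analysis itself progresses, which is exactly the gap noted above.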

So back to the question: how do we make better use of the network to improve analysis throughput?

Near-term improvements:
– AAA: Any data, Any time, Anywhere [1,2]
  Analysis fallback to xrootd/WAN access on failure
– Data-popularity service
  Identify idle data, automatically clean them
  Identify hot data, automatically replicate them

[1] CMS use of a data-federation, K. Bloom
[2] Xrootd, disk-based, caching-proxy for optimization of data-access, data-placement, and data-replication, M. Tadel
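
The popularity-driven cleaning and replication could be expressed roughly as below (thresholds and record fields are invented for illustration; this is not the CMS popularity service):

    def popularity_actions(datasets, hot_accesses=100, idle_days=180, max_replicas=3):
        """datasets: records with name, accesses in the last 90 days, days since
        the last access, and the current replica count."""
        actions = []
        for d in datasets:
            if d["days_since_access"] > idle_days and d["replicas"] > 1:
                actions.append(("clean-replica", d["name"]))    # idle: reclaim disk
            elif d["accesses_90d"] > hot_accesses and d["replicas"] < max_replicas:
                actions.append(("add-replica", d["name"]))      # hot: spread the load
        return actions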

Longer term?
– More dynamic and responsive data placement
– Reduce latencies in PhEDEx (30 mins => 1 min)
  Rapid switching of source replicas on failure
  Emphasise block-completion over raw throughput
– External sources of network-state (perfSONAR, MonALISA)
  Awareness of overall network state, not just internal (see the sketch below)
– Manage the network explicitly
  Schedule use of bandwidth across PhEDEx, CMS, …
  Reserve network resources, virtual circuits
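
A sketch of how external network-state (perfSONAR-style throughput and loss measurements, in an assumed format) could drive the rapid switching of source replicas; the scoring is purely illustrative:

    def rank_sources(dest, candidates, metrics):
        """metrics[(src, dest)]: {'throughput_bps': ..., 'loss': ...} from
        external probes (perfSONAR-like); unknown links rank last."""
        def score(src):
            m = metrics.get((src, dest))
            if m is None:
                return 0.0
            return m["throughput_bps"] * (1.0 - m["loss"])   # penalise lossy paths
        return sorted(candidates, key=score, reverse=True)

    # On failure, move to the next entry in the ranked list instead of
    # retrying the same link.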

New possibilities:
– Improve knowledge of delivery-time for bulk data
  Deterministic prediction of completion time
  Co-schedule jobs with data placement
– Respond to real network conditions, beyond the VO
– JIT replication of datasets
  Replicate with PhEDEx in response to traffic on AAA
  Co-ordinate between PhEDEx & AAA on bandwidth use
– Opportunistic storage/computing resources
  Pump data in fast, use CPU, pump data out fast
  Increases usability of non-CMS-controlled resources
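
Why a reserved circuit helps: with a known bandwidth the completion time becomes essentially arithmetic, which is what makes co-scheduling of jobs with data placement plausible. A toy calculation (numbers are assumptions):

    def completion_time_s(dataset_bytes, circuit_bps, setup_s=120.0):
        """With a dedicated circuit the delivery time is roughly size/bandwidth
        plus the circuit set-up overhead."""
        return setup_s + dataset_bytes / circuit_bps

    # e.g. 1 TB over a 10 Gb/sec circuit: ~120 s setup + 800 s transfer
    eta = completion_time_s(1e12, 10e9 / 8.0)
    # jobs needing this data could then be scheduled to start at now + eta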

How?
– Dual role of PhEDEx: data-store & data-movement
  Refactor into a CMS-managed data-store and a CDN (sketched below)
– Data-store
  Static repository of data, i.e. CMS-managed sites
  Dynamic creation/deletion of sites, from SEs to single machines, anything with a few TB of disk
  Accept user-defined datasets, not just official data
– CDN
  Rapid movement of small volumes of data (~1 TB)
  Book network circuits for performance, determinism
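
A rough sketch of the split, with invented class and method names: a catalogue-like data-store plus a thin CDN layer that moves ~1 TB chunks on demand, optionally over a booked circuit.

    class DataStore:
        """Static, CMS-managed catalogue of replicas: sites, SEs, or anything
        with a few TB of disk; user-defined datasets allowed."""
        def __init__(self):
            self.replicas = {}                      # dataset -> set of locations

        def register(self, dataset, location):
            self.replicas.setdefault(dataset, set()).add(location)

    class CDN:
        """Moves small volumes quickly; book_circuit and copy are placeholders
        for whatever circuit-booking and transfer tools are actually used."""
        def __init__(self, store, book_circuit, copy):
            self.store, self.book_circuit, self.copy = store, book_circuit, copy

        def deliver(self, dataset, dest, use_circuit=True):
            src = next(iter(self.store.replicas[dataset]))      # any existing replica
            circuit = self.book_circuit(src, dest) if use_circuit else None
            self.copy(src, dest, dataset, circuit)
            self.store.register(dataset, dest)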

Virtual circuits, PhEDEx and ANSE [1]:
– Several points of integration into PhEDEx
  Per transfer-job: already exists with FDT
  Per-destination: a circuit for all files from a given source (sketched below)
  CMS-wide: book and manage circuits centrally
– ANSE
  Integrate network awareness into the software stacks of CMS (PhEDEx) and ATLAS (PanDA)
  Build on top of existing services (e.g. DYNES [2])
  Part of the LHCONE point-to-point demonstration

[1] Integrating the Network into LHC Experiments: Update on the ANSE project, A. Melo
[2] Application Performance Evaluation and Recommendations for the DYNES, S. McKee
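
The "per-destination" integration point could behave roughly as follows (a sketch with placeholder functions, not the ANSE or PhEDEx API): request one circuit per source/destination pair when enough volume is queued, release it when the queue drains.

    def manage_circuits(queued_bytes, circuits, request_circuit, release_circuit,
                        threshold_bytes=500e9):
        """queued_bytes: {(src, dst): bytes waiting}; circuits: {(src, dst): handle}.
        request_circuit/release_circuit stand in for a real provisioning system
        (e.g. something DYNES-like)."""
        for link, pending in queued_bytes.items():
            if pending > threshold_bytes and link not in circuits:
                circuits[link] = request_circuit(*link)      # book a circuit
            elif pending == 0 and link in circuits:
                release_circuit(circuits.pop(link))          # tear it down again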

(Figure: existing per-transfer circuits with FDT in PhEDEx)

What's missing?
– Network management in CMS
  Network awareness/management in PhEDEx -> ANSE
  Awareness and management for AAA or other activities?
– Policy for use of circuits
  When is it worth/not worth using a circuit? (see the sketch below)
  API, concept of a network budget, advance reservation…?
– Robust virtual circuits across the CMS VO
  Multi-domain, many players, much work-in-progress [1]
– Integration, coordination of network/CPU/storage
  Coherent use of network resources (paths, circuit duration, occupancy, etc.)

[1] Advanced Networking for HEP, Research and Education in the LHC Era, H. Newman
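
One possible shape for the "is a circuit worth it?" policy, purely illustrative: compare the expected time on the shared network with the circuit set-up cost plus the transfer at reserved bandwidth, within a notional network budget.

    def circuit_worthwhile(volume_bytes, shared_bps, circuit_bps,
                           setup_s=300.0, circuit_budget_s=36000.0, min_gain=0.2):
        """Return True if booking a circuit is expected to pay off.
        All thresholds here are invented; a real policy would come from the VO."""
        if circuit_budget_s <= 0 or circuit_bps <= shared_bps:
            return False
        t_shared = volume_bytes / shared_bps
        t_circuit = setup_s + volume_bytes / circuit_bps
        return t_circuit < (1.0 - min_gain) * t_shared and t_circuit <= circuit_budget_s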

Summary:
– A more responsive, deterministic PhEDEx opens possibilities for CMS
  Rapid replication can improve efficiency in many ways
  Extend data-management to a broader range of resources
– Network awareness/control takes this to a new level
  Knowing the network state allows smart routing decisions
  Control of the network allows greater flexibility and responsiveness, more determinism
  Co-scheduling of CPU/network/storage

Lots of work to do, from the network fabric through the experiment applications to the management policies.