ANSE: Advanced Network Services for the HEP Community
Harvey B Newman, California Institute of Technology
LHCONE Workshop, Paris, June 17, 2013
Networking for the LHC Program

ANSE: Advanced Network Services for Experiments
• ANSE is a project funded by NSF's CC-NIE program
• Two years of funding, started in January 2013, ~3 FTEs
• Collaboration of 4 institutes:
  – Caltech (CMS)
  – University of Michigan (ATLAS)
  – Vanderbilt University (CMS)
  – University of Texas at Arlington (ATLAS)
• Goal: enable strategic workflow planning, including network capacity as well as CPU and storage as co-scheduled resources
• Path forward: integrate advanced network-aware tools with the mainstream production workflows of ATLAS and CMS
  – In-depth monitoring and network provisioning
  – Complex workflows: a natural match and a challenge for SDN
• Exploit state-of-the-art progress in high-throughput, long-distance data transport, network monitoring and control

ANSE Objectives and Approach
• Deterministic, optimized workflows
  – Use network resource allocation along with storage and CPU resource allocation in planning data and job placement
  – Use information about the network that is as accurate as possible to optimize workflows
  – Improve overall throughput and task times to completion
• Integrate advanced network-aware tools into the mainstream production workflows of ATLAS and CMS
  – Use tools and deployed installations where they exist
  – Extend the functionality of the tools to match the experiments' needs
  – Identify and develop tools and interfaces where they are missing
• Build on several years of invested manpower, tools and ideas (some dating back to the MONARC era)

ANSE - Methodology
• Use agile, managed bandwidth for tasks with levels of priority, along with CPU and disk storage allocation
  – Allows one to define goals for time-to-completion, with a reasonable chance of success
  – Allows one to define metrics of success, such as the rate of work completion, with reasonable resource usage efficiency
  – Allows one to define and achieve a "consistent" workflow
• Dynamic circuits are a natural match
  – As in DYNES for Tier 2s and Tier 3s
• Process-oriented approach
  – Measure resource usage and job/task progress in real time
  – If resource use or the rate of progress is not as requested/planned, diagnose, analyze and decide whether and when task replanning is needed
• Classes of work: defined by the resources required, estimated time to complete, priority, etc. (see the sketch below)
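As an illustration of the last bullet, here is a minimal Python sketch of how a "class of work" might be described so that a planner can reason about priorities and deadlines. The field names and the feasibility check are invented for this example; they are not taken from any ANSE, PanDA or PhEDEx code.

    # Minimal sketch of a "class of work" record; field names are illustrative,
    # not from any ANSE/PanDA/PhEDEx code base.
    from dataclasses import dataclass

    @dataclass
    class WorkClass:
        name: str                 # e.g. "reprocessing", "user analysis"
        priority: int             # higher value = more important
        cpu_hours: float          # estimated CPU needed
        storage_tb: float         # disk needed at the destination
        bandwidth_gbps: float     # network allocation requested for the task
        deadline_hours: float     # requested time to completion

    def feasible(work: WorkClass, dataset_tb: float) -> bool:
        """Rough check: can the requested bandwidth move the data within the deadline?"""
        transfer_hours = (dataset_tb * 8e3) / (work.bandwidth_gbps * 3600)  # TB -> Gb, then hours
        return transfer_hours <= work.deadline_hours

    if __name__ == "__main__":
        reproc = WorkClass("reprocessing", priority=2, cpu_hours=5e4,
                           storage_tb=200, bandwidth_gbps=10, deadline_hours=72)
        print(feasible(reproc, dataset_tb=200))   # 200 TB over 10 Gbps ~ 44 h -> True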

Tool Categories
• Monitoring (alone)
  – Allows reactive use: react to "events" (state changes) or situations in the network (a decision sketch follows below)
  – Throughput measurements
    Possible actions: (1) raise an alarm and continue, (2) abort/restart transfers, (3) choose a different source
  – Topology (+ site & path performance) monitoring
    Possible actions: (1) influence source selection, (2) raise an alarm (e.g. in extreme cases such as site isolation)
• Network control: allows pro-active use
  – Reserve bandwidth dynamically: prioritize transfers, remote-access flows, etc.
  – Co-scheduling of CPU, storage and network resources
  – Create custom topologies
    Optimize the infrastructure to match operational conditions (deadlines, work profiles), e.g. during LHC running periods vs. reconstruction/re-distribution
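The "reactive use" pattern above can be made concrete with a small sketch: given a throughput measurement for an ongoing transfer, pick one of the listed actions. The thresholds, names and the decision rule are illustrative assumptions, not part of any deployed monitoring tool.

    # Sketch of the reactive pattern described above: act on a throughput
    # measurement for an ongoing transfer. Names and thresholds are illustrative.
    from enum import Enum

    class Action(Enum):
        CONTINUE = "continue"
        RAISE_ALARM = "raise alarm and continue"
        RESTART = "abort/restart transfer"
        SWITCH_SOURCE = "choose different source"

    def react(measured_gbps: float, expected_gbps: float,
              alternative_sources: list[str]) -> Action:
        ratio = measured_gbps / expected_gbps if expected_gbps > 0 else 0.0
        if ratio >= 0.8:                      # close enough to plan: keep going
            return Action.CONTINUE
        if ratio >= 0.5:                      # degraded: flag it, keep the transfer
            return Action.RAISE_ALARM
        if alternative_sources:               # badly degraded and a replica exists elsewhere
            return Action.SWITCH_SOURCE
        return Action.RESTART                 # last resort: restart on the same path

    print(react(1.2, 4.0, alternative_sources=["T1_US_FNAL"]))  # Action.SWITCH_SOURCE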

ATLAS Computing: PanDA
• The PanDA workflow management system (ATLAS): a unified system for organized production and user analysis jobs
  – Highly automated; flexible; adaptive
  – Uses an asynchronous Distributed Data Management system
  – Eminently scalable: to > 1M jobs/day (on many days)
• DQ2, Rucio: register & catalog data; transfer data to/from sites and delete it when done; ensure data consistency; enforce the ATLAS computing model
• PanDA's basic unit of work: a job
  – Physics tasks are split into jobs by the ProdSys layer above PanDA
• Automated brokerage based on CPU and storage resources
  – Tasks are brokered among ATLAS "clouds"
  – Jobs are brokered among sites
  – Here is where network information will be most useful!
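To make the last point concrete, a hedged sketch of where network information could enter site brokerage: score each candidate site by combining free job slots with an estimated bandwidth from the data's location. The scoring formula, the site names and the inputs are illustrative; this is not PanDA's actual brokerage algorithm.

    # Sketch of network-aware brokerage: score candidate sites using both free CPU
    # slots and the estimated bandwidth from where the input data sits.
    # The scoring formula and inputs are illustrative, not PanDA's actual algorithm.

    def broker(job_sites: dict[str, int], bw_from_data_gbps: dict[str, float]) -> str:
        """job_sites: site -> free job slots; bw_from_data_gbps: site -> est. bandwidth to data."""
        def score(site: str) -> float:
            slots = job_sites.get(site, 0)
            bw = bw_from_data_gbps.get(site, 0.1)   # pessimistic default if unmeasured
            return slots * bw                        # crude: both more slots and a better path help
        return max(job_sites, key=score)

    sites = {"MWT2": 800, "AGLT2": 500, "SWT2": 1200}
    bw    = {"MWT2": 8.0, "AGLT2": 9.5, "SWT2": 2.0}
    print(broker(sites, bw))   # MWT2: 800*8.0=6400 beats SWT2's 1200*2.0=2400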

ATLAS Production Workflow (Kaushik De, Univ. of Texas at Arlington)

CMS Computing: PhEDEx
• PhEDEx is the CMS data-placement management tool
  – A reliable and scalable dataset (file-block-level) replication system
  – With a focus on robustness
• Responsible for scheduling the transfer of CMS data across the grid
  – Using FTS, SRM, FTM, or any other transport package
• PhEDEx typically queues data in blocks for a given source-destination pair
  – From tens of TB up to several PB
• The success metric so far is the volume of work performed; there is no explicit tracking of time to completion for specific workflows
• Incorporating the time domain, network awareness and reactive actions could increase efficiency for physicists (see the sketch below)
• A natural candidate for using dynamic circuits
  – Could be extended to make use of a dynamic circuit API such as NSI
From T. Wildish, PhEDEx team lead
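A small sketch of what "incorporating the time domain" could look like: estimate the time to complete a queued block from its size and a bandwidth estimate for the source-destination pair, and flag transfers that would miss a target. The efficiency factor and function names are assumptions for illustration, not part of PhEDEx.

    # Sketch of bringing the time domain into transfer planning: estimate completion
    # time for a queued block on a source-destination pair and flag it if it will
    # miss its target. Purely illustrative; not part of PhEDEx.
    def hours_to_complete(block_tb: float, est_bandwidth_gbps: float,
                          efficiency: float = 0.7) -> float:
        """Assume only a fraction of the path bandwidth is usable in practice."""
        gigabits = block_tb * 8e3
        return gigabits / (est_bandwidth_gbps * efficiency * 3600)

    def at_risk(block_tb: float, est_bandwidth_gbps: float, deadline_h: float) -> bool:
        return hours_to_complete(block_tb, est_bandwidth_gbps) > deadline_h

    print(round(hours_to_complete(50, 10), 1))   # 50 TB at ~7 Gb/s effective -> ~15.9 h
    print(at_risk(50, 10, deadline_h=12))        # True: would miss a 12-hour target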

CMS: Possible Approaches within PhEDEx
• There are essentially four possible approaches within PhEDEx for booking dynamic circuits:
  – Do nothing, and let the "fabric" take care of it
    Similar to LambdaStation (by Fermilab + Caltech)
    Trivial, but lacks a prioritization scheme
    Not clear the result will be optimal
  – Book a circuit for each transfer job, i.e. per FDT or GridFTP call
    Effectively executed below the PhEDEx level
    Management and performance optimization are not obvious
  – Book a circuit at each download agent, and use it for multiple transfer jobs
    Maintain a stable circuit for all the transfers on a given source-destination pair
    Only local optimization, no global view
  – Book circuits at the dataset level (see the sketch below)
    Maintain a global view; global optimization is possible
    Advance reservation
From T. Wildish, PhEDEx team lead
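A sketch of the fourth option (dataset-level booking): decide whether a queued dataset justifies a circuit and derive the requested bandwidth from the desired completion time. The thresholds are invented, and request_circuit() is only a placeholder; a real implementation would go through an NSI or OSCARS IDC client, whose interfaces are not reproduced here.

    # Sketch of option 4 above: decide at the dataset level whether to request a
    # circuit, and with how much bandwidth, given a target completion time.
    # request_circuit() is a placeholder; a real implementation would go through
    # an NSI or OSCARS IDC client, whose API is not reproduced here.
    MIN_CIRCUIT_TB = 10          # below this, the routed network is probably fine
    MAX_REQUEST_GBPS = 40        # cap requests to something a path can actually carry

    def plan_circuit(dataset_tb: float, target_hours: float):
        if dataset_tb < MIN_CIRCUIT_TB:
            return None                                  # let the routed network handle it
        needed_gbps = (dataset_tb * 8e3) / (target_hours * 3600)
        return min(needed_gbps, MAX_REQUEST_GBPS)

    def request_circuit(src: str, dst: str, gbps: float, hours: float) -> str:
        # Placeholder for a reservation call (e.g. via an NSI requester agent).
        return f"reserved {gbps:.1f} Gb/s {src}->{dst} for {hours} h"

    bw = plan_circuit(dataset_tb=120, target_hours=24)
    if bw:
        print(request_circuit("T2_US_Caltech", "T2_US_Vanderbilt", bw, 24))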

PanDA and ANSE Kaushik De (UTA) at Caltech Workshop

FAX Integration with PanDA
• ATLAS has developed detailed plans for integrating FAX with PanDA over the past year
• Networking plays an important role in federated storage
  – This time we are paying attention to networking up front
• The most interesting use case: network information used for brokering distributed analysis jobs to FAX-enabled sites
  – This is the first real use case for using external network information in PanDA
Kaushik De (UTA), May 6, 2013

PD2P – How the LHC Model Changed
• PD2P = PanDA Dynamic Data Placement
  – PD2P is used to distribute data for user analysis
  – For production, PanDA schedules all data flows
• The initial ATLAS computing model assumed pre-placed data distribution for user analysis – PanDA sent jobs to the data
• Soon after LHC data taking started, we implemented PD2P
  – Asynchronous, usage-based data placement
  – Repeated use of data → make additional copies
  – Backlog in processing → make additional copies
  – Rebrokerage of queued jobs → use the new data location
  – A deletion service removes less-used data
  – Basically, T1/T2 storage is used as a cache for user analysis
• This is perfect for network integration
  – Use network status information for site selection
  – Provisioning: usually large datasets are transferred, of known volume
Kaushik De, May 6, 2013
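A sketch of a PD2P-style decision combining the triggers listed above with network status for choosing where the extra replica goes. Thresholds, site names and the scoring are illustrative assumptions, not the actual PD2P logic.

    # Sketch of a PD2P-style decision: trigger an extra replica when a dataset is
    # reused often or its jobs are backlogged, and pick the destination partly on
    # network status. Thresholds and scoring are illustrative only.
    def needs_extra_copy(uses_last_week: int, queued_jobs: int) -> bool:
        return uses_last_week >= 5 or queued_jobs >= 200

    def choose_destination(candidates: dict[str, dict]) -> str:
        """candidates: site -> {'free_tb': ..., 'bw_to_source_gbps': ...}"""
        ok = {s: c for s, c in candidates.items() if c["free_tb"] > 50}
        return max(ok, key=lambda s: ok[s]["bw_to_source_gbps"])

    sites = {
        "MWT2":  {"free_tb": 300, "bw_to_source_gbps": 6.0},
        "AGLT2": {"free_tb": 40,  "bw_to_source_gbps": 9.0},   # too little space
        "SWT2":  {"free_tb": 120, "bw_to_source_gbps": 4.0},
    }
    if needs_extra_copy(uses_last_week=7, queued_jobs=50):
        print(choose_destination(sites))   # MWT2: enough space, best remaining path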

ANSE - Relation to DYNES
• In brief, DYNES is an NSF-funded project to deploy a cyberinstrument linking up to 50 US campuses through the Internet2 dynamic circuit backbone and regional networks
  – Based on the ION service, using OSCARS technology
• The DYNES instrument can be viewed as a production-grade "starter kit"
  – It comes with a disk server, an inter-domain controller (server) and an FDT installation
  – The FDT code includes the OSCARS IDC API: it reserves bandwidth and moves data through the created circuit
• "Bandwidth on demand", i.e. get it now or never, with the routed GPN as fallback
• The DYNES system is naturally capable of advance reservation
• We need the right agent code inside CMS/ATLAS to call the API whenever transfers involve two DYNES sites (a control-flow sketch follows below)
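A sketch of the control flow for that agent code: if both endpoints are DYNES sites, try to reserve a circuit first, otherwise fall back to the routed network. The site list is illustrative, and reserve_circuit()/start_transfer() are stand-ins rather than the real OSCARS IDC or FDT interfaces.

    # Sketch of the "agent code" logic described above: if both endpoints are DYNES
    # sites, reserve a circuit first, otherwise fall back to the routed network.
    # reserve_circuit()/start_transfer() are stand-ins, not the real OSCARS or FDT APIs.
    DYNES_SITES = {"umich.edu", "uta.edu", "caltech.edu", "vanderbilt.edu"}  # illustrative

    def site_of(host: str) -> str:
        return ".".join(host.split(".")[-2:])

    def reserve_circuit(src: str, dst: str, gbps: float) -> bool:
        print(f"[idc] would request {gbps} Gb/s circuit {src} -> {dst}")
        return True                         # pretend the reservation succeeded

    def start_transfer(src: str, dst: str, use_circuit: bool) -> None:
        path = "dynamic circuit" if use_circuit else "routed GPN"
        print(f"[fdt] transferring {src} -> {dst} over {path}")

    def transfer(src_host: str, dst_host: str, gbps: float = 10.0) -> None:
        both_dynes = {site_of(src_host), site_of(dst_host)} <= DYNES_SITES
        circuit_up = both_dynes and reserve_circuit(src_host, dst_host, gbps)
        start_transfer(src_host, dst_host, use_circuit=circuit_up)

    transfer("fdt1.aglt2.umich.edu", "fdt.hep.caltech.edu")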

DYNES Sites
• DYNES is ramping up to full scale, and will transition to routine operations in 2013
• DYNES is extending circuit capabilities to ~40-50 US campuses
• It will be an integral part of the point-to-point service in LHCONE

ANSE Current Activities
• Candidate initial sites: UMich, UTA, Caltech, Vanderbilt, UVic
• Monitoring information for workflow and transfer management
  – Define the path characteristics to be provided to FAX and PhEDEx
  – On an NxN mesh of source/destination pairs
  – Use information gathered by perfSONAR
  – Use LISA agents to gather end-system information
• Dynamic circuit systems
  – Working with DYNES at the outset
  – Monitoring dashboard, full-mesh connection setup and bandwidth tests
• Deploy a prototype PhEDEx instance for development and evaluation
  – Integration with network services
  – Potentially use LISA agents for pro-active end-system configuration

ATLAS Next Steps
• UTA and Michigan are focusing on getting estimated bandwidth along specific paths available for PanDA (and PhEDEx) use
  – Site A to Site B; an NxN mesh
• Michigan (Shawn McKee, Jorge Batista) is working with the perfSONAR-PS metrics that are gathered at OSG and WLCG sites
  – The perfSONAR-PS toolkit instances provide a web-services interface open for anyone to query, but we would rather use the centrally collected data that OSG and WLCG will be gathering
  – The "new" modular dashboard project has a collector, a datastore and a GUI
  – Batista is working on creating an estimated bandwidth matrix for WLCG sites by querying the data in the new dashboard datastore (a sketch follows below)
  – Also extending the datastore to gather/filter traceroute data from perfSONAR
Shawn McKee (Michigan)
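A sketch of how the estimated bandwidth matrix might be assembled from per-pair throughput measurements. fetch_measurements() stands in for a query to the dashboard datastore (its real interface is not shown), and the sample numbers are invented.

    # Sketch of turning per-pair throughput measurements into an NxN matrix of
    # estimated bandwidth. fetch_measurements() stands in for a query to the
    # dashboard datastore; its real interface is not shown here.
    from collections import defaultdict
    from statistics import median

    def fetch_measurements() -> list[tuple[str, str, float]]:
        # (source, destination, measured throughput in Gbps) -- fake sample data
        return [("AGLT2", "MWT2", 7.9), ("AGLT2", "MWT2", 8.4),
                ("MWT2", "AGLT2", 8.1), ("AGLT2", "SWT2", 3.2),
                ("SWT2", "AGLT2", 3.0), ("MWT2", "SWT2", 2.7)]

    def bandwidth_matrix(samples) -> dict[str, dict[str, float]]:
        per_pair = defaultdict(list)
        for src, dst, gbps in samples:
            per_pair[(src, dst)].append(gbps)
        matrix: dict[str, dict[str, float]] = defaultdict(dict)
        for (src, dst), values in per_pair.items():
            matrix[src][dst] = median(values)        # median is robust to single bad tests
        return matrix

    m = bandwidth_matrix(fetch_measurements())
    print(m["AGLT2"]["MWT2"])   # ~8.15: median of the two AGLT2 -> MWT2 samples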

ATLAS Next Steps: University of Texas at Arlington (UTA)
• UTA is working on PanDA network integration
• Artem Petrosyan, a new hire with prior experience on the PanDA team, is preparing a web page showing the best sites in each cloud for PanDA (a ranking sketch follows below)
  – He will collect data from the existing ATLAS SONAR (data transfer) tests and load them into the AGIS (ATLAS information) system
  – He needs to implement the schema for storing the data appropriately
  – Next he will create metadata for statistical analysis & prediction
  – He will then build a web UI to present the best destinations for jobs and data sources
• In parallel with the AGIS and web UI work, Artem will be working with PanDA to determine the best places to access and use the information he gets from SONAR, as well as the information Jorge gets from perfSONAR-PS
  – Also coordinated with the BigPanDA (OASCR) project at BNL (Yu)
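A sketch of the "best sites per cloud" ranking from transfer-test results. The data layout, site names and rates are invented for illustration; the real SONAR/AGIS schemas are not reproduced here.

    # Sketch of the "best sites per cloud" idea: rank destinations inside each cloud
    # by their recent transfer-test rate. The data layout is invented for illustration.
    def best_per_cloud(results: list[dict], top_n: int = 2) -> dict[str, list[str]]:
        """results: [{'cloud': ..., 'site': ..., 'mbps': ...}, ...]"""
        by_cloud: dict[str, list[dict]] = {}
        for r in results:
            by_cloud.setdefault(r["cloud"], []).append(r)
        return {cloud: [r["site"] for r in sorted(rs, key=lambda r: r["mbps"], reverse=True)[:top_n]]
                for cloud, rs in by_cloud.items()}

    sonar = [
        {"cloud": "US", "site": "MWT2",  "mbps": 950},
        {"cloud": "US", "site": "SWT2",  "mbps": 430},
        {"cloud": "US", "site": "AGLT2", "mbps": 880},
        {"cloud": "DE", "site": "DESY",  "mbps": 610},
    ]
    print(best_per_cloud(sonar))   # {'US': ['MWT2', 'AGLT2'], 'DE': ['DESY']}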

CMS PhEDEx Next Steps
• Prototype installation of PhEDEx for ANSE/LHCONE
  – Use the testbed Oracle instance at CERN
  – Install the website (need a box to put it on)
  – Install site agents at participating sites
  – One box per site, (preferably) at the site
  – Possible to run site agents remotely, but need remote access to the SE to check whether file transfers succeeded and to delete old files
  – Install management agents
  – One box somewhere, anywhere – it could even be at CERN
  – For a few sites, everything could be run by one person
• Will provide a fully featured system for R&D
  – Can add nodes as/when other sites want to participate
Tony Wildish (Princeton)

PhEDEx & ANSE: Next Steps (Caltech Workshop, May 6, 2013)


ANSE: Summary and Conclusions
• The ANSE project will integrate advanced network services with the LHC experiments' mainstream production software stacks
  – Through interfaces to:
    Monitoring services (perfSONAR-based, MonALISA)
    Bandwidth reservation systems (NSI, IDCP, DRAC, etc.)
  – By implementing network-based task and job workflow decisions (cloud and site selection; job & data placement) in:
    The PanDA system in ATLAS
    PhEDEx in CMS
• The goal is to make deterministic workflows possible
  – Increasing the working efficiency of the LHC discovery program
• Profound implications for future networks and other fields
  – A challenge and an opportunity for Software Defined Networks
  – With new scale and scope: optimization of global systems

THANK YOU! QUESTIONS?

BACKUP SLIDES FOLLOW

Components for a Working System: Dynamic Circuit Infrastructure
To be useful for the LHC and other communities, it is essential to build on current & emerging standards, deployed on a global scale.
Jerry Sobieski, NORDUnet

Components for a Working System: In-Depth Monitoring
• Monitoring: perfSONAR and MonALISA
  – All LHCOPN and many LHCONE sites have perfSONAR deployed
  – The goal is to have all of LHCONE instrumented for perfSONAR measurements
  – Regularly scheduled tests between configured pairs of end-points:
    Latency (one way)
    Bandwidth
  – Currently used to construct a dashboard
  – Could provide input to algorithms developed in ANSE for PhEDEx and PanDA (a sketch follows below)
• The ALICE and CMS experiments are using the MonALISA monitoring framework:
  – Real time
  – Accurate bandwidth availability
  – Complete topology view
  – Higher-level services
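A sketch of how the two regularly measured quantities (one-way latency and achievable bandwidth) could be folded into a single path cost usable by the PhEDEx/PanDA algorithms mentioned above. The weighting is an arbitrary assumption for illustration.

    # Sketch of combining the two regularly measured quantities (one-way latency
    # and achievable bandwidth) into a single path cost. The weighting is illustrative.
    def path_cost(latency_ms: float, bandwidth_gbps: float,
                  w_latency: float = 0.2, w_bandwidth: float = 0.8) -> float:
        """Lower is better: penalize latency, reward bandwidth."""
        if bandwidth_gbps <= 0:
            return float("inf")
        return w_latency * latency_ms + w_bandwidth * (100.0 / bandwidth_gbps)

    paths = {"CERN->BNL": (45.0, 9.0), "CERN->FNAL": (60.0, 20.0)}
    costs = {p: path_cost(lat, bw) for p, (lat, bw) in paths.items()}
    print(min(costs, key=costs.get))   # CERN->FNAL: higher latency but much more bandwidth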

Fast Data Transfer (FDT)
• The DYNES instrument includes a storage element, with FDT as the transfer application
• FDT is an open-source Java application for efficient data transfers
• Easy to use: syntax similar to SCP, iperf/netperf
• Based on an asynchronous, multithreaded system
• Uses the Java New I/O (NIO) interface and is able to:
  – Stream a list of files continuously
  – Use independent threads to read and write on each physical device
  – Transfer data in parallel on multiple TCP streams, when necessary
  – Use appropriately sized buffers for disk I/O and networking
  – Resume a file transfer session
• FDT uses the IDC API to request dynamic circuit connections

DYNES/FDT/PhEDEx
• FDT integrates the OSCARS IDC API to reserve network capacity for data transfers
• FDT has been integrated with PhEDEx at the level of the download agent
• Basic functionality is OK
  – More work is needed to understand performance issues with HDFS
• Interested sites are welcome to test
• With FDT deployed as part of DYNES, this makes it one possible entry point for ANSE

DYNES High-Level Topology
The DYNES Instrument