Report from WLCG Workshop 2017: WLCG Network Requirements GDB - CERN 12th of July 2017 edoardo.martelli@cern.ch.

Slides:

Advertisements

Similar presentations

ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.

Advertisements

Integrating Network and Transfer Metrics to Optimize Transfer Efficiency and Experiment Workflows Shawn McKee, Marian Babik for the WLCG Network and Transfer.

GridPP Steve Lloyd, Chair of the GridPP Collaboration Board.

NORDUnet NORDUnet The Fibre Generation Lars Fischer CTO NORDUnet.

ALICE data access WLCG data WG revival 4 October 2013.

Computing for ILC experiment Computing Research Center, KEK Hiroyuki Matsunaga.

Integration Program Update Rob Gardner US ATLAS Tier 3 Workshop OSG All LIGO.

INTRODUCTION The GRID Data Center at INFN Pisa hosts a big Tier2 for the CMS experiment, together with local usage from other HEP related/not related activities.

14 Aug 08DOE Review John Huth ATLAS Computing at Harvard John Huth.

Xrootd Monitoring and Control Harsh Arora CERN. Setting Up Service  Monalisa Service  Monalisa Repository  Test Xrootd Server  ApMon Module.

Point-to-point Architecture topics for discussion Remote I/O as a data access scenario Remote I/O is a scenario that, for the first time, puts the WAN.

6 march Building the INFN Grid Proposal outline a.ghiselli,l.luminari,m.sgaravatto,c.vistoli INFN Grid meeting, milano.

U.S. ATLAS Computing Facilities Overview Bruce G. Gibbard Brookhaven National Laboratory U.S. LHC Software and Computing Review Brookhaven National Laboratory.

CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.

Ian Bird WLCG Networking workshop CERN, 10 th February February 2014

Strawman LHCONE Point to Point Experiment Plan LHCONE meeting Paris, June 17-18, 2013.

ATLAS Distributed Computing ATLAS session WLCG pre-CHEP Workshop New York May 19-20, 2012 Alexei Klimentov Stephane Jezequel Ikuo Ueda For ATLAS Distributed.

WLCG Status Report Ian Bird Austrian Tier 2 Workshop 22 nd June, 2010.

Meeting with University of Malta| CERN, May 18, 2015 | Predrag Buncic ALICE Computing in Run 2+ P. Buncic 1.

Monitoring the Readiness and Utilization of the Distributed CMS Computing Facilities XVIII International Conference on Computing in High Energy and Nuclear.

LHCOPN / LHCONE Status Update John Shade /CERN IT-CS Summary of the LHCOPN/LHCONE meeting in Amsterdam Grid Deployment Board, October 2011.

HEPiX IPv6 Working Group David Kelsey (STFC-RAL) GridPP33 Ambleside 22 Aug 2014.

LHCb Computing 2015 Q3 Report Stefan Roiser LHCC Referees Meeting 1 December 2015.

LHC collisions rate: Hz New PHYSICS rate: Hz Event selection: 1 in 10,000,000,000,000 Signal/Noise: Raw Data volumes produced.

Accounting Review Summary and action list from the (pre)GDB Julia Andreeva CERN-IT WLCG MB 19th April

Daniele Bonacorsi Andrea Sciabà

Dynamic Extension of the INFN Tier-1 on external resources

WLCG IPv6 deployment strategy

LHCb distributed computing during the LHC Runs 1,2 and 3

Laurence Field IT/SDC Cloud Activity Coordination

WLCG Workshop 2017 [Manchester] Operations Session Summary

LHCOPN/LHCONE status report pre-GDB on Networking CERN, Switzerland 10th January 2017

WLCG Network Discussion

Ian Bird WLCG Workshop San Francisco, 8th October 2016

ALICE internal and external network

Computing models, facilities, distributed computing

Overview of the Belle II computing

LHCOPN update Brookhaven, 4th of April 2017

Semester 4 - Chapter 3 – WAN Design

POW MND section.

Data Challenge with the Grid in ATLAS

One independent ‘policy-bridge’ PKI

Grid related projects CERN openlab LCG EDG F.Fluckiger

INFNGRID Workshop – Bari, Italy, October 2004

Update on Plan for KISTI-GSDC

LHCb Computing Model and Data Handling Angelo Carbone 5° workshop italiano sulla fisica p-p ad LHC 31st January 2008.

The INFN TIER1 Regional Centre

Dagmar Adamova (NPI AS CR Prague/Rez) and Maarten Litmaath (CERN)

UK Status and Plans Scientific Computing Forum 27th Oct 2017

Update from the HEPiX IPv6 WG

3rd Asia Tier Centre Forum Summary report LHCONE meeting in Tsukuba 17th October 2017

WP7 objectives, achievements and plans

Monitoring at a Multi-Site Tier 1

System performance and cost model working group

Simulation use cases for T2 in ALICE

Alerting/Notifications (MadAlert)

The INFN Tier-1 Storage Implementation

CernVM Status Report Predrag Buncic (CERN/PH-SFT).

WLCG and support for IPv6-only CPU

Bernd Panzer-Steindel CERN/IT

February WLC GDB Short summaries

WLCG Collaboration Workshop;

New strategies of the LHC experiments to meet

Grid Canada Testbed using HEP applications

ExaO: Software Defined Data Distribution for Exascale Sciences

Grid Computing 6th FCPPL Workshop

WLCG Status – 1 Use remains consistently high

Development of LHCb Computing Model F Harris

The LHC Computing Grid Visit of Professor Andreas Demetriou

Presentation transcript:

Report from WLCG Workshop 2017: WLCG Network Requirements GDB - CERN 12th of July 2017 edoardo.martelli@cern.ch

WLCG Network Requirements The 4 major LHC experiments were asked to provide their network requirements for the coming years, in terms of: - bandwidth - special capabilities - monitoring

ALICE Recommendations for Tier2s: - WAN: 100Mbps of WAN per 1000 cores - LAN to local storage: 20Gbps per 1000 cores - Better read data locally: CPU efficiency get a 15% penalty per 20ms RTT ALICE depends on and strongly encourages: - full implementation of LHCONE at all sites - plus the associated tools (like PerfSONAR), properly instrumented to be used in the individual Grid frameworks

ATLAS ATLAS moving to a Nucleus and Satellite Model Nucleus will store primary data and act as a source for data distribution: - Storage capacity > 1PB - Good Network throughput - Site availability: > 95% ATLAS would like - better visibility into networks to improve ability to resolve network issues - to see network traffic data from R&E Networks

CMS Site types and requirements: - Full Tier-2, many CPUs and large disk capacity: Some 10Gbit/s or 100Gbit/s for both LAN and WAN are advisable for sites with several 1000 cores - Disk-rich Tier-2, more storage than average, perhaps hosting disk for co-located CPU only site: Good WAN connection even more important CMS schedules typically 10-15% of jobs with remote data access - Penalty in CPU efficiency: drop of ~10% for remote access - But large gain in flexibility Computing model likely to evolve towards scenarios that require fast interconnects Need to improve CMS transfer system and scheduling system to better exploit network metrics

LHCb - Majority of workflows are simulation jobs without input data - A certain fraction are user jobs with local access to input data - Currently small usage of working group productions. Will increase - In case of input data, always read by default from local site - Output data always goes to different storage areas on T1 sites Monitoring, including network monitoring, available from within Dirac: - for “helper data processing” sites to check the WAN connectivity to a T1 storage - for data management to check WAN quality

Network throughput WG WG has established an infrastructure to monitor and measure networks: - proven record on fixing existing network problems and improving transfer efficiency - stable production infrastructure Mid-term evolution topics: - Network capacities planning - Network utilization monitoring, both site-level and WAN - Evolving and integration of monitoring data: new sources, dashboards and network stream - Network Analytics: Alerting/Notifications and anomaly detection - SDN Networking demonstrators and testbeds

GEANT GEANT is investigating ways to increase ability to differentiate, break dependence on vendors, and provide capabilities to automate/orchestrate multi-domain services Asked the Experiments if interested in participating

From the discussions Need to maintain, monitor and develop network monitoring; effort is required to: - respond to GGUS tickets - track service and metric status and follow up on down services, mis-configuration, etc. - evolve the system, implementing new user interfaces, analytics, alerting and new measurements - coordinate network activities with others Some interest has been expressed about network programmability, especially in light of foreseen commercially driven network developments.

References Presentations available here: https://indico.cern.ch/event/609911/sessions/238341/#20170620

Questions? edoardo.martelli@cern.ch