1 Networking in the WLCG Facilities Michael Ernst Brookhaven National Laboratory

2 WLCG Facilities (1/2)
- Today in WLCG our primary concern regarding resources is CPU and storage capacity
  - All LHC experiments are increasingly relying on networks to provide connectivity between distributed facility elements across all levels
    - E.g. ATLAS today uses ~50% of T2 storage for primary datasets
  - The Wide Area Network infrastructure is complex
    - Besides the (mostly) predictable service of the OPN, the vast majority of traffic (i.e. for ATLAS and CMS) has been using the GPN
    - With LHCONE there is a 3rd piece of network infrastructure
      - Meant to serve T1↔T2/T3, T2↔T2 and T2↔T3 needs

3 WLCG Facilities (2/2)
- With evolving computing models, traffic flows change significantly (e.g. due to vanishing T1/T2 differences)
  - With the changes in the ATLAS computing model, what would not work given the current network infrastructure?
- The LHC community should conduct a study to determine what kind of networks, and how much networking, is necessary – today and in the next ~5 years
- Key questions include:
  - Do we need either LHCOPN or LHCONE if there were a sufficiently provisioned commodity IP network?
    - All commercial and R&E providers have 100 Gbps backbones deployed
    - The issue is typically the "last mile"
  - Do we need the LHCOPN if we have a globally funded LHCONE private IP network?
  - Do we need LHCONE given that we have the LHCOPN and today's commodity IP network with increased capacities?
    - Who will pay for LHCONE circuits in the long term?

CM Changes: Data Distribution - Moving from a push to a pull model
- We realized that, while disks filled up quickly, only a small fraction of the data programmatically subscribed to sites was actually used by analysis jobs
- Since July 2010: the workload management system's dynamic data placement component (PD2P) sends data to Tier-2s for analysis on the basis of usage (see the sketch below)
  - Requires excellent networks
[Chart: MB/s throughput by activity before and after dynamic data placement (PD2P) was turned on – at the start of 7 TeV data taking, 1350 TB in 2 weeks; the predominant 'Data Consolidation' component (programmatic replication of ESD) is vastly reduced after activation of PD2P; T0 Export (blue) is sent from CERN to the BNL T1 only; panels: Data Pre-Placement vs. Dynamic Data Placement]
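To make the usage-driven ("pull") placement idea concrete, here is a minimal Python sketch of a PD2P-style decision. It is not the actual PanDA/PD2P code; the Dataset fields, the thresholds and the replica-count policy are illustrative assumptions.

```python
# Hypothetical sketch of usage-driven data placement in the spirit of PD2P:
# replicate a dataset to additional Tier-2s only when analysis jobs actually
# queue up against it, instead of pre-placing it programmatically everywhere.
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    replicas: int        # current number of Tier-2 replicas
    waiting_jobs: int    # analysis jobs currently queued on this dataset

# Illustrative policy knobs, not ATLAS defaults
MAX_REPLICAS = 4
JOBS_PER_EXTRA_REPLICA = 50

def extra_replicas_needed(ds: Dataset) -> int:
    """How many additional Tier-2 replicas current usage pressure justifies."""
    wanted = 1 + ds.waiting_jobs // JOBS_PER_EXTRA_REPLICA
    return max(0, min(wanted, MAX_REPLICAS) - ds.replicas)

if __name__ == "__main__":
    ds = Dataset("data12_8TeV.periodB.AOD", replicas=1, waiting_jobs=120)
    print(f"{ds.name}: request {extra_replicas_needed(ds)} extra Tier-2 replica(s)")
```

The point of the sketch is the trigger: replication becomes a reaction to observed demand rather than a pre-planned push, which is why the model leans so heavily on good networks.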

CM Changes: Data Placement – Using T2 disk to store primary datasets (not just as a cache)
[Figure: primary datasets stored on both T1 Datadisk and T2 Datadisk]
- Requires decent network connectivity and performance

CM Changes: Federated Data Stores
- Common ATLAS namespace across all storage sites, accessible from anywhere
- Easy-to-use, homogeneous access to data
- Use as failover for existing systems (see the sketch below)
- Gain access to more CPUs using WAN direct read access
- Use as a caching mechanism at sites to reduce local data management tasks
  - Requires excellent networking
- Much more in Rob Gardner's session
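A minimal sketch of the failover use case above, assuming an XRootD-style federation with a global redirector: if the local replica cannot be opened, the job falls back to reading over the WAN. The endpoint prefixes are placeholders, and the open function is injected so the sketch does not pretend to be a particular client library.

```python
# Hypothetical failover-to-federation read: try the local storage element
# first, then a global redirector that locates another replica and serves it
# over the WAN.  Prefixes below are placeholders, not real ATLAS endpoints.
LOCAL_PREFIX = "root://local-se.example.org//atlas"
FEDERATION_PREFIX = "root://global-redirector.example.org//atlas"

def replica_urls(lfn: str):
    """Candidate URLs in preference order: local replica first, then federation."""
    yield f"{LOCAL_PREFIX}/{lfn}"
    yield f"{FEDERATION_PREFIX}/{lfn}"

def open_with_failover(lfn: str, try_open):
    """try_open is an injected callable (e.g. wrapping an XRootD client) that
    returns a file handle or raises OSError; this keeps the sketch library-agnostic."""
    last_err = None
    for url in replica_urls(lfn):
        try:
            return try_open(url)
        except OSError as err:
            last_err = err
    raise OSError(f"no readable replica found for {lfn}") from last_err
```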

Regional and Global Connectivity (only a subset is shown)

9 Network Monitoring w/ perfSONAR (Shawn's Talk)
[Diagram: perfSONAR measurement points and measurement archives deployed in multiple domains – FNAL (AS3152) [US], BNL, ESnet (AS293) [US], GEANT (AS20965) [Europe], DFN (AS680) [Germany], DESY (AS1754) [Germany] – queried by a performance GUI, users and analysis tools]
- Collaborations in WLCG are inherently multi-domain, so for an end-to-end monitoring tool to work, everyone must participate in the monitoring infrastructure (a query sketch follows below)
  - Deployment is being coordinated by the WLCG Operations Coordination Team
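As an illustration of how such a multi-domain archive can be consumed programmatically, here is a hedged Python sketch that queries a perfSONAR measurement archive over HTTP. The host name is a placeholder, and the esmond-style URL parameters and field names are assumptions that would need checking against the deployed archive version.

```python
# Sketch: pull throughput-test metadata from a perfSONAR measurement archive
# via its REST interface.  The host is a placeholder, and the URL layout and
# field names are assumptions about the esmond API, stated as such.
import json
import urllib.request

ARCHIVE = "http://ps-archive.example.org/esmond/perfsonar/archive/"  # placeholder

def list_throughput_metadata(source_host: str):
    """Return metadata records for throughput tests originating at source_host."""
    url = f"{ARCHIVE}?event-type=throughput&source={source_host}&format=json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

if __name__ == "__main__":
    for record in list_throughput_metadata("perfsonar.example-t2.org"):
        # 'input-source' / 'input-destination' are assumed metadata field names
        print(record.get("input-source"), "->", record.get("input-destination"))
```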

LHCONE – A New Network Infrastructure for HEP
- LHCONE was created to address two main issues:
  - To ensure that the services to the science community maintain their quality and reliability
  - To protect existing R&E infrastructures against very large data flows that look like 'denial of service' attacks
- LHCONE is expected to:
  - Provide some guarantees of performance
    - Large data flows across managed bandwidth, providing better determinism than shared IP networks
    - Segregation from competing traffic flows
    - Manage capacity as (# sites) x (max flow per site) x (# flows) increases (a rough estimate follows below)
  - Provide ways for better utilization of resources
    - Use all available resources, especially transatlantic
    - Provide traffic engineering and flow management capability
  - Leverage investments being made in advanced networking
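To make the capacity-scaling bullet concrete, a back-of-the-envelope calculation; the numbers below are purely illustrative, not measured WLCG figures.

```python
# Rough aggregate-demand estimate: the capacity to manage grows roughly as
# (# sites) x (max flow per site) x (# concurrent flows per site).
# Values are illustrative only.
sites = 40               # Tier-1/Tier-2 sites attached to LHCONE (assumed)
max_flow_gbps = 10       # peak single-flow rate a well-connected site can drive
concurrent_flows = 4     # simultaneous large flows per site

aggregate_gbps = sites * max_flow_gbps * concurrent_flows
print(f"Aggregate potential demand: {aggregate_gbps} Gbps "
      f"(~{aggregate_gbps / 100:.0f}x a single 100G backbone link)")
```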

LHCONE as of September (B. Johnston/ESnet)
[Figure: LHCONE topology – Gbps across the Atlantic; NRENs in the EU interconnected by GEANT]
- LHCONE is an 'overlay' network infrastructure

LHCONE as of September (cont.)
[Same figure as the previous slide]
- The primary resource for LHCONE today is non-dedicated bandwidth

BNL LHCONE Traffic (as reported by ESnet/Arbor)
[Chart: BNL => LHCONE and LHCONE => BNL traffic, Oct 29 – Nov 7, peaking around 5 Gbps]

BNL LHCONE Traffic

LHCONE Operations Forum
- Meets bi-weekly on Mondays
  - Addressing higher-level operational questions
  - Chaired by Dale Finkelson (Internet2) and Mike O'Connor (ESnet)
- Developing documentation (handbook)
  - What information is relevant?
  - Where is the information stored?
  - Who is responsible for keeping the information up to date?
  - What is a reliable mechanism to broadcast information?

16 Evolving Application Requirements
- Application requirements extend beyond what is typically provided by the "network"
  - Maybe a controversial statement, but …
  - … we are now at the front edge of a paradigm shift, driven by applications and enabled by emerging technologies, allowing tighter coupling between big data, big computation and big networks
  - There is an opportunity that these requirements could be satisfied by a combination of intelligent network services and network-embedded resources (processing & storage), in addition to our own processes and resources
- Chin Guok's talk in this session

Point-to-Point Networking Activities
- Build on national and regional projects for the basic dynamic circuit technology
  - OSCARS (ESnet, RNP), ION (Internet2), DRAC (SURFnet), AutoBAHN (GEANT and some EU NRENs)
- Extending into the campus
  - DYNES (switch and control server equipment)
- Interfacing with LHC experiments/sites (a circuit-request sketch follows below)
  - Next Generation Workload Management and Analysis System for Big Data, based on PanDA (PI: A. Klimentov)
  - DYNES; primarily intended to improve T2/T3 connectivity
  - ANSE; a new NSF-funded project aiming at integration of advanced network services with the experiments' data management/workflow software (implement a network element in PanDA)
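To show what "interfacing" could look like from the experiment side, here is a deliberately generic Python sketch of requesting a point-to-point circuit before scheduling a large transfer. The endpoint URL, the JSON fields and the reserve_circuit() shape are hypothetical; a real integration would go through the OSCARS or NSI interfaces rather than this invented REST call.

```python
# Hypothetical sketch of a workflow system asking a provisioning service for a
# point-to-point circuit ahead of a bulk transfer.  The URL and the
# request/response fields are invented for illustration; they are NOT the
# OSCARS or NSI APIs.
import json
import urllib.request

PROVISIONER = "https://circuits.example.net/api/reservations"  # placeholder

def reserve_circuit(src: str, dst: str, bandwidth_mbps: int, duration_s: int) -> str:
    """Request a guaranteed-bandwidth circuit and return its reservation id."""
    payload = json.dumps({
        "source": src,                  # e.g. a site's circuit termination point
        "destination": dst,
        "bandwidth_mbps": bandwidth_mbps,
        "duration_s": duration_s,
    }).encode()
    req = urllib.request.Request(PROVISIONER, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["reservation_id"]

# Example (placeholder endpoints): 5 Gbps between two site edges for 2 hours
# reservation = reserve_circuit("bnl-lhcone-edge", "cern-lhcone-edge", 5000, 7200)
```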

Software Defined Networking (SDN) – This Looks Like a Promising Technology
- SDN paradigm: network control by applications; provides an API to externally define network functionality (a flow-rule sketch follows below)
  - Enabler for applications to fully exploit available network resources
  - OpenFlow, an SDN implementation, is widely adopted by industry (network equipment manufacturers, Google, cloud computing)
- Network Services Interface (NSI) framework
- Inder Monga's talk at this week's GDB (14:30)
- Jerry Sobieski's talk at the PtP workshop on Thursday
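A small Python sketch of the "applications program the network" idea: an application pushes a flow rule through an SDN controller's northbound interface so that a bulk-transfer flow is steered onto a dedicated path. The controller URL and the JSON schema are placeholders; a real deployment would use the specific controller's API (or OpenFlow directly), not this invented endpoint.

```python
# Hypothetical sketch of application-driven network control: install a flow
# rule via an SDN controller's northbound REST interface so traffic between
# two transfer endpoints is forwarded out a chosen port / path.
# The controller URL and JSON schema are invented for illustration and do not
# correspond to a specific controller product.
import json
import urllib.request

CONTROLLER = "http://sdn-controller.example.org:8080/flows"  # placeholder

def steer_bulk_transfer(src_ip: str, dst_ip: str, out_port: int, priority: int = 1000):
    """Install a match-on-src/dst rule that forwards matching traffic to out_port."""
    rule = {
        "match": {"ipv4_src": src_ip, "ipv4_dst": dst_ip},
        "actions": [{"type": "OUTPUT", "port": out_port}],
        "priority": priority,
        "idle_timeout_s": 3600,   # let the rule expire once the transfer goes idle
    }
    req = urllib.request.Request(CONTROLLER, data=json.dumps(rule).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status

# Example with documentation addresses: steer a bulk flow out port 3
# steer_bulk_transfer("192.0.2.10", "198.51.100.20", out_port=3)
```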

19 A Possible Path Forward
- We are close to a 2-year shutdown of the LHC machine
  - As our understanding of our computing needs evolves, we have an opportunity to find out how we can benefit from middleware that supports flexible & nimble operations, including the network as a fully integrated facility component
  - Work in close collaboration with the network providers
- Define & implement an operational interface between users and providers
  - E.g. the Working Group on LHCONE Operations has started activities
- Besides providing production services, most of the providers participate in pilot and research activities
  - with the motivation of providing new types of services to the user communities
  - A good opportunity for the LHC program with its applications to explore possibilities and influence the directions developments are taking

Emerging Technology Discussion – Opportunities
- An opportunity to integrate the capabilities of big data, big processing and big networks
- Core networking technologies and architecture concepts
  - SDN – the network is re-programmable, in response to changing application requirements or network traffic conditions, through a standardized API; OpenFlow is the primary focus in this area
  - Dedicated transfer facility/nodes – a data center placed at the location most efficient for data transfer, for performance and economic benefit
  - Fast networks – 100 Gbps (and terabit/s in the future) as an enabler for new services
  - Next-generation network features
- Emerging technologies in the application and user domain space
  - Cloud computing – emergence of rapidly provisionable virtual machines with standard APIs for flexible access to computation power; include the network as another virtualizable resource
  - Supercomputer center evolution – the interplay between cloud models and these infrastructures will allow increased access in a flexible manner
  - Embedded network storage – storage tightly coupled with the network
  - Advanced security – "last mile" issues; cloud/user demand drives network performance, virtualization with strong isolation and rapid reconfiguration