ATLAS Network Requirements – an Open Discussion ATLAS Distributed Computing Technical Interchange Meeting University of Tokyo

ATLAS Networking Needs

Is this something to worry about?
– Maybe not: millions of Netflix users generate much more traffic than HEP users do. If it works for them, it must work for us too!
– Maybe yes: Netflix users do not compare well with us. Commercial Internet providers optimize their infrastructure for mainstream clients, not for the specific needs of the HEP community, e.g.:
  – traffic patterns characterized by small flows (tiny "transactions");
  – a few lost packets at 10 Gbps cause an 80-fold throughput drop (see the sketch below);
  – connectivity issues between NRENs and commercial providers;
  – availability/reliability issues.
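The loss-sensitivity point deserves a number. A minimal back-of-the-envelope sketch (Python, not ATLAS code) using the well-known Mathis et al. bound, throughput ≤ (MSS/RTT)·(C/√p); the 150 ms RTT is an assumed transatlantic value:

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Mathis et al. upper bound on single-stream TCP throughput (bit/s)."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

mss = 1460   # standard Ethernet MSS in bytes
rtt = 0.150  # assumed transatlantic round-trip time, seconds
for p in (1e-7, 1e-5, 1e-3):
    gbps = mathis_throughput_bps(mss, rtt, p) / 1e9
    print(f"loss={p:.0e}  ->  at most {gbps:6.3f} Gb/s")
```

Even at a loss rate of 1e-7, a single 1500-byte-MTU stream tops out around 0.3 Gb/s over such a path, which is why HEP transfers lean on parallel streams, jumbo frames, and clean paths.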

Bandwidth and Throughput

We often confuse bandwidth with throughput:
– Bandwidth is what providers have in their infrastructure.
– Throughput is what we observe with our applications.
The two are very unlikely to be the same:
– It depends on how applications use the network.
– The infrastructure is shared.
– Throughput is very dependent on server configuration (see the sketch below).
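A minimal sketch of the window-limit arithmetic behind the last point, assuming a 150 ms transatlantic RTT: even with zero loss, a single TCP stream is capped at window/RTT, so an untuned server buffer, not the provider's link, often sets the observed rate:

```python
def max_throughput_gbps(buffer_bytes, rtt_s):
    """Window-limited TCP throughput in Gb/s (throughput <= window/RTT)."""
    return buffer_bytes * 8 / rtt_s / 1e9

def required_buffer_mb(link_gbps, rtt_s):
    """Bandwidth-delay product: the buffer needed to fill the link, in MB."""
    return link_gbps * 1e9 * rtt_s / 8 / 1e6

rtt = 0.150  # assumed transatlantic RTT
print(max_throughput_gbps(4 * 2**20, rtt))  # ~0.22 Gb/s with a 4 MiB buffer
print(required_buffer_mb(10, rtt))          # 187.5 MB needed for 10 Gb/s
```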

Thoughts on LHCOPN (1/3)

The LHCOPN is the current private, physical circuit infrastructure that serves data transfers from the Tier 0 to the Tier 1s and between the Tier 1s, and is an example of where circuits – physical or virtual – are needed. The reasons for this are the original requirements for:
– guaranteed delivery of data over a long period of time, since the source at CERN is essentially a continuous, real-time data source;
– long-term use of substantial fractions of the link capacity;
– a well-understood and deterministic backup/fail-over mechanism;
– guaranteed capacity that does not impact other uses of the network;
– a clear cost model with long-term capacity associated with specific sites (the Tier 1s);
– a mechanism that is easily integrated into a production operations environment: one that monitors circuit health, has established trouble-shooting and resolution responsibility, and provides for problem tracking and reporting.

Thoughts on LHCOPN (2/3)

However, things (may) have changed a bit in the meantime. Is service availability still the main driver?
– The original specification was 99.95% availability on average over a full year, which is hard to achieve for transatlantic links.
– Instead, quality requirements on uptime could be expressed in terms of capabilities, formulated e.g. like: "a minimum of n PB per day for at least 4 days in any week, with no more than 1 week of deviation from this per quarter, never more than 4 consecutive days of no connectivity, and never less than x PB per 2-week interval" (I'm making this up, so don't take the values literally, but I think they are indicative; see the sketch below).
– After all, we are perfectly capable of a) re-routing traffic to other networks, i.e. LHCONE, either directly or, if need be, by e.g. "hopping" file transfers (I would hope there are more elegant ways of doing it), and b) catching up on n days of downtime, as we have done when storage systems, firewalls, software releases etc. gave us grief in the past.
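As an aside, such a capability-style requirement is mechanically checkable. A sketch against daily transfer totals, with placeholder thresholds standing in for the slide's "n" and "x" (the once-per-quarter deviation allowance is omitted for brevity):

```python
N_PB_DAY = 1.0         # placeholder for "n PB per day"
X_PB_FORTNIGHT = 10.0  # placeholder for "x PB per 2-week interval"

def check_capability(daily_pb):
    """daily_pb: PB transferred per day, oldest first. True if all rules hold."""
    ok = True
    # At least 4 days of >= n PB in every 7-day window.
    for i in range(len(daily_pb) - 6):
        if sum(d >= N_PB_DAY for d in daily_pb[i:i + 7]) < 4:
            ok = False
    # Never more than 4 consecutive days of no connectivity.
    run = 0
    for d in daily_pb:
        run = run + 1 if d == 0 else 0
        if run > 4:
            ok = False
    # Never less than x PB in any 14-day window.
    for i in range(len(daily_pb) - 13):
        if sum(daily_pb[i:i + 14]) < X_PB_FORTNIGHT:
            ok = False
    return ok
```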

Thoughts on LHCOPN (3/3)

Possibilities, as recently discussed in the LHCONE WG, include moving the LHCOPN to a virtual circuit (VC) service:
– VCs can be moved around on an underlying physical infrastructure to make better use of available capacity and, potentially, to provide greater robustness in the face of physical circuit outages;
– VCs have the potential to allow sharing of a physical link when the VC is idle or used at less than the committed bandwidth.
Our requirements (?) for, and benefits from, circuits include:
– topological flexibility;
– a circuit implementation that allows sharing the underlying physical link; that is, bandwidth committed to, but not used by, the circuit is available for other traffic, e.g. Tier-1 ↔ Tier-2 via LHCONE.

LHCOPN in a Circuit Scenario

Useful semantics in a shared infrastructure:
– Although virtual circuits are rate-limited at the ingress to cap utilization at what the user requested, they are permitted to burst above the allocated bandwidth if idle capacity is available. This must be done without interfering with other circuits or other uses of the link, such as general IP traffic, for example by marking the over-allocation bandwidth as low-priority traffic (see the sketch below).
– A user can request a second circuit that is diversely routed from the first circuit, in order to provide high reliability for the backup circuit …
Why is this interesting?
– With the rise of a general infrastructure at 100G per link, using dedicated 10G links for Tier 0 – Tier 1 becomes increasingly inefficient.
– Shifting the OPN circuits to virtual circuits on the general (or LHCONE) infrastructure could facilitate sharing while meeting the minimum required guaranteed OPN bandwidth.
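A toy sketch of the ingress semantics in the first bullet, assuming a simple token bucket: traffic within the committed rate is forwarded as guaranteed, while bursts beyond it are re-marked as low-priority "scavenger" traffic rather than dropped, so they can use idle capacity without hurting other circuits or general IP traffic:

```python
import time

class CircuitPolicer:
    """Illustrative single-rate policer with two-tier marking."""
    def __init__(self, committed_bps, bucket_s=0.1):
        self.rate = committed_bps / 8.0    # token refill rate, bytes/s
        self.depth = self.rate * bucket_s  # bucket depth, bytes
        self.tokens = self.depth
        self.last = time.monotonic()

    def classify(self, pkt_bytes):
        now = time.monotonic()
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return "guaranteed"  # within the committed bandwidth
        return "scavenger"       # burst: forwarded only if capacity is idle
```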

B. Johnston et al

Cost Models, Managing Allocations

The reserved bandwidth of a circuit is a scarce commodity:
– This commodity must be manageable.
– What sorts of manageability do we/the users require?
– What do we need to control in terms of circuit creation? (A sketch of the required bookkeeping follows below.)
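A sketch of the kind of bookkeeping this implies: reservations are admitted only while their sum fits the physical link. All names here are illustrative, not an existing LHCOPN/LHCONE API:

```python
class LinkAllocator:
    """Toy admission control for reserved circuit bandwidth on one link."""
    def __init__(self, capacity_gbps):
        self.capacity = capacity_gbps
        self.circuits = {}  # circuit id -> reserved Gb/s

    def request(self, circuit_id, gbps):
        if sum(self.circuits.values()) + gbps > self.capacity:
            return False    # no oversubscription of the reserved pool
        self.circuits[circuit_id] = gbps
        return True

    def release(self, circuit_id):
        self.circuits.pop(circuit_id, None)

link = LinkAllocator(100)              # a 100G link (hypothetical)
assert link.request("T0-T1-A", 40)
assert link.request("T0-T1-B", 40)
assert not link.request("T1-T1", 30)   # would exceed 100G, rejected
```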

FAX/Remote IO and the Network

Remote I/O is a scenario that puts the WAN between the data and the executing analysis code.
– Today's processing model is based on data affinity: data is staged to the site where the compute resources are located, and data access by analysis code is from local, site-resident storage.
– Inserting the WAN is a change that potentially requires special measures to ensure the smooth flow of data between disk and computing system, and therefore the "smooth" job execution needed to make effective use of the compute resources (see the sketch below).
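A rough sketch of why the WAN insertion matters for job efficiency, under assumed job parameters: per-read round trips that are negligible on a LAN can dominate wall time over a high-latency path unless reads are vectored or cached:

```python
def cpu_efficiency(cpu_s, n_reads, bytes_read, rtt_s, bw_bps):
    """CPU time / wall time when every read pays one round trip."""
    io_s = n_reads * rtt_s + bytes_read * 8 / bw_bps
    return cpu_s / (cpu_s + io_s)

# Hypothetical analysis job: 1000 s CPU, 50k reads, 5 GB read, 1 Gb/s path.
for rtt in (0.0002, 0.020, 0.150):  # LAN, continental, transatlantic
    eff = cpu_efficiency(1000, 5e4, 5e9, rtt, 1e9)
    print(f"rtt={rtt * 1e3:6.1f} ms  efficiency={eff:.2f}")
```

Under these assumptions, efficiency falls from about 0.95 on a LAN to about 0.12 transatlantic, which is the "special measures" argument in one number.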

Programmatic Replication and Remote IO

Mix of remote I/O and bulk data transfer:
– What will be the mix of remote I/O-based access and bulk data transfer?
– The change to remote I/O is, to a certain extent, aimed at lessening the use of the bulk data transfers that use GridFTP; so is work on GridFTP addressing a dying horse?
– How much bulk transfer will there be in a mature remote I/O scenario, and between what mix of sites? (A break-even sketch follows below.)
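One illustrative way to reason about that mix is a break-even rule: if the jobs touching a file would collectively read more bytes remotely than one full copy, replicate it first. A sketch, with made-up numbers:

```python
def should_replicate(file_bytes, fraction_read, n_jobs):
    """Replicate once if total remote reads would exceed one full copy."""
    return file_bytes * fraction_read * n_jobs > file_bytes

print(should_replicate(10e9, 0.05, 10))  # 5% x 10 jobs = 0.5 copies -> False
print(should_replicate(10e9, 0.30, 10))  # 30% x 10 jobs = 3 copies  -> True
```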

A possible path forward

Build a full mesh of static circuits whose bandwidth can be increased/decreased based on application/workflow needs (a sketch follows below):
– R&E networks and end-sites affected by the decisions above share their possible NSI v2.0 deployment plans. How do ScienceDMZ/DYNES/CC-NIE installations play into this picture?
– The bandwidth used is a portion of the bandwidth used for the VRF infrastructure today, which has capacity available.
– The routing setup for this circuit infrastructure is to be discussed. One alternative is to migrate a portion of the VRFs over the circuits at the participating sites, with the option to shift routes back to the current VRF infrastructure if the circuit infrastructure suffers from the experimentation.
– Hold the right API/abstraction discussion between application developers and network folks to design the right interface into the circuit infrastructure. Try to address concerns that circuits are complex to deploy and debug.
– Continue the joint application-developer and networking-expert meetings at CERN; there is interest in optimization using information from the network and in co-scheduling resources.
– Decide on the right metrics and an experiment to quantify if and how circuits help applications, since there is divided opinion within the group. The experiment needs to be designed properly.
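Purely illustrative pseudocode for the full-mesh idea: NSI CS v2.0 is a SOAP-based reserve/commit/provision protocol, and the class, site names, and resize method below are hypothetical stand-ins for it, not a real client library:

```python
class NSICircuit:
    """Stand-in for an NSI v2.0 connection between two site STPs."""
    def __init__(self, src_stp, dst_stp, capacity_mbps):
        self.src, self.dst = src_stp, dst_stp
        self.capacity = capacity_mbps

    def resize(self, new_mbps):
        # NSI would model this as a modify-reserve followed by a commit
        # on the existing connection id; here we only track the intent.
        self.capacity = new_mbps
        return self.capacity

sites = ["SITE-A", "SITE-B", "SITE-C", "SITE-D"]  # assumed participants
mesh = {(a, b): NSICircuit(a, b, 1000)            # static 1G full mesh
        for a in sites for b in sites if a < b}
mesh[("SITE-A", "SITE-B")].resize(5000)  # a workflow needs more capacity
```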