ANSE: Advanced Network Services for Experiments Shawn McKee / Univ. of Michigan BigPANDA Meeting / UTA September 4, 2013.

Presentation transcript:

ANSE: Advanced Network Services for Experiments Shawn McKee / Univ. of Michigan BigPANDA Meeting / UTA September 4, 2013

ANSE at the BigPANDA Meeting?! Why am I here giving this talk? We have similar goals in our two projects; basically inserting networking into our software infrastructure. Since neither project has manpower to waste, we need to consider how best to apportion our activities to get the most overall impact. So first some details about ANSE… 9/4/2013 Shawn McKee2

ANSE Project Overview
ANSE is a project funded by NSF’s CC-NIE program
– Two years of funding, started in January 2013, ~3 FTEs
Collaboration of 4 institutes:
– Caltech (CMS)
– University of Michigan (ATLAS)
– Vanderbilt University (CMS)
– University of Texas at Arlington (ATLAS)
Goal: Enable strategic workflow planning including network capacity as well as CPU and storage as a co-scheduled resource
Path Forward: Integrate advanced network-aware tools with the mainstream production workflows of ATLAS and CMS
Network provisioning and in-depth monitoring
Complex workflows: a natural match and a challenge for SDN
Exploit state-of-the-art progress in high-throughput long-distance data transport, network monitoring and control

ANSE Objectives
Deterministic, optimized workflows
– Use network resource allocation along with storage and CPU resource allocation in planning data and job placement
– Use accurate (as much as possible) information about the network to optimize workflows
– Improve overall throughput and task times to completion
Integrate advanced network-aware tools in the mainstream production workflows of ATLAS and CMS
– Use tools and deployed installations where they exist
– Extend functionality of the tools to match experiments’ needs
– Identify and develop tools and interfaces where they are missing
Build on several years of invested manpower, tools and ideas
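To make the idea of planning placement with network, storage and CPU together more concrete, here is a minimal sketch. It is not ANSE or PanDA code: the `Site` fields, the function names, and the simple additive cost model (queue wait + transfer + run time) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Site:
    # All fields are hypothetical inputs a scheduler might have per site.
    name: str
    queue_wait_s: float      # expected wait for a free CPU slot
    free_storage_gb: float   # space available for the input data
    throughput_mbps: float   # estimated network throughput from the data source


def estimated_completion_s(site: Site, input_gb: float, run_s: float) -> float:
    """Assumed cost model: queue wait, then transfer, then run time."""
    transfer_s = input_gb * 8000.0 / site.throughput_mbps  # 1 GB = 8000 megabits
    return site.queue_wait_s + transfer_s + run_s


def choose_site(sites: Iterable[Site], input_gb: float, run_s: float) -> Optional[Site]:
    """Pick the site with the lowest estimated completion time that can hold the input."""
    usable = [s for s in sites
              if s.free_storage_gb >= input_gb and s.throughput_mbps > 0]
    if not usable:
        return None
    return min(usable, key=lambda s: estimated_completion_s(s, input_gb, run_s))
```

With this model, a site with a short queue but a slow path can lose to a busier site with a fast path, which is the kind of trade-off that only becomes visible once the network is treated as a scheduled resource.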

ANSE Network Focus Areas
Monitoring: Allows Reactive Use
– React to “events” (State Changes) or Situations in the network
– Throughput Measurements -> Possible actions:
  1. Raise alarm and continue
  2. Abort/restart transfers
  3. Choose different source
– Topology (+ Site & Path performance) Monitoring -> Possible actions:
  1. Influence source selection
  2. Raise alarm (e.g. extreme cases such as site isolation)
Network Control: Allows Pro-active Use
– Reserve bandwidth dynamically; prioritize transfers, remote access flows, etc.
– Co-scheduling of CPU, Storage and Network resources
– Create Custom Topologies -> optimize infrastructure to match operational conditions: deadlines, work-profiles, e.g. during LHC running periods vs reconstruction/re-distribution
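The reactive use of throughput measurements listed above can be sketched as a small decision function. This is only an illustration: the threshold values and the names `Thresholds` and `react` are hypothetical, not part of any ANSE deliverable.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    RAISE_ALARM = "raise alarm and continue"
    RESTART = "abort/restart transfer"
    SWITCH_SOURCE = "choose different source"


@dataclass
class Thresholds:
    # Hypothetical fractions of the expected throughput; tune per deployment.
    alarm: float = 0.7
    restart: float = 0.4
    switch: float = 0.2


def react(measured_mbps: float, expected_mbps: float,
          alternate_source_available: bool,
          t: Thresholds = Thresholds()) -> Action:
    """Map one throughput measurement to one of the possible actions."""
    if expected_mbps <= 0:
        raise ValueError("expected_mbps must be positive")
    ratio = measured_mbps / expected_mbps
    # Worst case first: switch source only if another source actually exists.
    if ratio < t.switch and alternate_source_available:
        return Action.SWITCH_SOURCE
    if ratio < t.restart:
        return Action.RESTART
    if ratio < t.alarm:
        return Action.RAISE_ALARM
    return Action.CONTINUE
```

For example, `react(600, 1000, True)` yields `Action.RAISE_ALARM`, while `react(100, 1000, False)` falls back to `Action.RESTART` because no alternate source is available.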

ANSE Methodology
Use agile, managed bandwidth for tasks with levels of priority along with CPU and disk storage allocation.
– Allows one to define goals for time-to-completion, with reasonable chance of success
– Allows one to define metrics of success, such as the rate of work completion with reasonable resource use
– Allows one to define and achieve “consistent” workflow
Dynamic circuits are a natural match (as in DYNES for Tier2s and Tier3s)
Process-Oriented Approach
– Measure resource usage and job/task progress in real-time
– If resource use or rate of progress is not as requested/planned, diagnose, analyze and decide if and when task re-planning is needed
Classes of work: defined by resources required, estimated time to complete, priority, etc.
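As a rough illustration of the process-oriented approach, the check below projects a task's completion time from its observed rate of progress and flags it for re-planning when the projection overshoots the planned deadline. The function name, the linear projection, and the 10% tolerance are assumptions made for this sketch.

```python
def needs_replanning(done_units: float, elapsed_s: float,
                     total_units: float, deadline_s: float,
                     tolerance: float = 0.1) -> bool:
    """Return True if projected completion exceeds the deadline by more than `tolerance`.

    Assumes progress continues at the average rate observed so far.
    """
    limit_s = deadline_s * (1.0 + tolerance)
    if done_units <= 0 or elapsed_s <= 0:
        # No rate estimate yet: only re-plan once the deadline itself has passed.
        return elapsed_s > limit_s
    rate = done_units / elapsed_s                # units per second so far
    remaining_s = (total_units - done_units) / rate
    return elapsed_s + remaining_s > limit_s
```

A task that has finished 25 of 100 units after 100 s projects to 400 s in total, so against a 200 s goal it would be flagged; a task at 50 of 100 units after 100 s projects to exactly 200 s and would not.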

Coupling ANSE to LHC Experiments
ANSE is focused on delivering “Advanced Network Services” for the LHC experiments
– In ATLAS we are targeting PANDA
– In CMS we are targeting PhEDEx
– We have “core” ATLAS/PANDA and CMS/PhEDEx participants
ANSE will create a library of network monitoring and services usable by both experiments’ software stacks
– Need to ensure appropriate “hooks” are in place in PANDA and PhEDEx

Initial ANSE Work
ANSE first steps are to gather available network information and filter/transform it as needed to make it as reliable and useful as possible
– Getting network metrics from perfSONAR-PS
– Using SONAR and FAX data from actual transfers
– How best to filter/normalize and utilize this data?
Network data exists but is not yet complete, consistent or reliable. Working on improving things here.
Network metrics will be used to inform decision-making when there are choices between sites and/or paths to make.
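One possible way to filter incomplete or noisy throughput samples before using them for decisions is a median with outlier rejection. The sketch below is an assumption about how such filtering could look, not the method ANSE adopted; the function name and the cutoff `k` are hypothetical, and the input is taken to be a plain list of Mbps readings with `None` for missing tests.

```python
from statistics import median


def robust_throughput(samples_mbps, k=3.0):
    """Median of the samples after dropping missing values and MAD outliers.

    Returns None when no usable sample is available.
    """
    values = [s for s in samples_mbps if s is not None and s > 0]
    if not values:
        return None
    med = median(values)
    # Median absolute deviation: a spread estimate that tolerates outliers.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return med
    kept = [v for v in values if abs(v - med) <= k * mad]
    return median(kept)
```

A single failed or badly degraded test (for example one 5 Mbps reading among several near 900 Mbps) is discarded here instead of dragging down the path estimate.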

Impacts of Networking Monitoring
Part of the ANSE project is to evaluate how much improvement is to be gained from utilizing knowledge of how network paths have been performing.
How “real-time” do we need networking information to be?
– Does behavior over the last n hours correlate with behavior over the next m hours?
PANDA has the advantage of knowing its incoming workload and how it is scheduling things
– Can be used to adjust path metrics based upon plans
Need to define the best metrics to evaluate impact
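The question of whether the last n hours predict the next m hours could be explored with a simple correlation over a series of hourly measurements. The sketch below assumes one throughput value per hour for a single path and uses a Pearson correlation between trailing and leading window means; the function names are hypothetical and other statistics may suit real data better.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def past_future_correlation(hourly, n, m):
    """Correlate the mean of the previous n hours with the mean of the next m hours."""
    past, future = [], []
    for t in range(n, len(hourly) - m + 1):
        past.append(mean(hourly[t - n:t]))
        future.append(mean(hourly[t:t + m]))
    if len(past) < 2:
        return None  # not enough windows to correlate
    return pearson(past, future)
```

Scanning this over several values of n and m would indicate how fresh the monitoring data has to be before it stops being useful for planning.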

Beyond Monitoring
The consensus is that good monitoring information from the network will help improve our ability to use our resources more effectively, but what about negotiating with the network to further improve things?
– Networks have moved beyond black boxes that transmit bits with varying delay and bandwidth.
– Users now have the option to negotiate for the service(s) they require.
Various networking services have been (and are being) developed to better optimize both network resource use and end-user experience:
– SDN: Software Defined Networks; OpenFlow
– NSI: Network Service Interface
– Dynamic circuits via DYNES/AutoBahn/ION/OSCARS, etc.
Part of our project is to make sure LHC experiments can utilize and benefit from these developments

NSI In a Slide
“Network Service Interface” is a framework for inter-domain provisioning of connection-oriented services.
– NSI is an Open Grid Forum (OGF) standard
– NSI is a consensus standard, open to wide participation and broad-based agreement.
– It is a work in progress, and will continue to evolve and be refined.
As a framework, NSI is intended to provide a scalable, secure, and well integrated set of multi-domain architectural elements and service protocols that manage all aspects of services they present:
– Connection Reservation and Provisioning: NSI-CS (version 2, 2012)
– Topology Exchange (first edition expected in 2013)
Others in the wings:
– Automated Performance Verification
– Automated Fault Processing
Should this be a basis for the ANSE (and BigPANDA) API?

What is DYNES?
Dynamic network circuit provisioning and scheduling
A distributed virtual cyber-instrument spanning about 40 US universities and 11 Internet2 connectors which interoperates with ESnet, GEANT, APAN, US LHCNet, and many others.
Synergetic projects include OliMPS and ANSE.

DYNES and OpenFlow
When DYNES started (2010), Software Defined Networking was “alpha” at best
– We would have liked to start out with something like OpenFlow for use in DYNES
In our last year we had the opportunity to integrate OpenFlow within DYNES by retrofitting some sites with Dell/Force10 S4810 switches and “beta” OpenFlow firmware.
– Testing of OpenFlow and the Open Exchange Software Suite (OESS) integration was successful at Michigan (February 2013)
– Orders for S4810 switches were placed and systems were distributed by July 2013
This should provide a simpler and more inter-operable DYNES instrument for the future.

DYNES Status
The DYNES project (an NSF MRI 3-year grant) ended July 31…
The last set of sites was included in July 2013
However, we are still working on robust circuit creation (issues were discovered in the robustness and scalability of the central circuit services of the R&E backbones)
We have developed a significant infrastructure to provision, deploy and monitor the DYNES instrument that lets us track status
ANSE depends upon (and assumed) a working DYNES infrastructure. We are trying to make sure we have this in our toolbox.

perfSONAR
Most of you have heard me talk about perfSONAR before (see my ATLAS TIM talk from Tokyo) so I won’t repeat those details
For ANSE (and BigPANDA) we need reliable standard metrics from the network. perfSONAR-PS toolkit installations are an important source we plan to use
– However perfSONAR-PS’s primary focus is on finding and localizing network issues
We are close to having a full WLCG deployment

WLCG perfSONAR Google Map

Incorporating the Network for LHC
In summary, ANSE goals are to:
– Gather reliable means of estimating network performance between LHC sites. Which metrics? What timescales?
– Provide interfaces for control of network characteristics where they exist. Requires “discovery”; AGIS info?
– Better enable end-systems to optimize their ability to utilize the network connections available. Some technologies (circuits) need host optimizations
– Future-proof our infrastructure by providing generic hooks that can leverage “future” technologies as they are developed.
Lots of work, and we assume much of this is in scope for BigPANDA… how to divide the efforts?

Questions or Comments? Time for some discussions?

Additional Slides

What is OpenFlow?
OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day. OpenFlow is added as a feature to commercial Ethernet switches, routers and wireless access points – and provides a standardized hook to allow researchers to run experiments, without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available.
– In a classical router or switch, the fast packet forwarding (data path) and the high-level routing decisions (control path) occur on the same device. An OpenFlow Switch separates these two functions. The data path portion still resides on the switch, while high-level routing decisions are moved to a separate controller, typically a standard server. The OpenFlow Switch and Controller communicate via the OpenFlow protocol, which defines messages such as packet-received, send-packet-out, modify-forwarding-table, and get-stats.
– The data path of an OpenFlow Switch presents a clean flow table abstraction; each flow table entry contains a set of packet fields to match, and an action (such as send-out-port, modify-field, or drop). When an OpenFlow Switch receives a packet it has never seen before, for which it has no matching flow entries, it sends this packet to the controller. The controller then makes a decision on how to handle this packet. It can drop the packet, or it can add a flow entry directing the switch on how to forward similar packets in the future.
OpenFlow abstracts the control plane, allowing it to exist centrally in software and giving the network manager programmatic control
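The flow table behaviour described above (match fields, an action, and a table miss that is sent to the controller) can be mimicked with a toy model. This is a conceptual sketch in plain Python, not the OpenFlow wire protocol or any controller framework's API; the classes `FlowEntry`, `Controller` and `Switch` and the `"output:N"` action strings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    match: dict   # packet fields to match, e.g. {"dst": "10.0.0.2"}
    action: str   # e.g. "output:2" or "drop"


class Controller:
    """Toy controller holding a static destination -> port map."""

    def __init__(self, routes):
        self.routes = routes

    def packet_in(self, packet):
        # Decide how to handle an unmatched packet and return a flow entry
        # telling the switch how to treat similar packets in the future.
        dst = packet.get("dst")
        port = self.routes.get(dst)
        action = f"output:{port}" if port is not None else "drop"
        return FlowEntry(match={"dst": dst}, action=action)


class Switch:
    """Toy data path: look up the flow table, ask the controller on a miss."""

    def __init__(self, controller):
        self.controller = controller
        self.flow_table = []
        self.controller_calls = 0

    def handle(self, packet):
        for entry in self.flow_table:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.action
        # Table miss: hand the packet to the controller and install its answer.
        self.controller_calls += 1
        entry = self.controller.packet_in(packet)
        self.flow_table.append(entry)
        return entry.action
```

Sending the same destination twice through `Switch.handle` contacts the controller only once; the second packet is served from the installed flow entry, which is the separation of control path and data path in miniature.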