OSG Networking: Summarizing a New Area in OSG Shawn McKee/University of Michigan Network Planning Meeting ESnet/Internet2/OSG August 23rd, 2012.

Presentation transcript:

OSG Networking: Summarizing a New Area in OSG
Shawn McKee/University of Michigan
Network Planning Meeting ESnet/Internet2/OSG
August 23rd, 2012

Outline
- OSG Networking: A new area in OSG
- Motivation for Network Monitoring
- Status and Related Work
- perfSONAR-PS
- Modular Dashboard
- Goals
8/23/2012 ESnet/Internet2/OSG Network Planning

OSG Networking: A New Area
- As part of OSG’s next 5-year plan, a new area in “Networking” was added
- Summary goal: To provide OSG networking support for OSG sites and users.
- For the first year there are two primary components to focus on: the perfSONAR-PS toolkit and the Modular Dashboard
  - OSG sites should have an easy-to-install, easy-to-maintain toolkit
  - OSG should provide a “Modular Dashboard” (both a production instance and a software package) to collect, aggregate, summarize, analyze and visualize sets of OSG network metrics

Motivations for OSG Network Monitoring
- Distributed collaborations rely upon the network as a critical part of their infrastructure, yet finding and debugging network problems can be difficult and, in some cases, take months.
- There is typically no differentiation of how the network is used amongst the OSG users. (Quantity may vary)
- We need a standardized way to monitor the network and locate problems quickly if they arise
- We don’t want to have a network monitoring system per VO!

OSG perfSONAR-PS Deployment
- We want a set of tools that:
  - Are easy to install
  - Measure the “network” behavior
  - Provide a baseline of network performance between end-sites
  - Are standardized and broadly deployed
  - Are “set-it and forget-it” (continue to run without intervention)
- Details of how LHCONE sites set up their perfSONAR-PS installations are documented on the Twiki at:
- An example OSG could follow (with minor changes)

OSG Network Monitoring Goals
- We want OSG sites to have the ability to easily monitor their network status
  - Sites should be able to determine if network problems are occurring
  - Sites should have a reasonable “baseline” measurement of usable bandwidth between themselves and selected peers
  - Sites should have standardized diagnostic tools available to identify, isolate and aid in the repair of network-related issues
- We want OSG VOs to have the ability to easily monitor the set of network paths used by their sites
  - VOs should be able to identify sites with network problems
  - VOs should be able to track network performance and alert on network problems between VO sites

How To Achieve These Goals?
- OSG should plan to collaborate with the existing and ongoing efforts in ESnet/Internet2/LHC regarding network monitoring
- The perfSONAR-PS toolkit is an actively developed set of network monitoring tools following the perfSONAR standards
- There is an existing modular dashboard which is currently undergoing a redesign. OSG should not only use this but also provide input about design features needed to enable its effective use for OSG
- Some effort is underway to enable alerting for network problems. I had an undergraduate working on an example system (more later).
- Details of how best to integrate within OSG planning and existing and future infrastructure are why we are here
- This afternoon we can discuss possibilities…

perfSONAR-PS Deployment Considerations
- Each “site” should have perfSONAR-PS instances in place.
  - If an OSG site has more than one “network” location, each should be instrumented and made part of scheduled testing.
- Standardized hardware and software is a good idea
  - Measurements should represent what the network is doing and not differences in hardware/firmware/software.
- USATLAS has identified and tested systems from Dell for perfSONAR-PS hardware. Two variants: R310 and R610.
  - R310: cheaper (<$900), can host 10G (Intel X520 NIC) but not supported by Dell (most US ATLAS sites choose this)
  - R610: officially supports the X520 NIC (Canadian sites choose this)
  - Orderable off the Dell LHC portal for LHC sites
- VOs should try to upgrade perfSONAR-PS toolkit versions together

Modular Dashboard
- While the perfSONAR-PS toolkit is very nice, it was designed to be a distributed, federated installation.
- Not easy to get an “overview” of a set of sites or their status
- USATLAS needed some “summary interface”
- Thanks to Tom Wlodek’s work on developing a “modular dashboard” we have a very nice way to summarize the extensive information being collected for the near-term network characterization. (See talk later)
- The dashboard provides a highly configurable interface to monitor a set of perfSONAR-PS instances via simple plug-in test modules. Users can be authorized based upon their grid credentials. Sites, clouds, services, tests, alarms and hosts can be quickly added and controlled.

VO Site Configuration Considerations
- Determine what the VO wants for scheduled tests
- Recommendation for tests:
  - Latency tests (for the packet loss info): use default settings
  - Throughput: decide how often and how long (USATLAS runs one per 4 hrs, 20 second duration; 10GE may need a longer test)
  - Traceroute: sites should set up a traceroute test to each other VO site
- Use a “community” to self-identify VO sites of interest. I recommend the VO name. This will allow VO sites to pick that community and see everyone “advertising” that attribute. Allows adding sites to tests with a “click”
- Get VO sites at the same (current) version
- Make sure firewalls are blocking neither the VO sites nor the collector at BNL (or OSG?): rnagios01.usatlas.bnl.gov
- Copy/rewrite the LHCONE info on the Twiki for VO use
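The test recommendations above can be sketched as a small schedule builder. This is only an illustration of the full-mesh idea, not the perfSONAR-PS configuration format: the `ScheduledTest` record, the `build_vo_mesh` and `throughput_seconds_per_day` helpers and the `*.example.org` host names are all hypothetical. Only the throughput cadence (one 20-second test every 4 hours) comes from the slide.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional

SECONDS_PER_DAY = 24 * 3600


@dataclass(frozen=True)
class ScheduledTest:
    """Hypothetical record for one scheduled test between two hosts."""
    kind: str                   # "latency", "throughput" or "traceroute"
    source: str
    destination: str
    interval_s: Optional[int]   # None means "use the toolkit default"
    duration_s: Optional[int]   # None means "use the toolkit default"


def build_vo_mesh(hosts: List[str],
                  throughput_interval_s: int = 4 * 3600,
                  throughput_duration_s: int = 20) -> List[ScheduledTest]:
    """Build latency, throughput and traceroute tests for every ordered host pair."""
    tests = []
    for src, dst in permutations(hosts, 2):
        tests.append(ScheduledTest("latency", src, dst, None, None))
        tests.append(ScheduledTest("throughput", src, dst,
                                   throughput_interval_s, throughput_duration_s))
        tests.append(ScheduledTest("traceroute", src, dst, None, None))
    return tests


def throughput_seconds_per_day(tests: List[ScheduledTest], source: str) -> int:
    """Seconds per day that `source` spends sending throughput tests."""
    total = 0
    for t in tests:
        if t.kind == "throughput" and t.source == source:
            total += (SECONDS_PER_DAY // t.interval_s) * t.duration_s
    return total


if __name__ == "__main__":
    # Placeholder host names, not real VO sites.
    hosts = ["ps.site-a.example.org", "ps.site-b.example.org", "ps.site-c.example.org"]
    mesh = build_vo_mesh(hosts)
    print(len(mesh))                                   # 18 tests for 3 hosts
    print(throughput_seconds_per_day(mesh, hosts[0]))  # 240 seconds per day
```

The per-day total is a quick way to see how the throughput load on one host grows with the number of VO peers before committing to a test interval.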

Targets for OSG
- Two “clients” for OSG Network Monitoring: sites and VOs
- How to support both most effectively?
- Sites need:
  - Details of options for required hardware
  - Software (perfSONAR-PS) and detailed installation instructions
  - Configuration options documented with suggested best practices
  - Notification when problems are identified
  - Set-it and forget-it operations…limited manpower and expertise
- VOs need:
  - Site details (perfSONAR-PS instances at each VO site)
  - Software (modular dashboard hosted by OSG?) and detailed configuration options.
  - Dashboard configuration details: How to add my VO sites for monitoring?
  - Centralized test/scheduling management (“pull” model seems best)

Draft Work Plan for OSG
- Develop OSG site install procedures for perfSONAR-PS
- Use existing infrastructure for software download or provide OSG distribution (with hardening, appropriate config)?
- Provide site recommendations and best practices guide
- Provide VO-level recommendations and best practices doc
- OSG should host a set of services providing a modular dashboard for VOs. Need to determine details
- OSG should provide packaged “modular dashboard” components to allow sites/VOs to deploy their own instance.
- OSG should allow VOs or sites to request “alerting” when monitoring identifies network problems. Need to create and deploy such a capability

Challenges Ahead
- Getting hardware/software platform installed at OSG sites
- Dashboard development: Currently USATLAS/BNL and ESnet and soon OSG, FNAL, Canada (ATLAS/HEPnet) and USCMS.
- Managing site and test configurations
- Determining the right level of scheduled tests for a site, e.g., which other OSG or VO sites?
- Improving the management of the configurations for VOs/Clouds
- Tools supporting central configuration (Aaron/Internet2 working on this)
- Alerting: A high-priority need but complicated:
  - Alert who? Network issues could arise in any part of the end-to-end path
  - Alert when? Defining criteria for alert thresholds. Primitive services are easier. Network test results are more complicated to decide
- Integration with existing VO and OSG infrastructures.

Discussion/Questions
Questions or Comments?

References
- perfSONAR-PS site
- Install/configuration guide: ps/wiki/pSPerformanceToolkit32
- Modular Dashboard: or
- Tools, tips and maintenance:
- LHCONE perfSONAR:
- LHCOPN perfSONAR:
- CHEP 2012 presentation on USATLAS perfSONAR-PS experience:

Yuan Cao’s Alerting Work
- This summer I had a student from USTC (Hefei, China) work on a summer project with me. He chose to work on ‘perfSONAR-PS Alerting’ for his 8-week stay with us.
- The project README is available at
- He developed a simple Perl daemon system using a simplified APD (Adaptive Plateau Detection) which analyzes OWAMP data.
- See
- He added traceroute monitoring as well. See

Adaptive Plateau Detection Example
- Example of adaptive plateau detection
- Identifies “significant” changes from a baseline
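The idea of flagging a "significant" change from a baseline can be sketched as follows. This is not the Perl daemon described in the talk, and it is not a full Adaptive Plateau Detection implementation: it is a minimal sketch that assumes a median baseline over a sliding history window and a short trigger window that must sit entirely on one side of it. The function name `detect_plateau` and every parameter (`history`, `trigger`, `sensitivity`, `min_spread`) are hypothetical choices made for illustration.

```python
from statistics import median


def detect_plateau(samples, history=20, trigger=5, sensitivity=3.0, min_spread=1e-3):
    """Return the index where a new plateau starts, or None if none is found.

    samples     -- time-ordered measurements (e.g. one-way delay or loss)
    history     -- number of samples used to estimate the baseline
    trigger     -- number of consecutive recent samples that must all deviate
    sensitivity -- deviation threshold in units of the baseline spread
    min_spread  -- floor for the spread so a perfectly flat baseline still works
    """
    for end in range(history + trigger, len(samples) + 1):
        base = samples[end - trigger - history:end - trigger]
        recent = samples[end - trigger:end]

        # Baseline: median of the history window; spread: median absolute deviation.
        center = median(base)
        spread = max(median([abs(x - center) for x in base]), min_spread)
        threshold = sensitivity * spread

        # A plateau change needs every recent sample on the same side of the
        # baseline, so a single spike is ignored.
        all_above = all(x - center > threshold for x in recent)
        all_below = all(center - x > threshold for x in recent)
        if all_above or all_below:
            return end - trigger
    return None


if __name__ == "__main__":
    delay_ms = [10.0] * 30 + [15.0] * 10   # synthetic step change, not real OWAMP data
    print(detect_plateau(delay_ms))        # 30
```

Requiring the whole trigger window to deviate trades detection latency for fewer false alarms, which matters when the alert ends up in a site administrator's mailbox.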

Alerting Schematic
- Yuan’s alert system (grey)
- Could be used to begin an “alerting” component in the dashboard

Example 1 Alerting
Warning from APD Alert System: Data from your site might be missing or insufficient for analysis. Check your configuration file and see if there is a problem. This message was sent to No.10(WT2_SLAC) node in USATLAS.

Example 2 Alert (page 1)
Warning from APD Alert System: Measurement shows that the one-way loss from your site (MWT2_UCHICAGO) to several other sites has changed significantly, but the delay hasn't changed noticeably. This might be due to congestion or configurational problems at your site. Please check the problems to ensure the network works properly. The following Traceroute information might be useful for you.
Source: uct2-net1.uchicago.edu
Destination: psum01.aglt2.org
Number of Tests: 6
Number of Paths: 8
Route 1 to Route 4: (hop addresses not preserved in this transcript)

Example 2 Alerting (Page 2)
Route 5 to Route 8: (hop addresses not preserved in this transcript)
Time: 8/20/ :08:56 Route 1 -> Route 2.
Time: 8/20/ :19:32 Route 2 -> Route 3.
Time: 8/20/ :30:18 Route 3 -> Route 4.
Time: 8/20/ :40:54 Route 4 -> Route 6.
Time: 8/20/ :51:30 Route 6 -> Route 8.
This is one way to summarize routing changes and alert users.
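A route-change summary of the kind shown in this alert can be sketched in a few lines. The sketch assumes each traceroute result is available as a timestamp plus an ordered list of hop addresses. It numbers distinct paths in order of first appearance, which may differ from how the alert system in the talk numbers its routes. The function name `summarize_routes` and the sample hop addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
def summarize_routes(observations):
    """Number the distinct paths and list the transitions between them.

    observations -- time-ordered iterable of (timestamp, hops) pairs, where
                    hops is the ordered list of hop addresses from one traceroute
    Returns (route_ids, changes): route_ids maps each distinct path (as a tuple)
    to a route number, and changes is a list of (timestamp, old_route, new_route).
    """
    route_ids = {}
    changes = []
    previous = None
    for timestamp, hops in observations:
        path = tuple(hops)
        # The first time a path is seen it gets the next free route number.
        route = route_ids.setdefault(path, len(route_ids) + 1)
        if previous is not None and route != previous:
            changes.append((timestamp, previous, route))
        previous = route
    return route_ids, changes


if __name__ == "__main__":
    # Synthetic traceroute results, not data from the talk.
    observations = [
        ("12:00:00", ["192.0.2.1", "192.0.2.10", "192.0.2.99"]),
        ("12:10:00", ["192.0.2.1", "192.0.2.10", "192.0.2.99"]),
        ("12:20:00", ["192.0.2.1", "192.0.2.20", "192.0.2.99"]),
        ("12:30:00", ["192.0.2.1", "192.0.2.10", "192.0.2.99"]),
    ]
    routes, changes = summarize_routes(observations)
    print(f"Number of Tests: {len(observations)}")
    print(f"Number of Paths: {len(routes)}")
    for when, old, new in changes:
        print(f"Time: {when} Route {old} -> Route {new}.")
```

Reporting only the transitions keeps the alert short even when many tests were run, since repeated observations of the same path add nothing to the message.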