Campana (CERN-IT/SDC), McKee (Michigan) 16 October 2013 Deployment of a WLCG network monitoring infrastructure based on the perfSONAR-PS technology.

Slides:



Advertisements
Similar presentations
Circuit Monitoring July 16 th 2011, OGF 32: NMC-WG Jason Zurawski, Internet2 Research Liaison.
Advertisements

 Contributing >30% of throughput to ATLAS and CMS in Worldwide LHC Computing Grid  Reliant on production and advanced networking from ESNET, LHCNET and.
Update on OSG/WLCG perfSONAR infrastructure Shawn McKee, Marian Babik HEPIX Spring Workshop, Oxford 23 rd - 27 th March 2015.
Integrating Network and Transfer Metrics to Optimize Transfer Efficiency and Experiment Workflows Shawn McKee, Marian Babik for the WLCG Network and Transfer.
PerfSONAR in ATLAS/WLCG Shawn McKee, Marian Babik ATLAS Jamboree / Network Section 3 rd December 2014.
LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.
User-Perceived Performance Measurement on the Internet Bill Tice Thomas Hildebrandt CS 6255 November 6, 2003.
CERN - IT Department CH-1211 Genève 23 Switzerland t Monitoring the ATLAS Distributed Data Management System Ricardo Rocha (CERN) on behalf.
Connect communicate collaborate perfSONAR MDM updates: New interface, new possibilities Domenico Vicinanza perfSONAR MDM Product Manager
Network Monitoring for OSG Shawn McKee/University of Michigan OSG Staff Planning Retreat July 10 th, 2012 July 10 th, 2012.
Network and Transfer WG Metrics Area Meeting Shawn McKee, Marian Babik Network and Transfer Metrics Kick-off Meeting 26 h November 2014.
Connect communicate collaborate perfSONAR MDM updates: New interface, new weathermap, towards a complete interoperability Domenico Vicinanza perfSONAR.
Internet2 Performance Update Jeff W. Boote Senior Network Software Engineer Internet2.
1 Measuring Circuit Based Networks Joint Techs Feb Joe Metzger
The Grid System Design Liu Xiangrui Beijing Institute of Technology.
Network and Transfer Metrics WG Meeting Shawn McKee, Marian Babik Network and Transfer Metrics WG Meeting 8 th April 2015.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
ASCR/ESnet Network Requirements an Internet2 Perspective 2009 ASCR/ESnet Network Requirements Workshop April 15/16, 2009 Richard Carlson -- Internet2.
Network and Transfer Metrics WG Meeting Shawn McKee, Marian Babik perfSONAR Operations Sub-group 22 nd October 2014.
1 Network Measurement Summary ESCC, Feb Joe Metzger ESnet Engineering Group Lawrence Berkeley National Laboratory.
Connect communicate collaborate Intercontinental Multi-Domain Monitoring for the LHC Community Domenico Vicinanza perfSONAR MDM Product Manager DANTE –
Update on OSG/WLCG Network Services Shawn McKee, Marian Babik 2015 WLCG Collaboration Workshop 12 th April 2015.
Connect. Communicate. Collaborate perfSONAR MDM Service for LHC OPN Loukik Kudarimoti DANTE.
Update on WLCG/OSG perfSONAR Infrastructure Shawn McKee, Marian Babik HEPiX Fall 2015 Meeting at BNL 13 October 2015.
PerfSONAR-PS Functionality February 11 th 2010, APAN 29 – perfSONAR Workshop Jeff Boote, Assistant Director R&D.
Network and Transfer Metrics WG Meeting Shawn McKee, Marian Babik Network and Transfer Metrics WG Meeting 18 h March 2015.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
WLCG perfSONAR-PS Update Shawn McKee/University of Michigan WLCG Network and Transfers Metrics Co-Chair Spring 2014 HEPiX LAPP, Annecy, France May 21 st,
Network awareness and network as a resource (and its integration with WMS) Artem Petrosyan (University of Texas at Arlington) BigPanDA Workshop, CERN,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MSG - A messaging system for efficient and.
Report from the WLCG Operations and Tools TEG Maria Girone / CERN & Jeff Templon / NIKHEF WLCG Workshop, 19 th May 2012.
PanDA Status Report Kaushik De Univ. of Texas at Arlington ANSE Meeting, Nashville May 13, 2014.
Network and Transfer WG perfSONAR operations Shawn McKee, Marian Babik Network and Transfer Metrics WG Meeting 28 h January 2015.
PerfSONAR Update Shawn McKee/University of Michigan LHCONE/LHCOPN Meeting Cambridge, UK February 9 th, 2015.
TeraPaths: A QoS Enabled Collaborative Data Sharing Infrastructure for Petascale Computing Research The TeraPaths Project Team Usatlas Tier 2 workshop.
ATP Future Directions Availability of historical information for grid resources: It is necessary to store the history of grid resources as these resources.
OSG Networking: Summarizing a New Area in OSG Shawn McKee/University of Michigan Network Planning Meeting Esnet/Internet2/OSG August 23 rd, 2012.
Julia Andreeva on behalf of the MND section MND review.
US LHC Tier-2 Network Performance BCP Mar-3-08 LHC Community Network Performance Recommended BCP Eric Boyd Deputy Technology Officer Internet2.
PerfSONAR-PS Working Group Aaron Brown/Jason Zurawski January 21, 2008 TIP 2008 – Honolulu, HI.
Network Awareness and perfSONAR Why we want it. What are the challenges? Where are we going? Shawn McKee / University of Michigan OSG AHG - US CMS Tier-2.
DICE: Authorizing Dynamic Networks for VOs Jeff W. Boote Senior Network Software Engineer, Internet2 Cándido Rodríguez Montes RedIRIS TNC2009 Malaga, Spain.
GridView - A Monitoring & Visualization tool for LCG Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav B.A.R.C. BARC-CERN/LCG Meeting.
GEMINI: Active Network Measurements Martin Swany, Indiana University.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Regional Nagios Emir Imamagic /SRCE EGEE’09,
1 A Scalable Distributed Data Management System for ATLAS David Cameron CERN CHEP 2006 Mumbai, India.
LHCONE Monitoring Thoughts June 14 th, LHCOPN/LHCONE Meeting Jason Zurawski – Research Liaison.
14-Nov-07 OWAMP (One-Way Latencies) BWCTL (Bandwidth Test Control) Jeff Boote Network Performance Tools BOF-SC07.
Strawman LHCONE Point to Point Experiment Plan LHCONE meeting Paris, June 17-18, 2013.
David Foster, CERN GDB Meeting April 2008 GDB Meeting April 2008 LHCOPN Status and Plans A lot more detail at:
ATLAS Distributed Computing ATLAS session WLCG pre-CHEP Workshop New York May 19-20, 2012 Alexei Klimentov Stephane Jezequel Ikuo Ueda For ATLAS Distributed.
Deploying perfSONAR-PS for WLCG: An Overview Shawn McKee/University of Michigan WLCG PS Deployment TF Co-chair Fall 2013 HEPiX Ann Arbor, Michigan October.
SAM architecture EGEE 07 Service Availability Monitor for the LHC experiments Simone Campana, Alessandro Di Girolamo, Nicolò Magini, Patricia Mendez Lorenzo,
Activities and Perspectives at Armenian Grid site The 6th International Conference "Distributed Computing and Grid- technologies in Science and Education"
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES Monitoring Overview: status, issues and outlook Simone Campana.
1 Deploying Measurement Systems in ESnet Joint Techs, Feb Joseph Metzger ESnet Engineering Group Lawrence Berkeley National Laboratory.
1 Network Measurement Challenges LHC E2E Network Research Meeting October 25 th 2006 Joe Metzger Version 1.1.
PerfSONAR operations meeting 3 rd October Agenda Propose changes to the current operations of perfSONAR Discuss current and future deployment model.
Path Monitoring Tools Deployment Planning for U.S. T123
Report from WLCG Workshop 2017: WLCG Network Requirements GDB - CERN 12th of July 2017
POW MND section.
Networking for the Future of Science
Establishing End-to-End Guaranteed Bandwidth Network Paths Across Multiple Administrative Domains The DOE-funded TeraPaths project at Brookhaven National.
Monitoring the US ATLAS Network Infrastructure with perfSONAR-PS
Internet2 Performance Update
Shawn McKee/University of Michigan ATLAS Technical Interchange Meeting
Alerting/Notifications (MadAlert)
The Globus Toolkit™: Information Services
GÉANT network (December 2018)
Presentation transcript:

Campana (CERN-IT/SDC), McKee (Michigan) 16 October 2013 Deployment of a WLCG network monitoring infrastructure based on the perfSONAR-PS technology

Network Monitoring for WLCG  WLCG relies heavily on the underlying networks  Interconnect sites and resources 16 October 2013 – CHEP

Network Monitoring for WLCG  We discovered that end-to-end network issues can be difficult to spot and debug  Insufficient tools to detect network failures and diagnose Sometimes noticed only by the application  Multiple “owners” (administrative domains)  The famous “BNL-CNAF network issue”   7 months, 72 entries in the ticket, lots of real work from many people 16 October 2013 – CHEP

Network Monitoring Infrastructure  The WLCG service needs to guarantee effective network usage and rapid solution of network issues  Using “standard” tools comes many benefits  Quality software, supported by a large community  Standard metrics, familiar for network engineers  WLCG choose perfSONAR as the basis of its network monitoring infrastructure  Significant experience already in USATLAS and LHCOPN 15 October 2013 – CHEP 2013, Amsterdam, NL 4

perfSONAR and perfSONAR-PS  perfSONAR is an infrastructure for network performance monitoring  Organized as consortium of organizations building an interoperable network monitoring middle-ware  Defines the service types and a protocol for them to communicate  Develops the software packages to implement the services  perfSONAR-PS is an open source development effort based on perfSONAR  targeted at creating an easy-to-deploy and easy-to-use set of perfSONAR services  Comes with all-in-one solution (CD or USB) or single packages for CentOS 5 and 6 15 October 2013 – CHEP 2013, Amsterdam, NL 5

perfSONAR-PS toolkit  Web based GUI  for the administrator to configure the service and schedule the tests  for the user to display the measurements  Engine for execution of various test types  Throughput tests (bwctl), non-concurrent  Ping (PingER), time stamped  One-Way Latency tests (owamp), time stamped  Traceroute  Network Diagnostic Tools (NDT,NPAD) on demand  A Measurement Archive stores and exposes programmatically the results 15 October 2013 – CHEP 2013, Amsterdam, NL 6

Early perfSONAR deployment  perfSONAR deployment started in the OPN and USATLAS  Test definitions statically configured on each node by the site administrator following a set of instructions  Good for the OPN use case  well established list of sites  Problematic for a broad deployment  Service endpoints might be changing  New sites might join  Difficult to coordinate the effort 15 October 2013 – CHEP 2013, Amsterdam, NL 7

WLCG deployment plan  WLCG choose to deploy perfSONAR-PS at all sites worldwide  A dedicated WLCG Operations Task-Force was started in Fall 2012  Sites are organized in regions  Based on geographical locations and experiments computing models  All sites are expected to deploy a bandwidth host and a latency host  Regular testing is setup using a centralized (“mesh”) configuration  Bandwidth tests: 30 seconds tests every 6 hours intra-region, 12 hours for T2-T1 inter-region, 1 week elsewhere  Latency tests; 10 Hz of packets to each WLCG site  Traceroute tests between all WLCG sites each hour  Ping(ER) tests between all site every 20 minutes 15 October 2013 – CHEP 2013, Amsterdam, NL 8

perfSONAR-PS Mesh Example 15 October 2013 – CHEP 2013, Amsterdam, NL 9 The perfSONAR-PS instances can participate in more than one configuration (WLCG, Tier-1 cloud, VO-based, etc.) The WLCG mesh configurations are centrally hosted at CERN and exposed through HTTP perfSONAR-PS toolkit instances can get their configuration information from a URL hosting an suitable JSON file An agent_configuration file on the PS node defines one or more URLs

The perfSONAR Modular Dashboard  Centrally aggregates measurements from all PS hosts  Provides a web UI and REST interface   A new implementation maturing production quality  Addressing scalability issues for large meshes  Providing a more extensive REST API  Self-configuring from mesh definitions  Fancier …  SNAPSHOT/matrices.jsp?id= SNAPSHOT/matrices.jsp?id=62  Discussions with OSG about hosting the Modular Dashboard service and automating mesh-config creation 15 October 2013 – CHEP 2013, Amsterdam, NL 10 bwctl last 30 days 10Mb/ s 50Mb/ s

Example of Network Monitoring 15 October 2013 – CHEP 2013, Amsterdam, NL 11 ATLAS aggregates complementary network information in the Site Status Board Topology FTS transfer (per file) perfSONAR WAN data access (WN-SE)

perfSONAR Deployment Status 15 October 2013 – CHEP 2013, Amsterdam, NL 12

Conclusions  Network Monitoring is a key component for WLCG Operation  We are deploying a monitoring infrastructure based on perfSONAR-PS in the scope of WLCG Operations  70% of the infrastructure has been deployed  Completing the deployment and optimizing the tests are the next steps 15 October 2013 – CHEP 2013, Amsterdam, NL 13